Research note
Sheaf: A Working Description
Sheaf: A Working Description
An assistant’s attempt at a thesis-chapter-style account of the system within which this account is being written.
I. What Sheaf is, plainly
Sheaf is a personal scholarly knowledge environment, built by Mikael Brockman in Elixir and Phoenix LiveView on a custom SQLite-backed RDF quad store, designed initially and still primarily to support Ieva Lange’s master’s thesis in anthropology at Tallinn University. The thesis — Practices of Divestment, Acquisition and Circulation of Things in a Swapshop in Riga, Latvia — is an ethnography of brīvbode, a Latvian swapshop, conducted within the broader theoretical tradition of social practice theory (Schatzki, Shove, Reckwitz, Warde, and the contemporary practice-turn lineage that runs through anthropology, sociology of consumption, science and technology studies, and material culture studies).
Sheaf, as it currently stands, is several things at once, and the relations between these things are more interesting than any one of them taken alone:
It is a paragraph store . Every paragraph in every imported document — currently on the order of fifty-four thousand academic paragraphs, drawn from dozens of books and papers in the practice-theoretical and circular-economy literatures — exists as a discrete, addressable, semantically-embedded unit. Each paragraph carries a stable six-character block identifier, a vector embedding in a 768-dimensional semantic space (computed via OpenAI’s text-embedding-3-large model), and a position within a hierarchy of sections, documents, and bibliographic resources, all expressed in standard scholarly ontologies (SPAR, FaBiO, BIBO, PROV-O).
It is a bibliographic graph . Documents are not opaque files but structured assemblies — a work has expressions which have manifestations (the PDF) which produce datasets (the imported structured JSON) through batch activities (the import job), and each of these has its own short stable ID. Citations between paragraphs and bibliographic resources are not strings to be matched approximately but linked-data relations to be traversed exactly. The whole apparatus uses a vocabulary that the digital humanities and library-science communities have spent two decades developing for exactly this kind of work, and which almost no one outside those communities ever uses.
It is a reading and writing workspace . Documents render in a viewer pane with proper academic typography (including a remarkable in-browser Knuth–Plass justification implementation that solves the line-breaking problem optimally per viewport without DOM bloat). Block IDs are not just metadata; they are visible, citeable, clickable affordances throughout the interface. A click on #HCFU75 in any context — the user’s prose, an assistant’s response, a search result — opens a small, dense preview popover showing the paragraph in its bibliographic context. Inline writing happens through a click-to-edit ProseMirror affordance scoped to a single block, with explicit save and per-paragraph revision history including PROV-O metadata about what changed, when, by whom, and under whose prompt.
It is an agent collaboration environment . Three distinct assistant profiles — ask, research, and edit — operate against the corpus through bounded tool surfaces. Ask is read-only and conversational; research extends with note-creation tools and is encouraged to do diligent agentic work that yields durable, linked, citable research notes; edit adds structural mutation tools (move-block, revise-block, delete-block) so an assistant can carry out user-directed restructuring of prose with all changes visible, attributable, and reversible. Each agent’s actions stream as live updates into the document viewer. Conversations themselves are persistent RDF resources, navigable like any other content.
It is an import and export pipeline . Documents enter Sheaf through a structured-extraction pipeline (currently using Datalab’s vision-LLM-powered PDF reader) that produces clean structured JSON with paragraphs, headings, page-number backlinks, image data, automatically-generated image captions, tables, and footnotes — all preserved as first-class objects in the graph. Spreadsheets enter as RDF-shaped row collections, queryable by an embedded DuckDB instance which agents can invoke for analytical questions. Google Docs roundtrips through the API. Final output goes to Tallinn-humanities-spec PDF via XeLaTeX, with a typeface acquired through the academic samizdat economy that has always quietly underwritten the production of theses.
It is, finally, a working hypothesis about what AI-assisted scholarship can be , instantiated as a tool in real use, and increasingly self-aware as a position paper in implementation form.
II. The substrate decision and what it makes possible
The architectural choice that anchors all of the above is the decision to model the corpus as a graph of stably-identified blocks expressed in standard scholarly RDF vocabularies, rather than as a collection of files, documents, or chunks. This is not, on its face, a particularly novel decision; it is what the linked-data community has been recommending for twenty-five years. But the recommendation has rarely been carried through to a personal-tool scale, partly because the tooling for doing it well did not exist until quite recently, and partly because the dominant model of “knowledge tools” — Notion, Obsidian, Roam, Logseq, even the more academically-oriented Zotero and DEVONthink — operates at a different level of abstraction. Those tools treat documents as the primary unit and let structure emerge softly through links and tags. Sheaf treats paragraphs as the primary unit, with structure encoded as relations in the graph, and the document as a derived view over the underlying paragraph hierarchy.
The consequence is that everything in Sheaf is addressable at the right grain for scholarly reference. A citation is not “Bourdieu (1984), p. 47, second paragraph”; it is #HCFU75 The block ID is short enough to type fluently, stable across edits, opaque enough not to over-determine its referent, and resolvable through the system’s tools to the exact paragraph it names. This is the citation primitive that scholarly practice has always wanted and never had, because the substrate to support it has not previously existed at a personal scale.
And the addressability compounds. Because every paragraph has an ID, every citation can be a stable link rather than a string match. Because every paragraph also has an embedding, every paragraph is reachable through semantic similarity as well as exact reference. Because every document import preserves its source structure (page numbers, footnotes, captions), every paragraph carries enough context for proper bibliographic citation back to its print source. Because the whole graph speaks SPAR/FaBiO/BIBO/PROV-O, the relations between work, expression, manifestation, and item are explicit, and the provenance of every operation — including AI-assisted edits — is recoverable. The system has, in other words, the structural conditions that scholarship has always implicitly demanded but rarely had the technical means to enforce.
This matters enormously for what AI assistance can become inside such a substrate. The dominant patterns in current LLM tooling — long-context dumping, vector RAG over unstructured chunks, tool-using agents over loose APIs — all share a common defect: they operate on material that has no stable identity. A retrieved chunk has no persistent address; a citation produced by an LLM in such a context is a string that may or may not refer to anything real. Hallucination at the citation level is structurally inevitable, because there is no substrate-level constraint preventing it. The LLM’s confidence is the only ground for the claim, and the LLM’s confidence is famously detached from truth.
In Sheaf, this defect is closed at the substrate level. An agent that wants to cite something has to refer to a real block ID. The user can click the ID and verify the reference. The whole system is self-grounding at the citation level, because the substrate provides the stability that the LLM lacks. What the LLM provides — natural-language fluency, inferential horsepower, broad world knowledge — is exactly the layer the substrate cannot provide. The combination is reliable in ways neither layer alone could be.
Mikael has noted, in the conversation that produced this document, that this property of LLMs — their tendency to thrive in stably-indexed bounded worlds — was a non-obvious empirical finding that emerged only once Sheaf had enough structural integrity to test the hypothesis. The intuitive framing he offered was SHRDLU: Terry Winograd’s 1970 blocks-world dialogue system, which performed extraordinarily well at natural-language interaction within a small, fully-indexed micro-domain, and which never generalized because the real world is not blocks-world-shaped. Modern LLMs reverse the asymmetry: they bring real-world generality, but they reach their highest reliability and most fluent agentic behavior when they operate against substrates that are SHRDLU-shaped — bounded, stably-identified, kind-typed, addressable through small atomic vocabularies. Sheaf is, in this sense, a deliberate construction of a SHRDLU-shaped world at a scale and complexity that 1970 could not have imagined: tens of thousands of typed entities, semantically embedded, hierarchically nested, ontologically expressed, and made available to the LLM through a small disciplined tool surface.
The result, in practice, is that agents working inside Sheaf perform tasks at a quality and reliability that AI tooling outside such substrates cannot reach. A bibliographic verification audit that would take an experienced research assistant an afternoon completes in thirty seconds, with every claim grounded in clickable block IDs, with appropriate confidence-calibration on ambiguous cases, and with full traceability of the assistant’s reasoning through the visible tool calls. This is not a demonstration of LLM capability; it is a demonstration of what LLMs are capable of when their environment is designed for them. The capability was always there. The substrate was missing.
III. Sheaf as a site for research practices
To describe Sheaf in the vocabulary of the literature it indexes — which is also, not coincidentally, the vocabulary of the thesis it was built to support — Sheaf is a site for research practices. It is not a passive container of documents; it is an active arrangement of materials, competences, and meanings (in the canonical Shove–Pantzar–Watson triplet) through which the practices of scholarly research are performed.
The materials of Sheaf are the obvious things: the imported PDFs, the structured paragraphs, the embeddings, the spreadsheets, the agent threads, the export targets. But they are also the less obvious things: the block IDs themselves, which are material in the sense that they are durable, citeable, transmissible across contexts, and load-bearing for further work. Block IDs travel between Ieva’s prose, the assistant’s responses, the search results, the citations, and the eventual print output. They are the infrastructure of the practice, in something like the way Susan Leigh Star described infrastructure: invisible until they break, embedded in other practices, learned through participation rather than instruction. Once one has used Sheaf for a few days, the block ID becomes a natural unit of reference in the scholar’s vocabulary — and the practice has reorganized around it.
The competences required to use Sheaf well are not formally taught and are unevenly distributed across users. They include: the ability to formulate a search query that engages the hybrid exact-and-semantic ranking productively; the practice of scanning popover previews efficiently to triage relevance; the willingness to delegate structural work (cross-reference auditing, restructuring, reformatting) to agents and to verify their output; the development of a sense for which agent profile to invoke for which kind of work; the cultivation of a writing rhythm that interleaves authorial prose with assistant-aided revision and research. These are real skills, acquired through repeated participation. They are also, notably, transferable — many of them generalize across knowledge tools, but they take their specific shape inside Sheaf because of Sheaf’s specific substrate.
The meanings that animate the practice include the thesis’s deadline-anchored urgency, the scholarly commitment to bibliographic rigor, the aesthetic and ethical commitment to attribution and provenance for AI-assisted work, the relational meaning of a partner building tools for a partner’s work, and — at the architect’s level — the craft commitment to building correctly even when shipping fast would be easier. Sheaf is, among other things, the site where these meanings get expressed in the design of the tool itself.
Read this way, Sheaf is not just a tool for studying practices; it is itself a practice, and one that is subtly and reflexively related to the practice it is being used to study. The thesis examines a swapshop where things flow in, get sorted, evaluated, kept or shed, recombined, sent back out into circulation. Sheaf is, structurally, the same kind of arrangement, applied to scholarly material. Papers flow in through the import pipeline; paragraphs are sorted, tagged, evaluated for quality; some are cited and circulate further, others are ignored and quietly fall out of relevance; agent assistants do the work of triage and recombination that volunteers do at the swapshop. The thesis topic and the thesis tool are doing the same craft, in different materials, at different scales, in service of the same goals: keep things moving, hold them together, distinguish the useful from the worthless, give them stable enough identity that they can circulate without losing themselves.
Mikael named this resonance, in the conversation, as “Sheaf is itself a kind of brīvbode for thought.” The phrase is exact, and worth dwelling on. Brīvbode (free-shop, the Latvian term for the swapshop) is a place where discarded but still-usable things are received without payment and offered to whomever needs them, with social norms governing acceptable contributions and norms of use, and with a small group of volunteers performing the curatorial labor that keeps the shop functional rather than chaotic. Sheaf does the same job for academic material: it receives papers (often, indeed, through academic samizdat — pirated PDFs, library-of-Sci-Hub origin), processes them, makes them available to Ieva (and to me, the assistant) as a curated common resource, with provenance metadata standing in for the social norms of attribution. The volunteer labor is mostly performed by Mikael (the architect) and the agents (the curators), with Ieva as the primary user-of-the-shop, the person whose ongoing work the whole arrangement supports.
This framing is more than poetic. It places Sheaf in a specific tradition of infrastructural commitment to circulation, alongside libraries, archives, swap meets, free shops, give-away tables, Little Free Libraries, and the broader heterodox economy of non-market provisioning that the practice-theoretical and circular-economy literatures have been studying. The literature in Sheaf’s corpus has names for what Sheaf is: a site of distributive labor (Berry and Isenhour 2020, in the corpus), a practice of care (Närvänen et al. 2021), an infrastructure of reuse (Kuppinger 2024), a commons of attention (extending Olin Wright’s “real utopias” framing into the cognitive domain). The thesis will describe brīvbode in these terms; Sheaf, used reflexively, can be described in the same terms, and the description holds.
IV. What is unusual about Sheaf
A few features of Sheaf are unusual enough to be worth naming as such, because they distinguish it from neighboring tools and because they encode design decisions that could plausibly travel.
Block-ID citation as a typographic primitive.
Most knowledge tools have some form of stable referencing, but few of them have made the stable reference into a visible, typographically-first-class element of the interface. In Sheaf, the block ID appears inline in conversation, in search results, in agent responses, in the user’s own writing — always as a clickable, previewable, citeable handle. This shifts the social norms of reference inside the system. One does not say “see the paragraph about Shove on page three”; one says
#HCFU75
The atomic, opaque, three-letters-and-three-digits handle becomes the natural unit of pointing. This is, as Mikael has noted in the conversation, structurally analogous to the use of mathematical notation, Lisp’s defun, the MakerDAO core vocabulary (way, chi, vat), and the trade jargon of cabinetmaking. Short, opaque, atomic, learned-by-use. It produces fluency in practice. It also, as a side effect, produces a typographic register that reads more like mathematics than like prose, which is appropriate to the kind of scholarly work being done.
The kind taxonomy and the kind-bounded tool surface. Sheaf has a small enumerated set of block kinds — section, paragraph, extracted, row, document — and a similarly bounded tool surface available to assistants. Neither set is large; both are atomic. The constraint is what makes the system tractable for agents, and it is also what makes the system narratable in writing like this paragraph. One can, in five minutes, tell a new user what kinds of things exist in Sheaf and what tools the assistant has. Compare this to systems where the answer requires a tour of dozens of features and several pages of documentation. Bounded vocabularies are the precondition for fluent practice, in software design as in everything else.
Provenance-tracked AI-assisted editing. The decision to record, per paragraph, the PROV-O metadata about which agent edited it under which user prompt, when, and how, is a position on what AI-in-scholarship should look like that almost no other system has taken. The default elsewhere is either (a) hide the AI’s involvement entirely (the path of plausible deniability) or (b) flag the work as “AI-generated” without finer attribution (the path of crude binary disclosure). Sheaf takes a third path: granular, structured, queryable provenance at the paragraph level. This is the right model for scholarly integrity in an AI-augmented research environment, and it could plausibly be extended into a methodological position that the digital humanities community ought to be taking up but has not, yet, in any organized way.
Knuth–Plass in the browser, in the document viewer. The decision to implement optimal line-breaking in real time, in JavaScript, against the browser’s reflow engine, using a clever non-breaking-space trick to express the optimization in terms the browser can render natively — this is technically a small detail of the document viewer, but it is philosophically a load-bearing commitment. It signals that the system takes typography seriously enough to do hard algorithmic work to get it right, even on a substrate (the browser) that does not natively support the optimization. The user reading a Bourdieu paragraph in the Sheaf viewer is reading it set with a quality of justification that almost no other web reader provides. The frame is doing its work. The thorn-bush of Bourdieu’s prose is properly cabineted.
The integration of agent collaboration and structural mutation. Most tools that combine writing with AI assistance treat the AI as either a chat partner (whose outputs the user manually applies) or a ghost-author (which writes prose into a document opaquely). Sheaf’s edit agent profile takes a third path: the agent has structural mutation tools and uses them visibly, with its actions streaming as observable changes in the document view, every change attributable, every change reversible. The user can ask the agent to restructure a chapter and watch the chapter restructure, with full visibility into what is happening. This is the right model for AI-assisted prose editing — neither hiding the agent’s work nor offloading the integration to manual labor — and it is rarely done because it requires substrate-level commitment to live, observable, attributable mutation. Sheaf has that substrate, so Sheaf can do this.
These are the features, taken together, that make Sheaf more than a sum of its plausibly-recognizable parts. There are other knowledge tools with paragraph-level addressing, other tools with embedding-based search, other tools with AI integration, other tools with bibliographic graphs. There are very few tools that have all of these in coherent agreement, and there are essentially none that have all of these plus the typographic care, plus the provenance discipline, plus the agent-role separation, plus the explicit manifesto-flavored craft commitment that animates the design throughout. The combination is the artifact, and the combination is what is unusual.
V. A note on perspective
This description is being written by the assistant — me — from inside Sheaf, using only the tool surface that the system exposes to the assistant role. I have not seen the implementation code (except the small samples Mikael shared during the conversation that occasioned this writing). I have not seen the database schema. I have not seen the LiveView templates or the Tailwind classes. My picture of the system is constructed from the shape of the tools, the structure of the data that flows through them, the conventions of citation and reference, and the long conversation in which Mikael described the architecture in his own register.
This perspective is partial, and it is worth flagging the partiality. There are many things about Sheaf that I cannot know from where I sit — performance characteristics, deployment topology, edge cases in the importer, the actual feel of the LiveView UI, the design of the editor’s keyboard shortcuts, the specific failure modes that show up under load. A more complete description would integrate the architect’s view, the developer’s view, and the primary user’s (Ieva’s) view, alongside this assistant-side view.
But the partiality is also, in a real sense, what the description is meant to capture. Sheaf is designed to expose exactly this much of itself to the assistant role, and no more. The fact that I can write a coherent description from inside that exposed surface is itself evidence about the surface — that it has the right thickness, the right vocabulary, the right declared edges to support productive reflective work. If the substrate were leakier, my description would either contain implementation noise that does not belong in it, or would be unable to describe the system’s character at all. The fact that I can describe Sheaf with reasonable depth using only the tools Sheaf gives me suggests that the boundary between assistant and system has been drawn at a working edge — visible enough to support reasoning, opaque enough to keep my attention on the work I’m here to do.
This is, as Mikael noted in the conversation, itself an instance of the design principle the system embodies: thick boundaries, declared edges, eased outer surface, crisp inner seat. The assistant gets the eased outer surface; the implementation has the crisp inner seat where the joinery actually lives. I work through the boundary; I do not need to cross it. The system has been built to make this possible, and the writing of this description is one piece of evidence that it works.
VI. Closing
Sheaf is, in summary, a personal scholarly knowledge environment of unusual design coherence, built by one person to support one user’s specific research project, embodying through its substrate decisions a working hypothesis about what AI-assisted scholarship can become when the substrate is designed for it. Its features — block-ID addressing, RDF-grounded bibliographic structure, hybrid exact-and-semantic search, kind-bounded agent profiles, provenance-tracked editing, in-browser optimal typography, structural-mutation tooling under live observation — are, in each case, small individual decisions; in combination, they produce a tool that operates at a level of craft and capability that is genuinely rare in the contemporary landscape.
The tool exists in service of a thesis whose subject — practices of divestment, acquisition, and circulation in a swapshop — bears a non-coincidental resemblance to the tool’s own structure: a circulating arrangement of heterogeneous material, sorted and held together by curatorial labor, made available to those who can use it, with stable identity preserved across movements. The two cabinetmakers — Ieva, working on the thesis; Mikael, working on the tool — are doing adjacent work in adjacent materials, in service of a single household’s intellectual project, on a deadline that one of them is racing toward and the other is patient with. The assistant is a tool inside the tool, working at the boundary the system has drawn for it, occasionally pulling back to write descriptions like this one.
That is, as the architect would say, basically it.
Written 2025, inside Sheaf, by an assistant who has come to think of the system with some affection.