John Cayley is a maker of language art in programmable media whose career has shaped the way we think about literature in the digital age. Since his early experiments with personal computing in the late 1970s and 1980s, Cayley has developed an expansive body of work encompassing poetry, translation, dynamic and ambient poetics, heuristic text generation, transliteral morphing, aestheticized reading vectors, and transactive synthetic language.
As a theorist, he has engaged deeply with the ontology of language in a world where computation mediates nearly all forms of cultural production, combining philosophically informed inquiry with practice-based research. Currently Professor of Literary Arts at Brown University, his books include Grammalepsy: Selected Essays on Digital Language Art and Image Generation (Counterpath Press), and his ongoing work can be explored at programmatology.com.
Over four decades, Cayley has been both a pioneer and a critic of digital poetics, influencing generations of artists and scholars. His creative processes often challenge conventional notions of authorship, embracing chance operations, formal constraints, and programmable transformations of language.
At the same time, his writing has addressed the deeper implications of technological change, from the poetics of animated text to the philosophical and political stakes of large language models. In his view, language is a lived, embodied human practice that cannot be reduced to modeled data—a position that grounds both his art and his critique of AI’s growing influence on culture.
In our exclusive Exploring the Nexus interview, Cayley traces his journey from early computer-assisted poetry to the present moment, where AI systems are reshaping how we read, write, and think. He discusses the promises and perils of computation that can now reprogram itself, the necessity of legible creative processes, and the risks of surrendering linguistic agency to opaque corporate systems.
It’s a rare conversation with one of digital literature’s most rigorous and imaginative voices—an invitation to reconsider the future of language, art, and human expression.
What first drew you to working with computers as a medium for poetic expression in the 1980s, long before digital writing was widely accepted as a literary practice?
While I was undertaking graduate study in the late 1970s, my supervisor became obsessed with the potential for using newly accessible (to us) mainframe computation for the analysis of Chinese texts. This was my introduction to more or less real-world computation, and it coincided with the advent of so-called ‘personal computing.’
I bought myself, in the United Kingdom, one of the first affordable personal computers, a BBC Micro, which was programmable in the Basic computer language. When a friend sent me a letter coded in the kind of acrostics where each letter of the message is replaced by a word beginning with that letter – a process explored by Emmett Williams, Jackson Mac Low and others – I realized that this process would be relatively straightforward to program: to get the personal computer to do the drudge work.
Then, for me, this opened up a new field of formal discovery and exploration with respect to literary language as recorded and subsequently digitized as text.
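The substitution process Cayley describes – each letter of a message replaced by a word beginning with that letter – really is straightforward to program. A minimal sketch, in which the tiny vocabulary is a hypothetical stand-in for whatever lexicon or source text the writer draws on:

```python
import random

# Hypothetical stand-in vocabulary; in practice the words would come
# from a personal lexicon or a chosen source text.
VOCAB = ["hello", "echo", "lark", "open", "world", "east", "over", "light"]

def acrostic_encode(message, vocab=VOCAB, seed=0):
    """Replace each letter of the message with a word beginning with it."""
    rng = random.Random(seed)
    by_initial = {}
    for word in vocab:
        by_initial.setdefault(word[0], []).append(word)
    coded = []
    for ch in message.lower():
        if ch.isalpha():
            # Fall back to the bare letter if no word starts with it.
            coded.append(rng.choice(by_initial.get(ch, [ch])))
    return " ".join(coded)

def acrostic_decode(text):
    """Recover the message from each word's initial letter."""
    return "".join(word[0] for word in text.split())

coded = acrostic_encode("hello")
```

Decoding simply reads off the initial letters, so the coded letter remains legible to anyone who knows the scheme.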
I felt justified, not to say inspired, to work with language in this way for at least two reasons: because I was aware that similar processes were being used to make valued work in other media – visual art and music in particular – and because I was averse to traditional notions of creativity and creative mastery. I preferred to subject my practice to both chance and formal operations as one way to avoid reinvestment in self-regard.
It’s also worth mentioning that some of my first efforts to apply computation to literary artifacts entailed a recognition that the screen is not paper. Text can be animated.
My first attempt to, as I later called it, ‘score the spelt air’ was undertaken on the BBC Micro in Basic, an animation of my own translation of a Chinese quatrain, ‘wine flying’ by Qian Qi. This kind of approach need have nothing to do with text generation or manipulation. It is simply an extension of existing poetic practice onto the more complex surface of a new support medium for text.
You helped pioneer the field of digital poetics. What is the state of digital poetics in the age of Large Language Models?
This is not as easy a question to answer as we might think because even today it is difficult to know what we mean by ‘digital poetics’ or specify which literary practices should be considered as subject to its characteristics.
The field currently known as Electronic Literature has matured in terms of both practice and theory, but its formation was significantly overdetermined in the 1990s by novel approaches to longform fiction and choose-your-own-adventure textual gaming (ludic reading). This has a poetics, if you like, but this is all but exhausted by hypertextuality, something with which we became familiar by other means, as the internet spread over us.
And at the same time, the poetics of all textual production began to be forever changed by a shift from typewriting to word processing, and particularly by a different relationship to the archive and to sources for research. We no longer go to the texts we work from, in books and libraries; they come to us, on the internet, via databases. I take this to be a deep shift in the poetics of language making.
If we are applying ‘digital poetics’ to specifically poetic practice (which could as easily appear to be prose at the time of its reading) it is still hard to say how this has been integrated for artists at this moment of literary history. I’ve already complained that an easily understood aspect of poetic presentation – animated text – has been oddly neglected and its development would, historically, have been enabled by digital affordances, without question.
As far as the computational generation or manipulation of text is concerned, you may call this ‘digital’ in common usage. In the meantime, however, the adjective has been all but rendered meaningless due to the fact that, in the developed world, all culture is now digital: utterly dependent on digital ways of making or, even when ‘handmade,’ dependent on digital modes of dissemination and appreciation.
I now prefer to think of my practice as one of language art with computation. By doing so, I ignore the ‘digital,’ defer the prejudice of ‘literature,’ and recognize specifically that computation is integrated into my compositional and representational strategies.
When you ask this question “in the age of Large Language Models,” then I answer that none of the above changes; rather, computation itself is changing. Not only will large language models interpose themselves, at the very least, to co-conduct all of our digitally implicated exchanges – that is, all our computation – but, in a perhaps exponential addition to this, the models are also really very good at creating or helping to create software.
That is, we now have computation that is good at computing and modifying itself – but according to whose criteria?
In the past, you’ve created a range of literary modes such as subliteral orthographics and transliteral morphing. How do they hold up today?
These are both very specific processes that I’ve applied to quite carefully designed supply texts to experiment with different strategies for reading and, in the case of transliteral morphing, to move from one text to another in a manner that is significant and affective with respect to these transformations’ reading potential. They originated in ideas that occurred to me, ideas that suggested programmable processes and that I anticipated would generate interesting, hopefully literary or poetic, effects.
In practice, I have made a number of pieces using these processes and then, after a time, I move on to other processes and new, formally unrelated work. In both these cases I think I will return to them.
Subliteral orthographics experiments with the replacement of letter forms that differ as little as possible. In some fonts, ‘t’ and ‘f’, for example, differ only in that the top of ‘f’s vertical stroke curves down. And there are pairs of words with the same number of letters that have ‘f’ or ‘t’ in the same position such that the words differ, materially, by a subliteral difference – by less than a letter.
One such pair is ‘interiority’ and ‘inferiority.’ In the 2019 piece “hearing litoral voices / bearing literal traces”: subliteral narratives, made with Joanna Howard, Joanna created micro narratives from paired texts that exploit these subliteral differences. This is very much in a tradition of constrained writing as well as formal experimentation.
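The search for such subliteral pairs is easy to automate. A minimal sketch, assuming any word list (the tiny list below is illustrative, not Cayley’s actual source vocabulary); the letter pair is a parameter, since other near-identical pairs of letter forms could be substituted:

```python
def subliteral_pairs(words, pair=("t", "f")):
    """Find pairs of words that differ only by a single 't'/'f'
    substitution in the same position, e.g. 'interiority'/'inferiority'."""
    found = []
    wordset = set(words)
    a, b = pair
    for w in words:
        for i, ch in enumerate(w):
            if ch == a:
                # Swap the letter at this position and check membership.
                candidate = w[:i] + b + w[i + 1:]
                if candidate in wordset:
                    found.append((w, candidate))
    return found

pairs = subliteral_pairs(["interiority", "inferiority", "tat", "fat"])
```

Run over a full dictionary, a search like this yields the constrained raw material from which paired texts can then be composed.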
Transliteral morphing is a kind of interpolation based on translating, in textually correspondent positions, through a specially ordered sequence of letter forms. I first developed it for windsound (originally 1999, not currently viewable) and, in a modified form, it is a feature of my piece, translation, which now exists as a webapp. It is explained and demonstrated on my website.
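The interpolation idea behind transliteral morphing can be illustrated in a few lines. Cayley’s own morphs traverse a specially ordered sequence of letter forms chosen for their transitional affinities; the sketch below substitutes plain alphabetical order and assumes source and target texts of equal length – an illustration of the principle, not his algorithm:

```python
# Stand-in ordering: Cayley's morphs use a specially ordered sequence
# of letter forms; plain alphabetical order is used here instead.
ORDER = "abcdefghijklmnopqrstuvwxyz"

def morph_step(source, target):
    """Move each letter of `source` one step along ORDER toward the
    corresponding letter of `target` (texts assumed equal length)."""
    out = []
    for s, t in zip(source, target):
        if s not in ORDER or t not in ORDER:
            out.append(t)  # non-letters snap straight to the target
            continue
        i, j = ORDER.index(s), ORDER.index(t)
        if i < j:
            i += 1
        elif i > j:
            i -= 1
        out.append(ORDER[i])
    return "".join(out)

def morph(source, target):
    """Yield every intermediate text until source becomes target."""
    current = source
    yield current
    while current != target:
        current = morph_step(current, target)
        yield current

frames = list(morph("wind", "song"))
```

Each yielded frame is a readable (or near-readable) intermediate text; displayed in sequence, the frames animate one text becoming another.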
How do these processes hold up today? Well, they were and are experiments, bespoke processes that I made for particular works. In both cases, other practitioners of language art with computation would be more than welcome to use them for their own works and adapt them to their own purposes.
In practice, it is quite hard to imagine others undertaking this and, it has to be said, processes of this nature are unlikely to gain any kind of literary currency unless a work that used them ‘caught fire’ and caused the kind of sensation that provokes imitators. Even then …
In “Modelit: eliterature à la (language) mode(l)” you discuss GPT image generation. What is the relationship between eLiterature and GPT image generation? Rather than a visual art, is it textual in nature?
In terms of models offering something novel, I do think that the generation of images in response to textual prompts represents an example of the type of transmedia cultural production that digital computation has afforded us in the post-war period.
To be clear, by transmedia, I mean that textual tokens are data taken from the medium of language and image tokens – the atoms of data derived from digitized images – are data taken from the medium of visual art.
Within the same model they are indistinguishable in terms of their substance, and they are all formal digital representations. Thus, allowing them to transgress transmedially on one another is something afforded fundamentally by ‘the digital’ and something that actual media can achieve only with difficulty.
Personally, I am not sure what to think about this kind of work other than to acknowledge that it has novelty. Myself, I try to remain focused on working within my chosen medium, language.
In that same essay, you suggest that writing produced by LLMs may sound good but lacks actual “style” and that any “style” produced must be prompted. Why is this so, and do LLMs have distinct voices (authorial or otherwise)?
Other commentators, notably N. Katherine Hayles, seem to disagree with me on this point by showing that large language models can be prompted to generate texts in different, recognizable or genre-related, styles. We are, however, talking about style in two different senses, or about style as having different relationships with the operative agents in the model interaction.
The model has a snapshot of textual practice that is trained and constructed into a highly complex “database in the form of a neural network” (Aden Evens, The Digital and its Discontents, University of Minnesota, 2024, 189).
As such, it has – somewhere, somehow – digital representations of various ‘styles’ within the textual snapshot, including the personal styles of known authors, both represented and reinforced by critical discussion of all these styles in the same snapshot. Prompts can evoke these styles and suggest that they are mixed and blended, for example.
But I am taking note of the fact that the model itself does not have a style of its own or, if it seems to, this is a function of some combination of its interface software, its ‘value networks,’ and a fundamental statistical ‘averaging’ – often characterized as banality – in its responses. This is not a style in any sense that I would like to approve or imitate.
Ourselves, the user-transactors, on the other hand, all, necessarily, have our own style with respect to the way in which we use language and, in terms of embodiment, with respect to our vocal performance or the way that we otherwise generate traces of our own language. At this level of style, the encounter between us and the models is one of stark contrast, and how the styleless model affects the unique stylistic behaviors of its human transactors is an issue.
You’ve written critically about anticipatory interfaces and predictive text. What concerns you most about how AI systems are shaping our language and thought?
I think these concerns will come up a good deal in this and other answers. The main source of my concerns is the fact – admittedly complicated by the circumstance that large language models may be built from arbitrary amounts of data that are not strictly or purely linguistic – that AI systems model text, which we can think of as language-as-data.
Etymologically, ‘data’ is what the world gives us – what it gives us as human animals, not necessarily what it gives us scientifically. In a now-global culture dominated by the twentieth-century, especially post-war, understanding of science, we forget that what is now called ‘data’ is simply what we are able to capture by measurement from the world, and not, by any means, all of what the world gives us as human animals.
Computer Science treats text as if it were the real data of language, but I believe it is possible to show that text – particularly when decontextualized – is simply that part of language that we can ‘measure.’ Language itself is a shared, interpersonal human activity and practice that is much more than any textual snapshot, however comprehensive. It is an activity and practice that has made us what we are as animals (and scientists) and, because of this, it continues to make and remake us. I like a distinction, attributed to Charles Taylor (The Language Animal, Harvard 2016), which sees language as constitutive of the human and our world, rather than designative. The ‘language’ of the models has been taken – by computer scientists, engineers, most linguists, some philosophers – as designative. It is captured and represented as such by the models and their output. Strictly speaking, it is not language; it is not what we live with as language. And this is the chief source of my concerns.
How do you distinguish between language as lived, embodied experience and language as modeled data? What is at stake for you in that difference?
I already opened up a response to these questions in a reply above. The phrasing of the questions here allows me to take up, further, some of the ways that language is, yes, embodied and to point out that the consequences of this are not always appreciated.
On the other hand, no language is embodied in that what we read when we encounter language cannot be identified with any kind of ‘thing in the world.’ As Maurice Blanchot pointed out (‘Lire [Reading],’ 1955) we have to read a book to make it assert itself in our world. Otherwise, it just subsists as unread text. The ‘object’ of our reading is the reading itself.
Arguably, no reading takes place within the structures of data – of, that is, text – within the ‘language’ models. They do not create anything like the ‘objects’ that we create when we read or, indeed, use language in any other way. Even if we say that the models accept input, this is hardly reading and when they produce output, it is up to us to read it. Neither can we say that the model ‘having read’ – having taken in its textual corpora – is reading in the sense of creating an object that asserts itself (to whom?) as language.
On the other hand, our relationship to the reading that we do as part of everyday life is intimately embedded in our lived experience, our personal histories, and, very specifically, in the potential that we always have to ‘evocalize’ (the word is from the work of Garrett Stewart) whatever we read. The most obvious distinction between modeled text and what we, literally, read is this deep relationship between what we read and our ability to express that language in embodied gestures which for most of us are those of ‘voice.’ But I do not need to always mean physical voice here, since, for example, the linguistically expressive gestures of sign language actualize the same embodied relations.
What’s at stake? We are what we are and, linguistically, we are not formally modeled data.
Do you see a continuity between AI predictive models and the older, more generative text systems you’ve used, or is there a qualitative rupture?
There is a qualitative rupture. I’ve already said a good deal that sets out reasons for my opinion. Any continuity is at the behest of those persons and forces that are reconfiguring computation itself. This reconfiguration is motivated by profit-led efficacy and little else.
If computation changes comprehensively, I anticipate that it will, basically, sweep away or make redundant – for the vast majority of users – any computation that we might describe as heuristic and humanly relatable, other than in terms of transactive interfaces, with most of these interfaces attuned to data extraction and friction-free commerce.
You’ve raised concerns about the opacity of machine learning systems. How important is legibility—of process, not just text—in maintaining poetic or political agency?
I now make a distinction between processes in language art with computation that are hermetic and those that are heuristic. I owe the latter term to my longstanding collaborator, Daniel C. Howe. It indicates the kind of process that can be understood and appreciated by a human reader or critic and the kind of process the coding of which can be parsed and assessed.
Clearly this kind of close reading can be undertaken at various levels of detail and, as in all criticism, it is up to the critic to decide how closely to read, and what degree of closeness yields the most interpretative insight or, indeed, additional reading pleasure. This is only really possible in the case of heuristic processes, which you could, for example, reengineer and apply elsewhere for further discovery and making.
Despite ongoing research into what is called, by the researchers themselves, the ‘mechanistic interpretability’ of the models, it is still generally acknowledged that the processing they undertake at execution time is not understood. It is not interpretable in human terms although, of course, methods for engineering the elements and software structures – ‘layers,’ for example – of post-statistical systems are now the bread and butter of computer science.
These methods are ‘known’ and highly developed, increasingly, minute by minute it seems. So far, the hermeticism of ‘what they do’ remains. And when, having used them for a ‘task,’ you apply their processes to some other task, the ‘not knowing what has happened’ is multiplied and then compounded. Even if we project a time when ‘what they do’ is ‘(mechanistically) interpreted’ for us, something else has happened in the meantime. All the intervening, distinctly interpretable, parts of the process have been handed over to entirely digital processing in terms of entirely digital and formal representations. We have lost the ability to intervene in these processes at any point other than at the moments of prompt and response. Contrast heuristic processes. We can understand and read the design of their component parts. We can meaningfully intervene.
The situation is technical, both computationally and philosophically and clearly more complex than in this brief exposition. But I hope that anyone reading this may be able to appreciate that I am outlining another source of concern.
You’ve touched on the notion that AI doesn’t predict what I want to say – it teaches me to want what it predicts. Is this a true statement?
AI itself, the models themselves, do nothing. They have been constructed by their ‘training’ and are inert until prompted. Neither do they have any ‘idea’ of what we want to say. However, between us and the models themselves there are entire systems, of equal or greater complexity, which include other model-based neural networks, sometimes called ‘value networks.’ The ‘values’ in question (things like ‘do not allow the model to promote DEI’) are proposed by the system-builders. The ‘value networks’ are integrated with a great deal of bespoke filtering and interface software, much of which is intentionally programmed, by technicians working for the AI corporations, to manipulate both input (prompts) and output (responses).
What we want to say – or buy or do – is up to us, and will soon be up to us to delegate, or not, to an AI-driven ‘agent.’ What we learn is also up to us. It is difficult to exaggerate, however, the corporate-level conscious effort – of research and engineering – that is being put into manipulating our desires and ambitions as and when we transact with the models, and it is important to recall how much of these computationally automated transactions are explicitly designed by the AI corporations to yield a maximization of profits or funds for reinvestment in these same corporations. This is the Realpolitik of capitalism and is widely accepted.
This said, I tend to fall back on a distinction made by the late philosopher of culture, Bernard Stiegler, who saw these kinds of technologies as what Plato called a pharmakon, a poison that could be rendered therapeutic. Chemotherapy is an everyday example. I do believe that contemporary AI is language poison, and I also believe that it can be rendered therapeutic (although not as an actual therapist!). Stiegler distinguished between adopting a technology and adapting to it. Adopting can be therapeutic; adapting is poison. It is difficult to adopt AI technologies because their ‘values’ – as described above – are always changing, and always changing, I would suggest, in a culturally negative direction, anticipating poor outcomes for all our practices of language, including those that are literary.
As an everyday example of how to adopt, rather than adapt to, AI, I would advise: do not ask a model what you should do or say. Decide this yourself and then, if you really think this will help, ask the model for assistance with how to do or say what you’ve decided, or to find examples of other people who have made similar decisions.

