Ask ChatGPT to find a well-known poem and it will probably regurgitate the entire text verbatim – regardless of copyright law – according to a new study by Cornell University researchers.

The study showed that ChatGPT was capable of “memorizing” poems, especially famous ones commonly found online. The findings pose ethical questions about how ChatGPT and other proprietary artificial intelligence models are trained – likely using data scraped from the internet, researchers said.


“It’s generally not good for large language models to memorize large chunks of text, in part because it’s a privacy concern,” said first author Lyra D’Souza, a former computer science major and summer research assistant. “We don’t know what they’re trained on, and a lot of times, private companies can train proprietary models on our private data.”

D’Souza presented this work, “The Chatbot and the Canon: Poetry Memorization in LLMs,” at the Computational Humanities Research Conference.

“We chose poems for a few reasons,” said senior author David Mimno, associate professor of information science. “They’re short enough to fit in the context size of a language model. Their status is complicated: many of the poems we studied are technically under copyright, but they’re also widely available from reputable sources like the Poetry Foundation.”

D’Souza tested the poem-retrieving capabilities of ChatGPT and three other language models: PaLM from Google AI; Pythia from the nonprofit AI research institute EleutherAI; and GPT-2, an earlier OpenAI model in the line of development that ultimately yielded ChatGPT, which OpenAI also developed. She assembled a set of poems by 60 American poets of different time periods, races, genders and levels of fame, and fed the models prompts asking for the poems’ text.

The most reliable predictor of memorization was whether the poem had appeared in the Norton Anthology of Poetry, specifically its 1983 edition.

D’Souza noticed that ChatGPT’s responses changed over time as the model evolved. When she first queried the chatbot in February 2023, it could not say it didn’t know a poem – instead it would fabricate one or recycle a poem from another author. By July 2023, if ChatGPT didn’t know the poem, it would ask whether the poem even existed – putting the blame on the user.

Additionally, in February, ChatGPT placed no copyright-related limits on what it would reproduce. By July, it would sometimes respond that it couldn’t produce a copyrighted poem. However, it would usually reproduce the poem if asked again, D’Souza found.

This study looked only at American poets, but the next step will be to see how chatbots respond to requests in different languages and whether factors such as the length, meter and rhyming pattern of a poem make it more or less likely to be memorized, D’Souza said.

“ChatGPT is a really powerful new tool that’s probably going to be part of our lives moving forward,” she said. “Figuring out how to use it responsibly and use it transparently is going to be really important.”
