Increasingly powerful AI models can make short-term weather forecasts with surprising accuracy. But neural networks only predict based on patterns from the past, so what happens when the weather does something that’s unprecedented in recorded history? A new study led by scientists from the University of Chicago, in collaboration with New York University and the University of California Santa Cruz, is testing the limits of AI-powered weather prediction. In research published May 21 in Proceedings of the National Academy of Sciences, they found that neural networks cannot forecast weather events beyond the scope of existing training data, which might leave out events like 200-year floods, unprecedented heat waves or massive hurricanes.
This limitation is particularly important as researchers incorporate neural networks into operational weather forecasting, early warning systems, and long-term risk assessments, the authors said. But they also said there are ways to address the problem by integrating more math and physics into the AI tools.
“AI weather models are one of the biggest achievements in AI in science. What we found is that they are remarkable, but not magical,” said Pedram Hassanzadeh, an associate professor of geophysical sciences at UChicago and a corresponding author on the study. “We’ve only had these models for a few years, so there’s a lot of room for innovation.”
Weather forecasting AIs work in a similar way to other neural networks that many people now interact with, such as ChatGPT.
Essentially, the model is “trained” by feeding it a large amount of text or images and asking it to look for patterns. Then, when a user presents the model with a question, it looks back at what it’s previously seen and uses that to predict an answer.
In the case of weather forecasts, scientists train neural networks by feeding them decades’ worth of weather data. Then a user can input data about the current weather conditions and ask the model to predict the weather for the next several days.
The AI models are very good at this. Generally, they can achieve the same accuracy as a top-of-the-line, supercomputer-based weather model that uses 10,000 to 100,000 times more time and energy, Hassanzadeh said.
“These models do really, really well for day-to-day weather,” he said. “But what if next week there’s a freak weather event?”
The concern is that the neural network is only working off the weather data we currently have, which goes back about 40 years. But that’s not the full range of possible weather.
“The floods caused by Hurricane Harvey in 2017 were considered a once-in-a-2,000-year event, for example,” Hassanzadeh said. “They can happen.”
Scientists sometimes refer to these events as “gray swan” events. They’re not quite all the way to a black swan event (something like the asteroid that killed the dinosaurs), but they are locally devastating.
The team decided to test the limits of the AI models using hurricanes as an example. They trained a neural network using decades of weather data, but removed all the hurricanes stronger than a Category 2. Then they fed it an atmospheric condition that leads to a Category 5 hurricane in a few days. Could the model extrapolate to predict the strength of the hurricane?
The answer was no.
“It always underestimated the event. The model knows something is coming, but it always predicts it’ll only be a Category 2 hurricane,” said Yongqiang Sun, research scientist at UChicago and the other corresponding author on the study.
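The effect can be reproduced in miniature. The sketch below is a toy illustration only, not the neural network or the data used in the study: it assumes a made-up one-dimensional "precursor signal" that maps directly to storm category, and it swaps in a simple nearest-neighbor predictor, which by construction can only average outcomes it has already seen. All names and numbers here are hypothetical.

```python
# Toy illustration only: a k-nearest-neighbor predictor on synthetic data.
# This is NOT the model or dataset from the study; every value is made up.

def make_storms():
    """Synthetic (precursor_signal, peak_category) pairs, signal 0.0 to 5.0."""
    return [(s / 10.0, s / 10.0) for s in range(0, 51)]

def knn_predict(train, signal, k=3):
    """Average the outcomes of the k training storms closest to `signal`."""
    nearest = sorted(train, key=lambda pair: abs(pair[0] - signal))[:k]
    return sum(category for _, category in nearest) / k

storms = make_storms()

# Mimic the experiment: drop every storm stronger than Category 2.
truncated = [pair for pair in storms if pair[1] <= 2.0]

# Ask both predictors about a precursor that leads to a Category 5 storm.
pred_truncated = knn_predict(truncated, 5.0)
pred_full = knn_predict(storms, 5.0)

print(f"trained without strong storms: {pred_truncated:.1f}")
print(f"trained on the full record:    {pred_full:.1f}")
```

In this toy setup the predictor trained on the truncated record can never answer above Category 2, because it has no stronger storm to draw on, while the one trained on the full record lands close to Category 5. A real neural network is far more flexible than this, but the study's finding is that it showed the same kind of ceiling.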
This kind of error, known as a false negative, is a big deal in weather forecasting. If a forecast tells you a storm will be a Category 5 hurricane and it only turns out to be a Category 2, that means people evacuated who may not have needed to, which is not ideal. But if a forecast underestimates a hurricane that turns out to be a Category 5, the consequences would be far worse.
Hurricane warnings and why physics matters
The big difference between neural networks and traditional weather models is that traditional models “understand” physics. Scientists design them to incorporate our understanding of the math and physics that govern atmospheric dynamics, jet streams and other phenomena.
The neural networks aren’t doing any of that. Like ChatGPT, which is essentially a predictive text machine, they simply look at weather patterns and suggest what comes next, based on what has happened in the past.
No major service is currently using only AI models for forecasting. But as their use expands, this tendency will need to be factored in, Hassanzadeh said.
Researchers, from meteorologists to economists, are beginning to use AI for long-term risk assessments. For example, they might ask an AI to generate many examples of weather patterns, so that we can see the most extreme events that might happen in each region in the future. But if an AI cannot predict anything stronger than what it’s seen before, its usefulness would be limited for this critical task.
However, the study team found the model could predict stronger hurricanes if there was any precedent, even elsewhere in the world, in its training data. For example, if the researchers deleted all the evidence of Atlantic hurricanes but left in Pacific hurricanes, the model could extrapolate to predict Atlantic hurricanes.
“This was a surprising and encouraging finding: it means that the models can forecast an event that was unpresented in one region but occurred once in a while in another region,” Hassanzadeh said.
Merging approaches
The solution, the researchers suggested, is to begin incorporating mathematical tools and the principles of atmospheric physics into AI-based models.
“The hope is that if AI models can really learn atmospheric dynamics, they will be able to figure out how to forecast gray swans,” Hassanzadeh said.
How to do this is a hot area of research. One promising approach the team is pursuing is called active learning, in which AI helps guide traditional physics-based weather models to create more examples of extreme events, which can then be used to improve the AI’s training.
“Longer simulated or observed datasets aren’t going to work. We need to think about smarter ways to generate data,” said Jonathan Weare, professor at the Courant Institute of Mathematical Sciences at New York University and study co-author. “In this case, that means answering the question ‘where should I place my training data to achieve better performance on extremes?’ Fortunately, we think AI weather models themselves, when paired with the right mathematical tools, can help answer this question.”
University of Chicago Prof. Dorian Abbot and computational scientist Mohsen Zand were also co-authors on the study, as well as Ashesh Chattopadhyay of the University of California Santa Cruz.
The study used resources maintained by the University of Chicago Research Computing Center.




