A recent study conducted by researchers from Purdue University, Emory University, and Baruch College examines how large language models (LLMs) like ChatGPT-4 interpret stock return data and whether they display human-like biases in their forecasts. The researchers compared ChatGPT's predictions with crowd-sourced forecasts from the Forcerank platform, where participants rank stocks based on expected performance. Their findings reveal that LLMs exhibit tendencies similar to human forecasters, such as over-extrapolating recent trends and showing optimism about future returns, though the models calibrate risk slightly better than humans do.
Examining Extrapolation Bias
The core focus of the study is whether ChatGPT shows cognitive biases akin to those seen in human investors, particularly over-extrapolation of recent returns. This bias occurs when investors place excessive weight on recent performance and expect trends to continue, despite historical evidence that short-term reversals are common in financial markets. The researchers sought to determine whether ChatGPT would mirror this behavior, weighting recent data heavily and thereby overestimating the chance that recent positive returns persist.
ChatGPT's Performance in Stock Rankings
To investigate this, the team entered ChatGPT-4 in stock-ranking contests akin to those on the Forcerank platform, where participants rank ten stocks each week by anticipated performance after being shown twelve weeks of historical return data. The researchers supplied ChatGPT with the same historical data to assess how its predictions compared with human forecasts and actual market returns. The results showed that, like human participants, ChatGPT heavily weighted recent stock performance, particularly the previous week's. Interestingly, while humans responded more strongly to negative returns, ChatGPT's forecasts responded more strongly to recent positive returns, indicating over-extrapolation with a tilt toward positive outcomes.
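The over-extrapolation test described above can be sketched as a regression of forecasts on the twelve lagged weekly returns: a large coefficient on the most recent lag signals trend-chasing. The snippet below is a toy illustration with synthetic data and a plain least-squares fit, not the paper's exact specification:

```python
import numpy as np

def extrapolation_weights(forecasts, past_returns):
    """Regress forecasts on twelve lagged weekly returns (column 0 = most recent).

    Over-extrapolation shows up as a large positive weight on the most
    recent lags relative to the older ones.
    """
    X = np.column_stack([np.ones(len(forecasts)), past_returns])
    coefs, *_ = np.linalg.lstsq(X, forecasts, rcond=None)
    return coefs[1:]  # drop the intercept

# Synthetic forecaster that simply chases last week's return.
rng = np.random.default_rng(0)
past = rng.normal(0.0, 0.03, size=(500, 12))          # 12 weeks of returns
fcst = 0.8 * past[:, 0] + rng.normal(0.0, 0.005, 500) # loads only on lag 1
w = extrapolation_weights(fcst, past)
# w[0] recovers the ~0.8 weight on the latest week; older lags stay near zero.
```

A forecaster free of the bias would instead show small, roughly uniform weights across all twelve lags.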
Optimistic Predictions and Risk Calibration
A significant finding was that, despite being trained to analyze large datasets objectively, ChatGPT's forecasts were generally optimistic relative to both historical averages and subsequently realized returns. On average, ChatGPT predicted returns significantly higher than what materialized: while the average realized return was about 1.15%, ChatGPT's forecast hovered around 2.2%. This suggests an optimistic bias, possibly inherited from training data drawn from periods in which returns were predominantly positive.
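The optimism gap is just the difference between average forecast and average realized return. A back-of-the-envelope check, using made-up weekly figures chosen only so that the averages match the 2.2% and 1.15% quoted above:

```python
import numpy as np

# Hypothetical weekly forecasts and realized returns, in percent.
# Values are illustrative; only their means are tied to the article.
forecasts = np.array([2.1, 2.4, 2.0, 2.3, 2.2])   # mean = 2.20
realized  = np.array([1.0, 1.5, 0.9, 1.3, 1.05])  # mean = 1.15

bias = forecasts.mean() - realized.mean()  # optimism gap in percentage points
# bias = 1.05 percentage points of excess optimism per week
```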
Handling Confidence Intervals
The researchers also evaluated ChatGPT's ability to estimate risk by comparing its confidence intervals to human forecasts. When tasked with providing 80% confidence intervals (ranges within which returns should fall 80% of the time), ChatGPT was better calibrated than the human CFOs surveyed in earlier studies. However, the model still showed a conservative bias at the extremes: its 10th- and 90th-percentile forecasts were smaller in magnitude than the corresponding historical values, meaning it underestimated both extreme positive and extreme negative outcomes.
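Calibration of interval forecasts is typically checked by measuring empirical coverage: the fraction of realized returns that land inside the stated intervals, which should be close to 0.80 for well-calibrated 80% intervals. A minimal sketch with simulated data (not the study's):

```python
import numpy as np

def coverage(lo, hi, realized):
    """Fraction of realized values falling inside [lo, hi] forecast intervals."""
    lo, hi, realized = map(np.asarray, (lo, hi, realized))
    return float(np.mean((realized >= lo) & (realized <= hi)))

# Simulated realized returns ~ N(1%, 4%); a correctly sized central 80%
# interval for this distribution is mean +/- 1.2816 standard deviations.
rng = np.random.default_rng(1)
realized = rng.normal(1.0, 4.0, size=1000)
lo, hi = 1.0 - 1.2816 * 4.0, 1.0 + 1.2816 * 4.0
cov = coverage(np.full(1000, lo), np.full(1000, hi), realized)
# cov lands near 0.80 here; intervals that are too narrow at the tails,
# as described above, would push coverage below the 80% target.
```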
Analyzing Visual Data
Additionally, the study explored whether providing visual data, such as price charts, would alter ChatGPT’s forecasting behavior. The model exhibited similar tendencies, continuing to over-extrapolate from recent performance and prioritize short-term trends. This indicates that its inclination to over-emphasize recent trends extends beyond numerical data to visual financial information.
Cross-Model Comparison
The study also compared ChatGPT’s forecasts with those of another LLM, Claude, to assess if these biases were unique to ChatGPT. The results revealed that Claude demonstrated similar behavior, with a high correlation between the forecasts of both models. This suggests that the observed biases, such as over-extrapolation of recent returns and an overly optimistic outlook on future performance, are likely prevalent across different LLMs.
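The cross-model comparison amounts to correlating the two forecast series over the same stocks. A hypothetical illustration, in which a shared reliance on recent returns (the `signal` below, an assumption for this sketch) drives both models:

```python
import numpy as np

rng = np.random.default_rng(2)

# Shared component, e.g. both models chasing the same recent returns,
# plus independent model-specific noise.
signal = rng.normal(0.0, 1.0, size=200)
gpt_forecasts    = signal + rng.normal(0.0, 0.3, size=200)
claude_forecasts = signal + rng.normal(0.0, 0.3, size=200)

corr = float(np.corrcoef(gpt_forecasts, claude_forecasts)[0, 1])
# A high correlation means the two models rank the same stocks similarly,
# consistent with shared biases rather than model-specific quirks.
```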
While ChatGPT’s forecasts are generally more calibrated than those of human forecasters—particularly regarding risk assessment—it still exhibits significant cognitive biases. These findings highlight the importance of critically evaluating LLM forecasts, especially as they become integrated into financial decision-making processes, to avoid over-relying on potentially flawed predictions.