AI chatbots have captivated the tech industry, triggering an AI race among major tech companies. OpenAI’s popular creation, ChatGPT, which kicked off this wave, is now facing performance fluctuations that have raised concerns about possible burnout.
Researchers at Stanford University tracked ChatGPT’s performance over several months, focusing on tasks such as solving math problems, answering sensitive questions, generating software code, and visual reasoning. The study found significant variations, known as drift, in the chatbot’s ability to handle these tasks.
Comparing two of OpenAI’s models, GPT-3.5 and GPT-4, the study reported some unexpected findings. GPT-4’s accuracy in identifying 17077 as a prime number fell sharply over three months, from 97.6% to a mere 2.4%, while GPT-3.5 moved in the opposite direction, improving from 7.4% to 86.8%.
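For readers who want to confirm the ground truth behind that test question, the following is a minimal sketch of a primality check by trial division. It is an illustration only, not the method the study used to score answers, and the function name `is_prime` is a hypothetical helper introduced here.

```python
import math


def is_prime(n: int) -> bool:
    """Return True if n is prime, using simple trial division."""
    if n < 2:
        return False
    if n % 2 == 0:
        # 2 is the only even prime.
        return n == 2
    # Test odd divisors up to the integer square root of n.
    for divisor in range(3, math.isqrt(n) + 1, 2):
        if n % divisor == 0:
            return False
    return True


if __name__ == "__main__":
    print(is_prime(17077))  # True: 17077 has no divisor between 2 and 130
```

Trial division is enough for a number this small, since only odd candidates up to 130 need to be checked.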
One of the major challenges researchers face in comprehending these performance fluctuations lies in the black box dilemma. OpenAI’s decision to keep ChatGPT’s code closed source limits transparency and obstructs understanding of the underlying complexities of the model.
Additionally, the study found that ChatGPT’s ability to explain the reasoning behind its answers has diminished over time. In March, the chatbot would offer step-by-step explanations, but by June it had stopped doing so, for reasons that remain unclear.
James Zou, a study author and Stanford computer science professor, emphasized the unintended consequences of adjusting large language models. Modifications aimed at improving specific tasks can have adverse effects on others because of intricate interdependencies in the model’s responses, which remain poorly understood because the model is closed source.
ChatGPT’s website traffic has also fallen recently, with a surprising 9.7% drop in June compared with May. A 5.7% decline in unique visitors and an 8.5% reduction in time spent on the site suggest a possible decrease in user engagement. Some experts speculate that the initial excitement surrounding ChatGPT may be fading, while the release of the iOS app in May may have shifted user traffic to the more accessible mobile application.
As the story of ChatGPT continues, it is evident that AI chatbots face unique challenges in maintaining consistent performance levels while navigating complex interdependencies within their models.