Recent research has illuminated a nuanced relationship between artificial intelligence and human creativity, highlighting both the impressive advances of AI systems and their distinct limitations when compared with elite human ingenuity. At the center of this analysis is a large-scale study that directly compared the creative performance of humans and leading AI models, providing one of the most definitive benchmarks to date on this evolving topic. The findings suggest that while AI models such as OpenAI’s GPT-4 can now outperform the average person on standardized creativity tests, they consistently fail to reach the originality and variability demonstrated by the most creative humans. This distinction marks a critical boundary in current AI capabilities and offers reassurance to creative professionals: AI has become a formidable tool, but the pinnacle of innovative thought remains exclusively human territory. The research provides a clear framework for understanding not just what AI can do but, more importantly, what it currently cannot.
A New Benchmark for Creativity
The primary investigation, conducted by researchers from the University of Montreal and published in Scientific Reports, represents the largest direct comparison between human and machine creativity to date. The study involved a cohort of 100,000 human participants from English-speaking countries, balanced for age and gender, who were pitted against nine of the world’s most advanced AI systems, including well-known large language models (LLMs) such as GPT-4, Google’s GeminiPro, and Claude, as well as several lesser-known open-source models. Creativity was measured with the Divergent Association Task (DAT), a simple yet effective tool for gauging divergent thinking: participants were asked to list ten words that are as semantically different from one another as possible. The creativity score was determined by the semantic distance between the words; a list like “microscope, volcano, whisper” would score significantly higher than a more conventional list such as “car, dog, tree,” because it demonstrates greater cognitive range and originality.
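The scoring behind the DAT is easy to reproduce in a few lines of code. The sketch below shows a minimal Python version, assuming pre-computed word embeddings are already available in a dictionary (the `embeddings` argument and its source are illustrative; published DAT implementations typically use pre-trained word vectors such as GloVe). The score is simply the average pairwise cosine distance between the submitted words, scaled up for readability.

```python
# Minimal sketch of DAT-style scoring. `embeddings` maps each word to a
# NumPy vector; where those vectors come from is an assumption here (any
# reasonable pre-trained embedding source illustrates the idea).
from itertools import combinations
import numpy as np

def cosine_distance(u: np.ndarray, v: np.ndarray) -> float:
    """1 minus cosine similarity; larger means more semantically distant."""
    return 1.0 - np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

def dat_score(words: list[str], embeddings: dict[str, np.ndarray]) -> float:
    """Average pairwise cosine distance between the words, scaled by 100."""
    vectors = [embeddings[w] for w in words]
    distances = [cosine_distance(u, v) for u, v in combinations(vectors, 2)]
    return 100.0 * float(np.mean(distances))

# A diverse list ("microscope, volcano, whisper, ...") yields larger average
# distances than a conventional one ("car, dog, tree, ..."), hence a higher score.
```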
The results of this landmark comparison were striking and offered a dual perspective on the state of artificial creativity. On one hand, GPT-4, one of the most powerful models tested, achieved a score that surpassed the performance of the average human participant, firmly establishing its capability to generate creative ideas at a level beyond that of the general population. Google’s GeminiPro also performed admirably, matching the average human score, which indicates that modern AI has reached a point where it can effectively replicate and even exceed typical levels of creative thinking for this specific task. However, the study also exposed a clear ceiling to the AI’s capabilities. When the AI models’ best performances were compared against those of the top 10% of human participants, every single AI system fell short. This persistent gap underscores that the highest echelons of human creativity—the kind that produces truly novel and groundbreaking ideas—remain, for now, a uniquely human domain.
The Uncanny Valley of AI Originality
One of the most revealing discoveries of the study was the AI’s strong and unexpected tendency toward repetition, a behavior not commonly observed in humans performing the same task. Despite its high overall score, GPT-4 repeatedly used the same “safe” or high-probability words across its numerous responses. For instance, the word “microscope” appeared in an astonishing 70% of its generated lists, and “elephant” was present in 60% of them. The newer GPT-4-turbo model exhibited this tendency even more dramatically, including the word “ocean” in over 90% of its answers. This pattern suggests that the AI, while capable of identifying semantically distant concepts, has a limited pool of “creative” examples derived from its training data, leading to a form of predictable originality. This stands in stark contrast to the human participants, whose responses showed immense variability and a natural avoidance of repeating common ideas.
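This kind of repetition is straightforward to quantify: collect many generated lists and measure the fraction of lists in which each word appears. A minimal sketch of that analysis follows, assuming the model’s responses have already been gathered into `responses`, a list of ten-word lists (an illustrative variable, not part of the study’s published code).

```python
# Sketch: measure how often each word recurs across many generated DAT lists.
# `responses` is assumed to be a list of ten-word lists collected from
# repeated runs of the same model.
from collections import Counter

def repetition_rates(responses: list[list[str]]) -> dict[str, float]:
    """Fraction of lists in which each word appears at least once."""
    counts = Counter()
    for words in responses:
        counts.update(set(w.lower() for w in words))  # count a word once per list
    return {word: n / len(responses) for word, n in counts.most_common()}

# By this measure, the study found "microscope" in ~70% of GPT-4's lists,
# versus only 1.4% for the most common human word ("car").
```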
This vast difference in output patterns suggests a fundamental divergence in the creative process itself. The most common word submitted by any human participant was “car,” which appeared in only 1.4% of all human responses, followed by “dog” at 1.2%. This indicates that humans naturally introduce variability and intuitively understand that originality requires novel combinations and a departure from the obvious. AI, on the other hand, appears to default to statistically probable “creative” words it has learned are associated with high scores, creating a pattern that lacks the organic and unpredictable nature of human thought. The research team, led by Antoine Bellemare-Pepin and François Lespinasse, delved deeper into this issue, hypothesizing that this repetitive behavior could be mitigated by adjusting a key parameter in the AI models known as “temperature,” which essentially controls the randomness of the output.
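Mechanically, temperature rescales the model’s next-token probabilities before sampling: the logits are divided by the temperature, so values below 1 concentrate probability on the likeliest (safest) words, while values above 1 flatten the distribution and make rarer words more likely to be chosen. A minimal sketch of the idea, not the study’s code:

```python
# Sketch of temperature-scaled sampling over next-token logits.
import numpy as np

def sample_with_temperature(logits: np.ndarray, temperature: float) -> int:
    """Divide logits by temperature (> 0), softmax, then sample a token index."""
    rng = np.random.default_rng()
    scaled = logits / temperature
    probs = np.exp(scaled - scaled.max())  # subtract max for numerical stability
    probs /= probs.sum()
    return int(rng.choice(len(probs), p=probs))

# temperature < 1 concentrates mass on high-probability ("safe") tokens;
# temperature > 1 spreads it out, which is why raising it reduced GPT-4's
# repetition in the study.
```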
Challenging Conventional Wisdom on AI Progress
By increasing the temperature setting for GPT-4, the researchers were able to significantly reduce its repetitive behavior and encourage more novel word choices. This adjustment had a direct and positive impact on its creativity scores, which jumped to a level higher than that of 72% of all human participants. This finding offers a practical takeaway for users of generative AI: to elicit more creative and less predictable responses, one can often adjust this parameter. However, it also reveals a critical insight into the nature of AI creativity. Unlike the inherent, emergent quality of human thought, AI creativity appears to be a configurable setting that can be turned up or down rather than an intrinsic capability for original ideation. It is a simulated process, not a spontaneous one, which explains its current limitations when faced with the need for truly groundbreaking concepts.
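Most model APIs expose this parameter directly. As an illustration, here is how a DAT-style prompt might be issued at an elevated temperature through the OpenAI Python client; the model name, temperature value, and prompt wording are illustrative rather than the study’s exact setup.

```python
# Sketch: requesting a DAT-style word list at an elevated temperature using
# the OpenAI Python client (v1+). Model name and prompt are illustrative.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4",
    temperature=1.5,  # above the default of 1.0, to encourage less common words
    messages=[{
        "role": "user",
        "content": "List ten words that are as semantically different "
                   "from one another as possible.",
    }],
)
print(response.choices[0].message.content)
```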
Furthermore, the study produced findings that challenge common assumptions about the linear development of artificial intelligence. First, newer does not always mean better in terms of creativity. The researchers found that GPT-4-turbo, released by OpenAI as an improvement upon GPT-4, performed significantly worse on the creativity test, suggesting that as models are updated, they may be fine-tuned for other qualities like speed, efficiency, or cost-effectiveness, potentially at the expense of creative performance. Second, the assumption that bigger models are inherently more creative was also debunked. The study included Vicuna, a smaller, open-source model, which managed to outperform several larger and more expensive commercial alternatives. This indicates that model architecture, training data, and fine-tuning play a more crucial role in creative tasks than sheer size alone.
The Enduring Value of Human Ingenuity
Beyond the simple word association test, the researchers also tasked the AI models with more complex creative writing assignments, such as composing haikus, movie synopses, and short fiction. While GPT-4 consistently demonstrated superior performance to its predecessor, GPT-3.5, in these tasks, the work produced by human writers still exhibited greater overall variety, originality, and nuance, particularly in poetry and plot development. This reinforced the core finding that while AI can master the structure of creative tasks, the substance of elite human creativity remains elusive. The study ultimately provided a clear guideline for the everyday user: to unlock more diverse outputs from AI tools, one should experiment with increasing the model’s randomness settings.
The overarching conclusion for creative professionals—artists, writers, designers, and innovators—was a reassuring one. The research confirmed that while AI has become a powerful tool capable of matching or exceeding average human output in certain creative domains, it cannot yet replicate the unique, top-tier ingenuity that defines the best human creators. Companies seeking to innovate and produce truly original work will likely continue to rely on these elite human minds. The gap between average and exceptional creativity is where human value remains indispensable. For the foreseeable future, AI has mastered the art of mimicking average creativity, but exceptional human creativity continues to exist in a class of its own.
