A groundbreaking collaborative study from several leading Canadian universities provides a meticulous and comprehensive analysis of artificial intelligence’s creative abilities, challenging both the unrestrained hype surrounding its potential and the pessimistic view that human ingenuity is on the verge of obsolescence. This research, detailed in the journal Scientific Reports, represents the most extensive direct comparison to date between the creative output of advanced AI models and a vast sample of human participants. The central theme emerging from the findings is a compelling paradox: while sophisticated AI like GPT-4 can now surpass the average person in specific, standardized creativity tasks, it demonstrably fails to match, let alone exceed, the performance of highly creative individuals, offering a more nuanced vision of our technological future.
The Core Experiment and Its Surprising Results
Measuring Creativity: The Divergent Association Task
The investigation’s foundation was the Divergent Association Task (DAT), a cleverly designed method for quantifying one crucial aspect of creativity: divergent thinking. The task itself is simple in its instruction but complex in its cognitive demands, requiring participants to list ten words that are as semantically unrelated to each other as possible. Success in this test is not determined by the elegance or poetic quality of the words chosen, but by the calculated “semantic distance” between them—a mathematical measurement of the conceptual space separating one idea from another. For instance, a low-scoring set of words like “cat, dog, pet, animal” is penalized for its tight conceptual clustering within a single category. In contrast, a high-scoring set such as “galaxy, fork, freedom, algae, music” demonstrates a superior ability to leap across disparate mental categories, which is a hallmark of creative ideation. The researchers set up a large-scale confrontation, pitting a roster of major AI models, including the prominent GPT-4, Gemini, and Claude, against a massive, pre-existing dataset of over 100,000 human responses to the same task. This created an unprecedented opportunity to benchmark machine creativity against a statistically significant human baseline.
The sheer scale of the experiment allowed for a robust and detailed comparison, moving beyond anecdotal evidence or small-scale tests. By leveraging a dataset comprising more than 100,000 human participants, the researchers from the Université de Montréal, Concordia University, and the University of Toronto established a reliable spectrum of human creative performance on this specific task, from the least to the most divergent thinkers. Against this backdrop, the performance of the AI models could be precisely calibrated. The study wasn’t just about determining a winner; it was about understanding the contours and characteristics of AI-generated ideas in relation to human ones. The use of the DAT provided a quantifiable, objective metric, removing the subjectivity that often plagues assessments of creativity. This focus on semantic distance allowed for a direct, data-driven analysis of how far and wide an AI could “think” compared to its human counterparts, providing a clear window into the current state of machine-driven brainstorming and its place in the landscape of creative thought.
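To make the scoring concrete, the sketch below computes a DAT-style semantic-distance score as the mean pairwise cosine distance between word embeddings, scaled by 100 in line with the published DAT scoring convention. It assumes pretrained GloVe vectors loaded through gensim's downloader; the study's exact embedding model and preprocessing may differ.

```python
# Minimal sketch of a DAT-style score, assuming pretrained GloVe vectors loaded
# through gensim's downloader. The published DAT scoring averages the pairwise
# cosine distance between word embeddings and multiplies by 100; the study's own
# pipeline and preprocessing may differ in detail.
from itertools import combinations
import gensim.downloader as api

vectors = api.load("glove-wiki-gigaword-300")  # pretrained word embeddings

def dat_score(words):
    """Mean pairwise semantic distance between the given words, scaled by 100."""
    words = [w.lower() for w in words if w.lower() in vectors]
    pairs = list(combinations(words, 2))
    if not pairs:
        return None
    distances = [1 - vectors.similarity(a, b) for a, b in pairs]
    return 100 * sum(distances) / len(distances)

print(dat_score(["cat", "dog", "pet", "animal"]))                  # tight cluster, low score
print(dat_score(["galaxy", "fork", "freedom", "algae", "music"]))  # disparate concepts, higher score
```

Feeding in the two example lists from above should yield a noticeably lower score for the tightly clustered words than for the conceptually scattered ones, which is exactly the contrast the DAT is designed to capture.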
A Tale of Two Tiers: AI vs. Human Performance
The primary finding of the study was immediately striking and headline-worthy: on average, the GPT-4 model outperformed the human sample as a whole. Its mean score on the Divergent Association Task was statistically higher than the average score achieved by the more than 100,000 human participants. Other models, such as Gemini Pro, performed at a level that was statistically indistinguishable from the human average. This result powerfully confirms that artificial intelligence has effectively “raised the floor” for this type of divergent thinking. It is now capable of generating a list of conceptually varied ideas that are, on average, more diverse than those produced by a typical person engaged in a brainstorming exercise. This suggests that for baseline ideation tasks, AI can serve as a remarkably effective tool, consistently producing a level of creative variety that exceeds the norm for the general population. This capability alone has significant implications for fields that rely on initial idea generation, from marketing to product development, positioning AI as a new standard for average-level creative output.
However, this impressive average performance is heavily qualified by a crucial detail that emerged when the researchers analyzed the data based on performance tiers. When looking beyond the mean scores, the hierarchy of creativity completely inverted. The most creative 50% of human participants scored higher on the DAT than every single AI model that was tested, including the top-performing GPT-4. This performance gap widened substantially when the analysis focused on the elite top 10% of human performers, who significantly outclassed the AI’s best efforts. This critical finding indicates that while AI has successfully mastered a form of average-level brainstorming, it has not yet managed to break through the ceiling of peak human imagination. The study paints a picture not of AI surpassing human creativity outright, but of a specific dynamic where AI excels at the median while the best of human ingenuity remains unmatched. The highest echelons of divergent thinking, characterized by truly unexpected and profound conceptual leaps, remain a distinctly human domain.
Unpacking the Differences in Creative Processes
The Ocean Problem: AI’s Repetitive Nature
Further investigation exposed fundamental, almost alien, differences in the cognitive processes of AI when compared to humans. A key issue, which the researchers dubbed the “Ocean Problem,” revealed the AI’s persistent tendency toward repetitive and surprisingly formulaic responses, even when it was explicitly prompted for maximum creativity and diversity. An analysis of the specific words generated by the models was particularly telling. GPT-4, for instance, exhibited a peculiar fixation on certain words, producing “microscope” in an astonishing 70% of its generated lists and “elephant” in 60% of them. A newer model, GPT-4-turbo, demonstrated an even more pronounced lack of variety, using the word “ocean” in over 90% of its attempts at the DAT. This machine-like insistence on a narrow set of “creative” words stands in stark contrast to the immense diversity seen in the human responses. The most common words submitted by people were “car” and “dog,” but each of these appeared in only about 1% of the total human-generated lists.
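The word-frequency analysis behind the “Ocean Problem” can be illustrated with a few lines of Python. The lists below are hypothetical placeholders, not the study’s actual model outputs.

```python
# Sketch of the frequency analysis described above: for each word, count the
# share of generated DAT lists it appears in. The lists here are hypothetical
# placeholders rather than the study's actual model outputs.
from collections import Counter

generated_lists = [
    ["ocean", "microscope", "elephant", "violin", "desert"],
    ["ocean", "galaxy", "microscope", "cactus", "anchor"],
    ["ocean", "elephant", "quantum", "tapestry", "microscope"],
    # ... one list per model run
]

list_counts = Counter(word for lst in generated_lists for word in set(lst))

for word, count in list_counts.most_common(5):
    share = count / len(generated_lists)
    print(f"{word}: appears in {share:.0%} of lists")
```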
This vast disparity suggests that what appears as creativity in an AI model may often be a probabilistic artifact rather than genuine ideation. The models, lacking consciousness or lived experience, are not “thinking” of new ideas in the human sense. Instead, they are navigating an unimaginably vast statistical landscape built from their training data. The “Ocean Problem” implies that these models repeatedly return to the same “random” yet statistically prominent corners of this data space that their algorithms have identified as being highly associated with creative or divergent concepts. This results in a form of pseudo-creativity that can pass a standardized test but lacks the authentic, idiosyncratic variety that stems from the unique tapestry of individual human memory, experience, and association. The AI’s creativity is, in this sense, an echo of its training data, whereas human creativity is a reflection of a life.
Creativity on Command: Tuning the AI
The study also demonstrated that an AI’s creative output is not an innate, fixed quality but a tunable and highly malleable variable. Researchers were able to directly manipulate the AI’s performance on the DAT by adjusting a key operational parameter known as “temperature.” This setting essentially controls the level of randomness or risk in the AI’s word selection process. A low temperature setting prompts the model to produce predictable, safe, and high-probability responses, which are often coherent but unoriginal. Conversely, a high temperature encourages the model to select less likely and more unconventional words, leading to outputs that are more diverse and potentially more creative—though also at a higher risk of being nonsensical or irrelevant. The researchers found that by “cranking up” the temperature setting to its highest levels, they could significantly boost the AI’s DAT scores. Under these high-risk conditions, GPT-4’s performance improved to the point where it could outperform approximately 72% of the human participants, a notable increase from its baseline.
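As an illustration of what tuning the temperature looks like in practice, here is a minimal sketch using the OpenAI Python client. The model name, prompt wording, and temperature values are illustrative assumptions, not the study’s exact protocol.

```python
# Sketch of varying the sampling "temperature" when requesting a DAT-style word
# list. Model name, prompt wording, and temperature values are illustrative only.
from openai import OpenAI

client = OpenAI()  # assumes an API key is configured in the environment

prompt = ("List ten single English words that are as semantically unrelated "
          "to one another as possible. Return only the words.")

for temperature in (0.2, 1.0, 1.8):  # low = safe and predictable, high = riskier and more varied
    response = client.chat.completions.create(
        model="gpt-4o",  # illustrative model choice
        messages=[{"role": "user", "content": prompt}],
        temperature=temperature,
    )
    print(f"temperature={temperature}: {response.choices[0].message.content}")
```

Higher temperatures flatten the probability distribution over candidate words, which is why the riskier settings tend to produce more varied, and occasionally incoherent, lists.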
This finding, along with the discovery that specific instructions—such as prompting the AI to adopt an “etymology strategy” to find unrelated words—also improved its scores, highlights a critical distinction between human and machine creativity. The creativity of an AI is not an intrinsic characteristic but a configurable output that is heavily dependent on its operational parameters and the precise guidance provided by the user. This suggests that interacting with a creative AI is less like collaborating with another mind and more like operating a complex instrument. The quality of the output is a direct function of the user’s skill in setting the right parameters and crafting effective prompts. This puts the locus of ultimate creative control firmly back in human hands, framing AI not as an autonomous creator but as a powerful, sophisticated tool whose potential is unlocked through intelligent human direction and intervention.
From Brainstorming to Storytelling
The Limits of AI in Complex Writing Tasks
The investigation did not confine itself to simple word lists but extended into more complex and holistic creative writing assignments, including the composition of haikus, movie plot summaries, and short works of flash fiction. It was in these more nuanced areas that the limitations of the current generation of AI became even more apparent. While the models were fully capable of generating grammatically correct and coherent text, human writers consistently achieved higher scores on a metric called “Divergent Semantic Integration.” This advanced measure assesses the ability to not only generate diverse concepts but to skillfully weave them together into a unified, resonant, and compelling narrative. A visual analysis of the semantic content produced revealed that human writing inhabited a completely different “region of meaning” from the machine-generated text. This indicated a deeper level of thematic and conceptual integration, where disparate ideas were not merely listed but were synthesized into a meaningful whole, a feat the AI struggled to replicate with the same level of sophistication.
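As a rough illustration of the idea behind an integration-style measure, the sketch below embeds each sentence of a short text and reports the average pairwise semantic distance between them. Published Divergent Semantic Integration implementations operate on contextualized word embeddings rather than whole sentences, so this is only a loose proxy for the “divergent” half of the concept, not the study’s method; the sentence-transformers model name is an assumption.

```python
# Loose proxy for a DSI-style measure: embed each sentence of a short text and
# average the pairwise cosine distances. Real DSI implementations use
# contextualized word embeddings, so this only sketches the underlying idea.
from itertools import combinations
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed, small general-purpose encoder

def semantic_spread(text):
    sentences = [s.strip() for s in text.split(".") if s.strip()]
    embeddings = model.encode(sentences)
    pairs = list(combinations(range(len(sentences)), 2))
    if not pairs:
        return 0.0
    distances = [1 - float(util.cos_sim(embeddings[i], embeddings[j])) for i, j in pairs]
    return sum(distances) / len(distances)

story = ("The comet salesman counted his debts. In the kitchen, his daughter "
         "taught the radio to dream. Outside, the tide kept its own ledger.")
print(semantic_spread(story))
```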
A particularly telling result emerged from the haiku generation task. In stark contrast to the longer story and plot summary assignments, the researchers found that increasing the AI’s “temperature” setting did not improve the quality of its haikus. In fact, it often made them worse, producing nonsensical or disjointed verses. This implies that short, highly constrained poetic forms demand a level of intentionality, nuance, structural awareness, and subtle wordplay that current statistical prediction models cannot effectively replicate. The art of a haiku lies in the careful, deliberate selection of a few words to evoke a specific image or feeling—a process that appears to be beyond the reach of an algorithm designed to predict the next most probable word in a sequence. This failure to master constrained forms underscores that true creative writing involves more than just divergent thinking; it requires a convergent, focused intelligence to imbue words with meaning and structure them into something that resonates on an emotional and intellectual level.
A Collaborative Future: AI as a Creative Tool
The study’s findings ultimately offered a more optimistic and collaborative vision of the future, steering clear of any declaration of an existential threat to artists and writers. The research provided a crucial reality check, demonstrating that today’s artificial intelligence is not a direct replacement for good art but is better understood as a powerful tool that can effectively augment human creativity. Its proven excellence at baseline ideation means it can serve as a potent instrument to help creators overcome initial hurdles, such as the proverbial writer’s block or a simple lack of starting ideas for a project. The ability of a model like GPT-4 to instantly generate a diverse list of concepts, on average better than a typical person could, presents a significant resource for jump-starting the creative process.
By efficiently handling the “average” part of creative work—the initial brainstorming and generation of raw material—AI could free up human artists, writers, and musicians to concentrate on the more demanding aspects of their craft. This would allow them to focus their energy on achieving the peak levels of originality, emotional depth, and narrative coherence that machines, for now, cannot touch. The robot may have suggested “ocean” and “microscope” with startling frequency, but the distinctly human task remained to imbue those concepts with personal meaning, to weave them into a story that moves an audience, and to create something that resonates with the shared human experience. This collaborative model positions AI not as a competitor, but as a foundational assistant in the ongoing pursuit of human expression.
