Due to MishMash, I nowadays lecture on AI, music, and creativity several times a week. I usually include a brief overview of machine learning history, mainly to explain that ChatGPT didn't come out of nowhere but is the result of decades of research. To check that my story holds, and to get a few more critical years and names in place, I have written up this blog post summarizing the brief history of AI to date.
Asking NotebookLM for help
To continue my explorations of NotebookLM, I decided to test its “deep search” function. Based on the search “history and development of machine learning”, it picked what it believes to be the 30 most important web pages/publications at the moment (check out the notebook). I also asked it to create an infographic based on the material:
NotebookLM’s illustrated timeline of machine learning. Note how it got the big picture right, but there are errors in the descriptions under the images.
Two paradigms: rule-based and learning-based
People often confuse the terms "artificial intelligence" (AI) and "machine learning" (ML). When I introduce AI, I typically argue that ML is one of two historically competing paradigms within AI:
- Rule-based systems (also called expert systems or symbolic artificial intelligence) focus on explicit rules, logic, and high-level reasoning. This is the "traditional" computing approach, built on if–else statements, and it requires explicit programming by humans.
- Learning-based systems (also called connectionist approaches) emphasize learning from data and distributed representations. These are typically based on artificial neural networks that can learn from examples.
The tension between these two paradigms has shaped research priorities and led to cycles of enthusiasm and skepticism over the years. Let us revisit some of the key periods in AI development.
The Birth of AI (1940s–1950s)
The history of AI did not begin with complex computers, but with foundational ideas about how machines could imitate biological processes and human thought. This early period was defined by theoretical breakthroughs at the intersection of philosophy, psychology, and mathematics, laying the conceptual groundwork for an emerging discipline.
The first spark was a 1943 paper by Warren McCulloch and Walter Pitts, which proposed a mathematical model of a biological neuron. Their model treated the neuron as a simple binary unit capable of computation, establishing the first conceptual blueprint for an artificial neural network.
In his pivotal 1950 paper, “Computing Machinery and Intelligence,” British mathematician Alan Turing proposed what is now known as the Turing Test: a human evaluator conducts a text-only conversation with two participants, one human and one machine. The evaluator’s task is to determine which is the machine. If the machine convinces the evaluator that it is human (i.e., the evaluator cannot reliably distinguish it from a human), the machine passes the test. This experiment shifted the philosophical debate away from the unanswerable question, “Can machines think?” and toward a tangible, behavioral benchmark for measuring artificial intelligence.
The 1956 Dartmouth Summer Research Project on Artificial Intelligence is widely considered the official birthplace of AI as a formal field. The event’s organizers brought together leading researchers with a shared, ambitious belief:
“Every aspect of learning or any other feature of intelligence can in principle be so precisely described that a machine can be made to simulate it.”
It was at this conference that computer scientist John McCarthy coined the term “artificial intelligence,” giving this emerging discipline a name and a mission.
In 1957, psychologist Frank Rosenblatt introduced the Perceptron, a simplified model of how neurons in the human brain work. This was the first system designed to recognize patterns and learn from data inputs. Unlike machines that relied solely on rigid programming, the Perceptron could improve its performance over time, opening the door to the modern era of machine learning.
An illustration of an artificial neuron and a perceptron.
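To make the learning rule concrete, here is a tiny Python sketch of a perceptron. This is my own illustration (the function names, learning rate, and data are made up, not from Rosenblatt's original work): the unit fires when its weighted sum crosses a threshold, and the weights are nudged toward every example it misclassifies.

```python
# Minimal perceptron sketch (illustrative; names and data are my own).

def predict(weights, bias, x):
    """Binary threshold unit: fire (1) if the weighted sum exceeds zero."""
    s = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if s > 0 else 0

def train(samples, labels, lr=0.1, epochs=20):
    """Adjust weights toward examples the perceptron misclassifies."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, target in zip(samples, labels):
            error = target - predict(weights, bias, x)
            weights = [w + lr * error * xi for w, xi in zip(weights, x)]
            bias += lr * error
    return weights, bias

# Learning the logical AND function (linearly separable, so it converges):
samples = [(0, 0), (0, 1), (1, 0), (1, 1)]
labels = [0, 0, 0, 1]
w, b = train(samples, labels)
print([predict(w, b, x) for x in samples])  # → [0, 0, 0, 1]
```

Because a single perceptron can only draw one straight decision boundary, it handles AND but not XOR, which is exactly the limitation that later motivated multi-layer networks.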
Just a few years before the term "AI" was coined, IBM pioneer Arthur Samuel created a checkers-playing program in 1952. By having the program play against itself thousands of times, it learned to recognize patterns that led to victory, eventually reaching a respectable amateur level and showcasing an early and influential form of machine learning.
Early Steps and a Great Divide (1960s–1970s)
The two decades following the Dartmouth Conference were marked by initial successes that captured the public’s imagination. However, this period also brought the field’s first major reality check, leading to disillusionment and a fundamental split in research directions.
Early programs demonstrated that computers could perform tasks once thought to require human intellect. At MIT, Joseph Weizenbaum developed ELIZA, one of the earliest chatbots. ELIZA simulated a psychotherapist by using simple pattern matching and substitution to reflect a user’s statements as questions. The program’s success gave rise to the “ELIZA effect,” a phenomenon where users attributed deep understanding and human-like intelligence to the machine, highlighting the persistent gap between perceived and actual intelligence.
In 1973, applied mathematician Sir James Lighthill published a critical report on AI research, claiming that researchers had over-promised and under-delivered on the potential of intelligent machines. This report, along with similar critiques, led to funding cuts and a period of stagnation known as the "AI winter." This era, which began in the mid-1970s, was defined by the gap between AI's ambitious expectations and the technology's actual shortcomings.
During the late 1970s and early 1980s, a schism occurred between rule-based and learning-based approaches. Mainstream AI research began to focus on “expert systems,” which used logic and knowledge-based algorithms to mimic human decision-making. In this climate, machine learning branched off to evolve on its own, with researchers focusing on algorithms and statistical methods rather than symbolic reasoning.
As the limitations of rigid, rule-based systems became clear, the intellectual vacuum they created set the stage for a dramatic return to brain-inspired models.
The Connectionist Renaissance and a Key Breakthrough (1980s)
The 1980s marked a period of revitalization for the connectionist approach. This renaissance was sparked by the backpropagation algorithm. While the core idea was first introduced by Paul Werbos in 1974, it was popularized in 1986 by David Rumelhart, Geoffrey Hinton, and Ronald J. Williams.
The backpropagation algorithm works by adjusting the weights in the network after each run.
Backpropagation provided an efficient way for multi-layer neural networks to learn. It worked by calculating the error in a network’s output and then “propagating” that error backward through the layers, adjusting the internal connection weights to improve accuracy. This breakthrough enabled the training of deep, complex networks, overcoming a major limitation of earlier models such as the Perceptron.
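The mechanics can be sketched in a few lines of Python. This is my own toy illustration with a single unit per layer and one training example, not production code: the forward pass computes activations layer by layer, and the backward pass applies the chain rule to push the output error back through the network.

```python
import math

# Toy backpropagation on a two-layer network with one unit per layer
# (my own minimal sketch; weights, learning rate, and data are made up).

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

x, y = 1.0, 0.0     # one training pair: input x should map to target y
w1, w2 = 0.5, 0.5   # connection weights, layer 1 and layer 2
lr = 1.0            # learning rate

for step in range(1000):
    # Forward pass: compute activations layer by layer.
    h = sigmoid(w1 * x)      # hidden activation
    out = sigmoid(w2 * h)    # network output
    error = out - y          # derivative of 0.5 * (out - y)**2 w.r.t. out

    # Backward pass: propagate the error through each layer,
    # using the chain rule to get each weight's gradient.
    delta2 = error * out * (1 - out)    # gradient at the output unit
    grad_w2 = delta2 * h
    delta1 = delta2 * w2 * h * (1 - h)  # error propagated to the hidden unit
    grad_w1 = delta1 * x

    # Gradient-descent update.
    w2 -= lr * grad_w2
    w1 -= lr * grad_w1

h = sigmoid(w1 * x)
print(round(sigmoid(w2 * h), 2))  # output has moved close to the target 0.0
```

The key insight is that the same backward sweep yields gradients for every layer at once, which is what made training networks with many layers computationally feasible.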
Though neural networks were now trainable, their “black box” nature frustrated many researchers, creating a demand for a new, more rigorous approach.
The Statistical Revolution (1990s–2000s)
In response to the often unpredictable, heuristic nature of 1980s neural networks, machine learning moved away from mimicking human thought and reorganized itself as a rigorous statistical discipline. The new focus was on developing data-driven methods with strong theoretical and mathematical foundations.
In 1997, the IBM supercomputer Deep Blue defeated world chess champion Garry Kasparov in a landmark match. This was the first time a computer defeated a reigning world champion in a match played under standard tournament conditions. Achieved through massive computational power, it proved that an AI system could surpass the best human expertise in a complex, strategic task.
While Deep Blue showcased the power of brute-force computation, another part of the machine learning community was developing mathematically elegant algorithms like Support Vector Machines. SVMs represented the peak of the statistical learning paradigm, operating by finding the optimal boundary, or “hyperplane,” that best separated data points into different categories. Unlike the “black box” neural networks of the 1980s, SVMs and other kernel methods came with strong theoretical guarantees about their performance.
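The decision rule itself is simple: classify a point by which side of the hyperplane it falls on. Here is a minimal Python sketch where the hyperplane is hand-picked for illustration rather than learned; actual SVM training chooses the weights to maximize the margin to the nearest data points.

```python
# Sketch of the SVM decision rule: a hyperplane w·x + b = 0 separates two
# classes, and the classifier is the sign of a point's score against it.
# The weights below are hand-picked for illustration, not learned.

def classify(w, b, x):
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if score >= 0 else -1

w, b = (1.0, -1.0), 0.0   # hyperplane: x1 - x2 = 0
points = [(2, 1), (0, 3), (4, 0), (1, 5)]
print([classify(w, b, p) for p in points])  # → [1, -1, 1, -1]
```

Kernel methods extend this idea by implicitly mapping the data into higher-dimensional spaces where a separating hyperplane exists even when the original data is not linearly separable.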
The Deep Learning Tsunami (2010s)
The 2010s witnessed an explosion in AI capabilities. This “tsunami” was driven by the powerful convergence of three key factors: the availability of massive datasets, significant increases in computational power from hardware such as GPUs, and the development of advanced neural network architectures.
In 2012, a deep neural network named AlexNet won the annual ImageNet competition, a benchmark for computer vision. Its victory proved that deep learning could outperform traditional methods by a wide margin in image recognition. This success spurred a massive wave of investment and research into deep learning across industries.
A convolutional neural network (CNN) is a type of deep learning architecture that uses convolutional layers to detect local patterns (edges, textures) and pooling/activation layers to build hierarchical feature representations.
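As a toy illustration of what a single convolutional layer computes, here is a hand-rolled 2-D convolution in Python (valid padding, stride 1). The image and kernel are my own made-up examples: a small vertical-edge detector sliding across a black-to-white image.

```python
# Hand-rolled 2-D convolution (valid padding, stride 1) to illustrate how a
# convolutional layer detects local patterns. Image and kernel are toy data.

def conv2d(image, kernel):
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Multiply the kernel against the image patch at (i, j) and sum.
            s = sum(image[i + di][j + dj] * kernel[di][dj]
                    for di in range(kh) for dj in range(kw))
            row.append(s)
        out.append(row)
    return out

# A vertical-edge detector responds where pixel values change left to right.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
kernel = [[-1, 1],
          [-1, 1]]
print(conv2d(image, kernel))  # → [[0, 2, 0], [0, 2, 0]]
```

In a trained CNN such as AlexNet, thousands of kernels like this one are learned from data rather than hand-designed, and stacking many such layers yields the hierarchical features that won ImageNet.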
The public witnessed another key event when Google DeepMind’s AlphaGo defeated Lee Sedol in the ancient and complex board game Go in 2016. Unlike chess, Go has an almost unimaginable number of possible moves and requires intuition and creativity, qualities long thought to be uniquely human and not solvable by brute-force computation.
Finally, in 2017, researchers published a landmark paper titled “Attention Is All You Need,” which introduced the Transformer architecture. Its core innovation was a mechanism called “self-attention,” which enabled a model to process an entire sequence of data (such as a sentence) at once, assigning relative importance to different words. This parallel processing capability made Transformers exceptionally efficient and scalable, directly enabling the creation of modern Large Language Models (LLMs).
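A bare-bones sketch of scaled dot-product self-attention in plain Python may help make this concrete. Note the simplifications in my example: I use the token vectors directly as queries, keys, and values, skipping the learned projection matrices of a real Transformer, and the three "token" vectors are invented for illustration.

```python
import math

# Bare-bones scaled dot-product self-attention over a toy "sentence" of
# three token vectors (a simplification: no learned Q/K/V projections).

def softmax(scores):
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(tokens):
    d = len(tokens[0])
    outputs = []
    for q in tokens:  # every token attends to every token, in parallel
        # Similarity of this token to each token, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in tokens]
        weights = softmax(scores)  # normalized attention weights, sum to 1
        # Output is the attention-weighted mix of all token vectors.
        outputs.append([sum(w * v[i] for w, v in zip(weights, tokens))
                        for i in range(d)])
    return outputs

tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
for row in self_attention(tokens):
    print([round(x, 2) for x in row])
```

Because each token's output depends on the whole sequence at once rather than on a left-to-right recurrence, the computation parallelizes across tokens, which is the scalability property the paragraph above describes.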
The Generative AI Explosion and the Road Ahead (2020–)
The current era of AI is defined by the rapid rise of LLMs and generative systems capable of creating novel text, images, audio, and other media in response to prompts. The public release of OpenAI’s ChatGPT in November 2022 truly catapulted generative AI into the mainstream. Image and audio systems such as DALL·E and Suno show the potential of generative models, but they also expose persistent problems such as hallucinations, algorithmic bias, and large data and compute demands.
Some say we are at a crossroads, where we need to decide whether to continue scaling up or adopt alternative models that reason more like humans and can sense and act in the real world.
The current state of AI is not sustainable; we need to find an alternative path forward.
My approach to researching AI has been to combine perspectives from psychology and technology in my musicologically oriented research. This includes using advanced sensing systems to capture complex human actions and behaviors, as well as prototyping alternative interactive systems. Moving forward, I see many exciting pathways:
Neuro-symbolic AI. To overcome the “black box” limitations of current deep learning-based large models, it is time to bridge the gap between “old-school” expert systems and neural networks. This is key to creating “explainable AI” that can be used in high-stakes industries like healthcare and finance. Furthermore, combined approaches can help reduce the massive data and computational requirements of foundation models by integrating human domain knowledge into the training. Music is an excellent starting point for such development since traditional Western music theory is heavily rule-based.
Agentic AI. Today’s AI chatbots will be replaced by “agents” that can autonomously collect information and execute multi-step plans to achieve their goals. This requires recursive, parallel systems that can run in real-time and interact with other agents (both humans and machines). At RITMO, we conduct research into how psychological mechanisms such as “thinking fast and slow” can be developed into multi-layered machine systems.
Embodied AI. Historically, AI systems have been completely “disembodied”, focusing on mimicking human learning and reasoning from a (human) brain perspective. However, we now have substantial evidence that human cognition is inherently embodied and that there is a continuous action–perception cycle between the body and the brain. To advance AI, machines also need bodies with sensory apparatus and the ability to act. That is why we are conducting research into robotics at RITMO, exploring how machines can play music or dance with humans and with other machines.
Multimodal AI. Humans perceive the world through multiple modalities (vision, audition, olfaction, etc.), and we seamlessly combine these modalities to help us understand and act. Transformers have made it possible to combine various data types, but we are still just at the beginning of exploring how this works on messy real-world data. All my PhD fellows are currently working on projects to advance our understanding of multimodal AI.
Choosing the right tool for the job
There is massive public attention—and corporate investment—in LLMs at the moment. They are great for many things, and I continue to be amazed by what some of the big models can do. However, moving forward, we need to develop smarter, better, and more targeted tools. This is what we are working on at RITMO and will be exploring in MishMash.
I co-wrote this blog post based on a long discussion with NotebookLM. Grammarly helped with grammar checking.