Artificial intelligence has pervaded the human imagination since antiquity. Greek myth told of statues brought to life by men who “discovered the true nature of the gods”, and ancient Chinese texts described mechanical men that could “walk with rapid strides”. The Renaissance and the Age of Reason that followed saw an unprecedented explosion of mathematical and scientific ideas across Europe. Philosophers like Leibniz envisioned a system in which human thought could be made as mechanical and systematic as algebra and geometry. These ideas came to fruition in the 1940s, when the brain was shown to be an electrical network of neurons connected into circuits capable of processing information.
Inspired by this work, Walter Pitts and Warren McCulloch proposed the first artificial neural network in 1943. To mimic the neurons in our own brains, they designed an artificial neuron that took weighted inputs, mimicking the action of dendrites, and produced an output representing the neuron’s action potential. In the pre-digital age, the first neural networks were built in hardware. The Mark I Perceptron, built in 1958, was designed to recognise images: it connected an array of 400 photocells to a few dozen neurons, its weights were encoded in potentiometers, and learning was performed by electric motors that adjusted them.
Despite early optimism, these first networks ran into a multitude of problems. Whilst the Perceptron could deal with simple shapes, it could not recognise more complex patterns due to the simplicity of the network. It was shown in 1969 that a “one-layer” perceptron was incapable of even computing XOR (exclusive-or), a basic digital logic gate. Computing power was also lacking: modern computer vision programs perform trillions of calculations every second, whereas the Apollo Guidance Computer that sent Apollo 11 to the moon managed only about 34,500 instructions per second. As a result, work on neural networks stagnated until two new ideas arose in the 70s – hidden layers and backpropagation.
The failure of the early perceptron was largely due to the structure of the network. The output it produced was simply a threshold applied to a linear combination of the inputs from the photocells, so its decision boundary was a single straight line (or hyperplane). Functions like exclusive-or, whose classes are not linearly separable, therefore could not be expressed. The solution was hidden layers: connected layers of neurons between the input and output. Hidden layers allowed a network to find features like edges and shapes within an image rather than operate on the raw data. In digit recognition, for example, a network may first locate the holes and curves within a number in a hidden layer and then pass those features forward to the next layer.
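As a minimal illustration of why a hidden layer helps, the sketch below hand-wires a two-layer threshold network that computes XOR, something no single threshold unit can do. The particular weights and biases are illustrative choices, not those of any historical machine: one hidden unit detects OR, the other NAND, and their conjunction is XOR.

```python
def step(x):
    """Threshold activation: fire (1) if the weighted sum is positive."""
    return 1 if x > 0 else 0

def neuron(inputs, weights, bias):
    return step(sum(w * i for w, i in zip(weights, inputs)) + bias)

def xor_net(x1, x2):
    # Hidden layer extracts two intermediate features of the input.
    h_or = neuron([x1, x2], [1, 1], -0.5)      # fires for x1 OR x2
    h_nand = neuron([x1, x2], [-1, -1], 1.5)   # fires for NOT (x1 AND x2)
    # Output layer: AND of the two hidden features yields XOR.
    return neuron([h_or, h_nand], [1, 1], -1.5)

for a in (0, 1):
    for b in (0, 1):
        print(a, b, "->", xor_net(a, b))
```

A single `neuron` can only draw one line through the four input points, which is why it fails on XOR; the hidden layer lets the output unit combine two such lines.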
Naturally, with a more complex network it became more difficult to “train” it, that is, to let it learn efficiently. Because every neuron’s output feeds into others, adjusting one weight in such an interdependent network requires knowing how that change propagates through all subsequent layers. The solution came with backpropagation. Designed to imitate “a backwards flow of credit assignment, flowing back from neuron to neuron”, the algorithm computes each weight’s contribution to the output error efficiently via the chain rule, enabling the development of increasingly complex networks. This culminated in the modern “multilayer perceptron”, which finds use today in speech and image recognition and machine translation. The Universal Approximation Theorem, proved in 1989 by George Cybenko, showed that a multilayer perceptron with even a single hidden layer can approximate any continuous function to arbitrary precision.
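The idea can be sketched in a few lines of plain Python: the code below trains a small sigmoid network on XOR using backpropagation, with each weight update following the flow of credit backwards from the output error. The layer sizes, learning rate, seed and epoch count are arbitrary illustrative choices, not a reconstruction of any historical system.

```python
import math
import random

random.seed(42)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

HIDDEN = 3  # illustrative size: 2 inputs -> 3 hidden units -> 1 output
w_hid = [[random.uniform(-1, 1) for _ in range(HIDDEN)] for _ in range(2)]
b_hid = [random.uniform(-1, 1) for _ in range(HIDDEN)]
w_out = [random.uniform(-1, 1) for _ in range(HIDDEN)]
b_out = random.uniform(-1, 1)

data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

def forward(x):
    h = [sigmoid(x[0] * w_hid[0][j] + x[1] * w_hid[1][j] + b_hid[j])
         for j in range(HIDDEN)]
    y = sigmoid(sum(h[j] * w_out[j] for j in range(HIDDEN)) + b_out)
    return h, y

def total_error():
    return sum(abs(forward(x)[1] - t) for x, t in data)

initial_error = total_error()

lr = 1.0
for epoch in range(5000):
    for x, t in data:
        h, y = forward(x)
        # Backward pass: credit assignment flows from the output error
        # back to every weight via the chain rule.
        d_out = (y - t) * y * (1 - y)
        d_hid = [d_out * w_out[j] * h[j] * (1 - h[j]) for j in range(HIDDEN)]
        for j in range(HIDDEN):
            w_out[j] -= lr * d_out * h[j]
            b_hid[j] -= lr * d_hid[j]
            for i in range(2):
                w_hid[i][j] -= lr * d_hid[j] * x[i]
        b_out -= lr * d_out

final_error = total_error()
print(f"total error before: {initial_error:.2f}, after: {final_error:.2f}")
```

The key step is computing `d_hid` from `d_out`: each hidden neuron receives credit for the output error in proportion to the weight connecting it to the output, which is exactly the “backwards flow” the quotation describes.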
Modern neural networks, enabled by the vast computational power available today, have their roots in the networks developed in the 80s. Convolutional neural networks, a variation on the multilayer perceptron designed to exploit spatial invariance in recognition, were used in AlphaGo, the program that famously beat Lee Sedol in 2016. An active area of study in industry and academia today is the recurrent neural network, an artificial neural network whose connections form cycles so that earlier inputs influence later ones; this dynamic temporal behaviour enables advanced speech and handwriting recognition.
The pace of progress in artificial intelligence in the past decade has defied imagination, with records shattered one after another. AlphaGo was itself beaten 100-0 in 2017 by AlphaGo Zero, which used recent developments in reinforcement learning and consumed less than a tenth of the power. Yet although AI has found novel uses in industry, from classifying skin cancers with the same precision as a panel of 21 dermatologists to enabling autonomous vehicles that drive with fewer accidents per mile than humans, a fundamental question remains intractable.
The scenarios described here are all examples of “supervised” learning, where an algorithm learns from vast amounts of labelled examples. General-purpose unsupervised learning – the Holy Grail of artificial intelligence, where an agent simply learns from experience without human intervention – is one of the least understood areas of computer science. Unlocking its secrets would bring to life the sentient creations of ancient Greek myth and force us to ask what it truly means to be human.