Imagine this: Creating new visual concepts by recombining familiar ones

Around two and a half thousand years ago, a Mesopotamian trader gathered some clay, wood and reeds and changed humanity forever. Over time, their abacus would allow traders to keep track of goods and reconcile their finances, helping economics to flourish.

But that moment of inspiration also shines a light on another astonishing human ability: our capacity to recombine existing concepts and imagine something entirely new. The unknown inventor would have had to think of the problem they wanted to solve, the contraption they could build and the raw materials they could gather to create it. Clay could be moulded into a tablet, a stick could be used to scratch the columns and reeds could act as counters. Each component was familiar and distinct, but put together in this new way, they formed something revolutionary.

This idea of “compositionality” is at the core of human abilities such as creativity, imagination and language-based communication. Equipped with just a small number of familiar conceptual building blocks, we are able to create a vast number of new ones on the fly. We do this naturally by placing concepts in hierarchies that run from specific to more general and then recombining different parts of the hierarchy in novel ways.

But what comes so naturally to us remains a challenge in AI research.

In our new paper, we propose a novel theoretical approach to address this problem. We also demonstrate a new neural network component called the Symbol-Concept Association Network (SCAN), which can, for the first time, learn a grounded visual concept hierarchy in a way that mimics human vision and word acquisition, allowing it to imagine novel concepts guided by language instructions.

Our approach can be summarised as follows:

  • The SCAN model experiences the visual world in the same way as a young baby might during the first few months of life. This is the period when the baby’s eyes are still unable to focus on anything more than an arm’s length away, and the baby essentially spends all her time observing various objects coming into view, moving and rotating in front of her. To emulate this process, we placed SCAN in a simulated 3D world built in DeepMind Lab, where, like a baby in a cot, it could not move, but it could rotate its head and observe one of three possible objects presented to it against various coloured backgrounds – a hat, a suitcase or an ice lolly. Like the baby’s visual system, our model learns the basic structure of the visual world and how to represent objects in terms of interpretable visual “primitives”. For example, when looking at an apple, the model will learn to represent it in terms of its colour, shape, size, position or lighting, as illustrated in the sketch below.
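To make this first stage concrete, the sketch below shows the kind of disentangling objective that can recover such visual primitives from raw frames: a variational autoencoder whose KL term is up-weighted by a factor beta (a beta-VAE-style loss). This is an illustrative reconstruction rather than the exact architecture from the paper; the PyTorch framing, the 64×64 input resolution, the layer sizes and the value of beta are all assumptions made for the example.

```python
# Minimal sketch of a disentangling VAE (assumptions: PyTorch, 64x64 RGB frames,
# a beta-VAE-style objective; layer sizes and beta are illustrative, not the
# paper's exact configuration).
import torch
import torch.nn as nn
import torch.nn.functional as F

class DisentanglingVAE(nn.Module):
    """Encodes frames into a small set of latent factors intended to align
    with interpretable visual primitives (colour, shape, size, position...)."""
    def __init__(self, latent_dim: int = 32):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 4, 2, 1), nn.ReLU(),   # 64x64 -> 32x32
            nn.Conv2d(32, 32, 4, 2, 1), nn.ReLU(),  # 32x32 -> 16x16
            nn.Conv2d(32, 64, 4, 2, 1), nn.ReLU(),  # 16x16 -> 8x8
            nn.Flatten(),
            nn.Linear(64 * 8 * 8, 256), nn.ReLU(),
        )
        self.to_mu = nn.Linear(256, latent_dim)
        self.to_logvar = nn.Linear(256, latent_dim)
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 64 * 8 * 8), nn.ReLU(),
            nn.Unflatten(1, (64, 8, 8)),
            nn.ConvTranspose2d(64, 32, 4, 2, 1), nn.ReLU(),  # 8x8 -> 16x16
            nn.ConvTranspose2d(32, 32, 4, 2, 1), nn.ReLU(),  # 16x16 -> 32x32
            nn.ConvTranspose2d(32, 3, 4, 2, 1),               # logits, 32x32 -> 64x64
        )

    def forward(self, x):
        h = self.encoder(x)
        mu, logvar = self.to_mu(h), self.to_logvar(h)
        # Reparameterisation trick: sample z while keeping gradients.
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
        return self.decoder(z), mu, logvar

def beta_vae_loss(recon_logits, x, mu, logvar, beta: float = 4.0):
    # Reconstruction term plus a beta-weighted KL term; a larger beta pressures
    # the latent units towards independent, interpretable factors.
    recon = F.binary_cross_entropy_with_logits(recon_logits, x, reduction="sum")
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + beta * kl
```

With a sufficiently large beta, individual latent units tend to align with factors such as colour, position or object identity, which is what makes the resulting representation usable as a set of conceptual building blocks for later, language-guided recombination.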

Source: https://deepmind.com/blog/article/imagine-creating-new-visual-concepts-recombining-familiar-ones
