Neuroscience does not yet have a good model for how 3D localization is performed in animals. Those models that do exist depend on mathematics that cannot be implemented in biologically plausible ways, and plausible models explain only a small part of animal capabilities. On the other hand, engineers using the mathematics of the “implausible” models have recently made significant progress in constructing artificial systems that successfully emulate, rather than simulate, the 3D localization abilities we observe in animals.

However, these systems still need new engineering approaches to allow real-time performance in natural scenes: something that will not be provided by advances in computational power over the next twenty years. For both fields, an implementation that can work on highly parallel architectures of relatively slow processing elements would be a significant advance. In turn, a synergistic approach to solving the problem will have spin-off benefits in both areas.

There are scientific opportunities here in several directions. First is simple intellectual curiosity: it is mathematically interesting to try to make artificial systems that are constrained to be biologically plausible, a task for which the expertise of both life and physical scientists will be needed.

Second, biologists can use the engineered solutions to predict accuracy, as benchmarks and testbeds, and as an “existence proof” that a data-driven solution is possible without recourse to higher-level reasoning. This has important implications for the level of “cognition” required for 3D perception. Finally, of course, it will be a basic technological benefit to have artificial systems that can reason in 3D.

 
 

Systems that can move, whether animals or robots, need to keep track of where things are in 3D. Animals from ants to humans maintain a representation of the world that is sufficient to allow homing or pointing to unseen objects, even when the animal has moved significantly since the object was last observed.

In order to point to a remembered object, the brain must associate a representation of 3D position with the object, and update this representation as the head and body move in 3D. To date, we do not have a complete model for how this is achieved, even in insects, and certainly not in humans. On the other hand, we have recently seen great advances in the emulation of this ability in artificial systems.

 
State of the art—biology  
 

Satisfactory neural models of 3D localization are the subject of active research by groups worldwide (indeed, the UK has a strong international reputation in this area, with groups at Oxford, Reading, Surrey, Sussex, and Newcastle, among others). The difficulty with current models is balancing scope and biological plausibility. A particular example is in maintaining a 3D representation of objects using vision. Existing models of the case where the observer undergoes large translations (for example, crossing a room) depend on matrix algebra and exact geometry, for which it is very difficult to see a plausible neuronal implementation.

Existing models of transformations between head, eye, and motor coordinate systems beg the question of how and whether such coordinate systems are really distinguished in the brain. Finding a biologically plausible model of 3D localization in animals thus remains an open problem.
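To make concrete the kind of matrix algebra and exact geometry such models rely on, the following is a minimal sketch of updating a remembered object position after the observer moves. It is an illustration only, not a model taken from the literature discussed here. The function names (`yaw_matrix`, `update_egocentric`), the restriction to rotation about the vertical axis, and the axis convention (x forward, y left, z up) are all assumptions made for the example.

```python
import math

def yaw_matrix(angle_rad):
    """Rotation about the vertical (z) axis. Columns are the new observer
    axes expressed in the old observer frame (assumed convention)."""
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    return [[c, -s, 0.0],
            [s,  c, 0.0],
            [0.0, 0.0, 1.0]]

def update_egocentric(p_old, rotation, translation):
    """Re-express a remembered point in the observer's new frame.

    p_old       -- point in the old observer frame
    rotation    -- 3x3 matrix, new axes as columns in the old frame
    translation -- new observer origin in the old frame
    Computes p_new = R^T (p_old - t).
    """
    d = [p_old[i] - translation[i] for i in range(3)]
    # Multiply by the transpose of R: component i uses column i of R.
    return [sum(rotation[j][i] * d[j] for j in range(3)) for i in range(3)]

# Hypothetical scenario: an object 2 units straight ahead; the observer
# walks 1 unit forward and then turns 90 degrees to the left.
p_new = update_egocentric([2.0, 0.0, 0.0],
                          yaw_matrix(math.pi / 2),
                          [1.0, 0.0, 0.0])
print([round(v, 6) for v in p_new])
```

Under the assumed convention, the result places the object about one unit to the observer's right (negative y), as expected after a left turn. The point of the sketch is that each update is an exact transpose-and-multiply step, which is straightforward on a conventional processor but has no obvious neuronal counterpart.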

 
State of the art—engineering  
 

On the engineering side, significant advances have been made in the last decade in building reliable systems that can derive 3D information from vision.

These systems are built on a combination of biologically inspired processing stages, and share their mathematical basis with some current biological models. Advances both in these mathematical tools and in the statistics of vision were instrumental in the success.

However, while significant advances have been made, there remain several thorny problems. One problem is that systems that operate in real-world environments have a computational requirement that depends on the complexity of the scene. If Moore’s law contributes a thousand-fold increase in compute power over 20 years, only a tenfold increase in the complexity of 3D scenes will be tractable.
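The two figures above can be related by a short calculation. The sketch below assumes, purely for illustration, that computational cost grows as a power of scene complexity; the text gives only the two ratios (a thousand-fold compute gain, a tenfold complexity gain), so the power-law form and the function names are assumptions, not claims from the source.

```python
import math

def implied_exponent(compute_gain, complexity_gain):
    """Exponent k such that compute_gain == complexity_gain ** k,
    assuming cost grows as (scene complexity) ** k."""
    return math.log(compute_gain) / math.log(complexity_gain)

def tractable_complexity_gain(compute_gain, exponent):
    """Factor by which scene complexity can grow for a given compute
    gain, under the same assumed power-law cost model."""
    return compute_gain ** (1.0 / exponent)

k = implied_exponent(1000.0, 10.0)
print(round(k, 6))                                    # exponent implied by the text's figures
print(round(tractable_complexity_gain(1000.0, k), 6)) # recovers the tenfold figure
print(tractable_complexity_gain(1000.0, 1.0))         # hypothetical linear-cost case
```

On this reading, the quoted figures correspond to roughly cubic growth of cost with scene complexity, whereas a method whose cost grew only linearly would turn the same compute gain into a thousand-fold gain in tractable complexity. This is one way to see why better algorithms, rather than faster hardware alone, are the limiting factor.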

New approaches are needed, either for forgetting old data or for providing alternative mathematical models, including those that will be required for a neural implementation.

 
Existence proofs and “cognitive” systems  
 

Traditionally, the life sciences have informed the design of artificial systems on two levels. Successful artificial systems are often built on explicit biological inspiration.

For example, the development of edge detectors in computer vision was largely based on similar receptors found in animal visual systems. A more fundamental inspiration, however, has been existence proofs: without the evidence that the human system can form a 3D representation of the world from vision, it is doubtful that AI research would have attempted to do the same.

The availability now of systems that can reliably emulate aspects of the animal systems provides existence proofs of another kind to the life sciences.
These are proofs that certain tasks performed by the brain can be achieved without recourse to high-level reasoning. The value of such evidence is seen in the use of the random-dot stereogram as a tool of psychophysics: before Julesz showed in 1962 that stereoscopic reconstruction can be achieved without any interpretation of the 2D images, it was a matter of debate whether stereoscopy required high-level knowledge.

In terms of 3D localization, although it is clear that tasks like “point to Paris” require the full reasoning machinery of humans, we now have evidence that this machinery is not required to maintain a 3D representation sufficient for navigation in the immediately present environment.

 
Timeliness  
 

This research manifesto is timely for two reasons. First, the recent engineering advances have stimulated work on artificial systems, in particular to address the open problems of scale. Second, biological research is also in an exciting state, not least because the use of virtual reality equipment means that research into 3D perception can now allow observers to explore their surroundings freely while, at the same time, keeping the visual stimulus under tight experimental control.

The chance of mutual benefit for the two fields is significant.

Technological benefit

Finally, several technological benefits might be expected to arise if support for this agenda leads to better artificial systems. Example applications include augmented reality for computer-assisted surgery, virtual museum tours, mobile robots that can navigate using a camera only, and vehicle tracking that does not depend on external infrastructure such as GPS.

Many of these problems have been open for some time; it is the combination of recent work in the two sciences that gives us hope that they are tractable within 20 years.

 
 
  We propose to visit the several UK groups involved in work related to this proposal, in order to ensure that there is support across the research community for this agenda. A partial (and provisional) list of people with whom we would like to consult includes:  
 
Sophie Deneuve, Daniel Wolpert, UCL.
 
Tom Collett, Mike Land, Sussex.
 
Mark Bradshaw, Surrey.
 
Julie Harris, Melissa Bateson, Newcastle.
 
Andrew Blake and Roberto Cipolla, Cambridge.
 
Bob Fisher, Edinburgh.

In addition, our existing collaborations with Matthew Rushworth, Chris Miall, and several others at Oxford will continue to inform the agenda. As a result of this work, the team will make a 15-minute presentation to IAC that will define the manifesto and represent the views of a cluster of UK researchers who are interested in pushing this agenda forward.
 
The following people have agreed to develop this proposal further:
 
Dr Andrew Fitzgibbon
University of Oxford
Dr Andrew Glennerster
University of Oxford
Prof Andrew Parker
University of Oxford

 

 
