The
scientific study of speech and language is being transformed
by new techniques for imaging the activities of the
intact human brain, and by inputs from neurobiological
sources – in particular neurophysiological and
neuroanatomical research into auditory processing
structures and pathways in non-human primates.

These
developments are likely to lead to major breakthroughs
over the next two decades in our scientific understanding
of these crucial human capacities, and offer the potential
for a detailed neuroscientific analysis of a major
natural cognitive system. A key enabling component
of these potential breakthroughs will be the ability
to build detailed computational models of speech and
language processing systems.

At
the same time, over two decades of quite independent
development, research into Automatic Speech Recognition
(ASR), primarily within the Hidden Markov Model (HMM)
framework, has led to the availability of commercially
successful speech recognition systems able to work
effectively in limited domains. These advances are
built on powerful techniques for statistical modelling
of the language system and of the complex acoustic
information that distinguishes among the core set
of speech sounds (phones) necessary for word identification.
But despite these achievements, current ASR technology
is still fragile in noisy environments, cannot easily
adapt to new dialects and still struggles to handle
continuous spontaneous speech, where language is used
in its natural communicative context. These continuing
difficulties suggest it may be timely to revisit neurobiologically
based solutions to robust real-time speech comprehension.

We
propose to re-examine the relationship between research
into human speech and language and research in ASR
– in particular in the domain of Continuous
Speech Recognition (CSR). Our overall goal (open question,
grand challenge) is to construct a neurobiologically
realistic, computationally specific account of the
human speech and language system. This will not only
address fundamental issues in the science
of cognitive systems, but will also help to inform
the development of future ASR systems.

There
are several important areas that will need to be explored
to achieve this goal, and where fruitful
two-way traffic between the two communities can be
expected. One set of critical issues concerns
the comparative functional architecture of the
natural and artificial systems. How, for example,
does the human system organise itself, as a neurobiological
system, to integrate top-down and bottom-up information
as it synthesises a successful analysis of the speech
stream? Is higher-level feedback used to optimise
feature coding of the acoustic waveform, and does
this have implications for machine recognition? A
related issue is whether the strictly phone-based
organisation of current ASR systems, where the primary
goal is to map from acoustic information to a small
set of phones, has any direct correspondence in the
human system.

Another
set of issues concerns the relationship between HMM-based
processes and neurobiological processes. To what
extent, for example, are the characteristics of human
processors determined by the statistical properties
of the speech input? If, as seems likely, learning
from statistical regularity is important to the human
system, then are these regularities being abstracted
and represented in ways that are comparable to current
ASR/CSR systems? If they are not comparable, then
what sort of statistical models would be needed to
model the human system and how can they be implemented?

These
are just a few examples of possible points of contact
between the two fields. Our goal over the next three
months, through a series of joint workshops and discussions,
is to refine and expand this list, and to develop
a more articulated definition of a Foresight Grand
Challenge in the domain of speech and language. An
initial workshop is already being scheduled.

The following people have agreed to develop this proposal further:

Professor William Marslen-Wilson, MRC Cognition and Brain Sciences Unit, Cambridge
Professor Steve Young, Cambridge University Engineering Department
Dr Roy Patterson, Centre for the Neural Basis of Hearing, Department of Physiology, University of Cambridge
Professor Lorraine Tyler, Centre for Speech and Language, Department of Experimental Psychology, University of Cambridge