The multiple-channel cochlear implant: the interface between sound and the central nervous system for hearing, speech, and language in deaf people—a personal perspective

Graeme M Clark

Abstract

The multiple-channel cochlear implant is the first sensori-neural prosthesis to effectively and safely bring electronic technology into a direct physiological relation with the central nervous system and human consciousness, and to give speech perception to severely-profoundly deaf people and spoken language to children.

Research showed that the place and temporal coding of sound frequencies could be partly replicated by multiple-channel stimulation of the auditory nerve. This required safety studies on how to prevent adverse effects on the cochlea from trauma, electrical stimuli, biomaterials and middle ear infection. The mechanical properties of an array and mode of stimulation for the place coding of speech frequencies were determined.

A fully implantable receiver–stimulator was developed, as well as the procedures for the clinical assessment of deaf people, and the surgical placement of the device. The perception of electrically coded sounds was determined, and a speech processing strategy discovered that enabled late-deafened adults to comprehend running speech. The brain processing systems for patterns of electrical stimuli reproducing speech were elucidated. The research was developed industrially, and improvements in speech processing made through presenting additional speech frequencies by place coding. Finally, the importance of the multiple-channel cochlear implant for early deafened children was established.

1. Overview

The multiple-channel cochlear implant restores useful hearing in severely-profoundly deaf people. It bypasses the malfunctioning inner ear (cochlea) and provides information to the auditory centres in the brain through electrical stimulation of the hearing (auditory) nerves (figure 1). It has enabled tens of thousands of severely-profoundly deaf people in over 70 countries to communicate in a hearing world. Deaf people have significant limitations communicating at home and in work environments, and this substantially affects their quality of life. Children born deaf or deafened early in life have difficulty developing spoken language. This affects their education, and as a result it is harder to obtain employment. The child's deafness may also create stresses within the family.

Figure 1

The structure of the ear and a diagram of the multiple-channel cochlear implant. The components are: A, microphone; B, behind the ear speech processor; C, body worn speech processor; D, transmitting aerial; E, receiver–stimulator; F, electrode array (Clark 2000).

Hearing occurs normally when sound vibrations are transmitted via the external and middle ears to the sense organ of hearing (organ of Corti) in the cochlea (figure 2). The organ of Corti lies on the basilar membrane, which vibrates selectively to different sound frequencies, so that it acts as a sound filter. High frequencies produce maximal vibrations at the basal end, and low frequencies at the apical end. Inner hair cells in the organ of Corti (figure 2) convert sound vibrations into electrical impulses that provide temporal and spatial patterns of auditory nerve excitation in the higher brain centres. In most cases, when a person has a severe-profound hearing loss they will have lost most of the hair cells in the inner ear. These people benefit from the multiple-channel cochlear implant.

Figure 2

The cochlea with 2½–2¾ turns spiralling around the modiolus (M). The fluid-filled canals in the turns are: SV, scala vestibuli; SM, scala media; ST, scala tympani. Inset: the organ of Corti (OC) rests on the basilar membrane (BM). The OC has outer (OHC) and inner (IHC) hair cells connected to auditory nerve fibres (AN). Reprinted with permission from Cochlear Corporation (1987).

The multiple-channel cochlear implant (figure 1) consists of an external microphone and speech processor, and an implanted receiver–stimulator and electrode array. The microphone is directional, and placed above the ear to select the sounds coming from in front, and this is especially important in conversation under noisy conditions. The voltage output from the microphone passes to a small speech processor worn behind the ear or a larger more versatile one attached to a belt. The speech processor filters the speech waveform into frequency bands, and the output voltage of each filter is then modified to lie within the narrow operating (dynamic) range required for electrically stimulating each electrode in the inner ear. A stream of data for the current level and electrode to represent the speech frequency bands at each instant in time, together with power to operate the device, are transmitted by radio waves via a circular aerial through the intact skin to the receiver–stimulator. The receiver–stimulator implanted in the mastoid bone decodes the signal and produces a pattern of electrical stimulus currents in a bundle of electrodes inserted along the scala tympani of the basal turn of the cochlea. These electrodes excite the auditory nerve fibres, and the resulting patterns of activity in the higher brain centres are perceived as speech and environmental sounds.
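
The signal path just described can be summarized in code. The sketch below is purely illustrative and not the device's firmware: the filter design, the assumed 30 dB acoustic input range, and the per-electrode threshold and comfort current levels are hypothetical placeholders.

```python
import numpy as np
from scipy.signal import butter, lfilter

def process_frame(frame, fs, band_edges, threshold, comfort):
    """Illustrative multiple-channel processing of one speech frame:
    filter into frequency bands, then compress each band's level into
    the narrow electrical dynamic range of its electrode."""
    stimuli = []
    for i, (lo, hi) in enumerate(band_edges):
        b, a = butter(2, [lo / (fs / 2), hi / (fs / 2)], btype="band")
        band = lfilter(b, a, frame)
        level_db = 10 * np.log10(np.mean(band ** 2) + 1e-12)
        # Map an assumed 30 dB acoustic range onto the 5-10 dB electrical
        # range between threshold (T) and comfort (C) level per electrode.
        frac = np.clip((level_db + 60.0) / 30.0, 0.0, 1.0)
        stimuli.append((i, threshold[i] + frac * (comfort[i] - threshold[i])))
    return stimuli  # (electrode index, current level) pairs for this frame
```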

A multiple rather than single-channel cochlear implant was developed because electrophysiological research demonstrated that single-channel stimulation could not reproduce essential high speech frequencies for speech understanding. It was then necessary to complete a series of animal studies to ensure that a multiple-channel electrode array, implanted into the cochlea, would not have adverse effects on the auditory nerve, before evaluating it on deaf people.

Research on the first two adults to have the University of Melbourne's multiple-channel cochlear implant, in 1978 and 1979, led to the discovery of a speech processing strategy that enabled them to understand conversational speech. With this speech processing strategy, high frequencies of importance for intelligibility (second formants) were selected, and presented at a corresponding region in the cochlea by place coding. The amplitudes of the selected frequencies were converted to current levels. The lower speech frequencies for voicing were presented across each electrode as rate of stimulation. The speech perception benefits of this processing strategy were confirmed by standardized audiological tests (Clark et al. 1981b,c; Tong et al. 1981).

The prototype speech processor and implant were then developed industrially by Cochlear Pty. Limited and trialled internationally for the US Food and Drug Administration (FDA). In 1985, it was the first multiple-channel cochlear implant to be approved by the FDA as safe and effective for adults who had hearing before going deaf. It provided significant help in hearing speech using electrical stimulation in combination with lip-reading, and also with electrical stimulation alone.

Once it had been shown to be effective for adults who had hearing before going deaf it was implanted in three children in 1985 and 1986. They understood some running speech when the implant was used in combination with lip-reading, and a five-year-old had significant open-set speech identification scores for electrical stimulation alone. Once the benefits had been shown for children in Melbourne, it was implanted in children in other centres around the world. In 1990, it was the first implant to be approved by the FDA or any regulatory body as safe and effective for children from two to 18 years of age. The discoveries leading to the development of the University of Melbourne's multiple-channel implant are shown in figure 3.

Figure 3

An overview of the discoveries leading to the development of the multiple-channel cochlear implant.

Finally, with improvements in speech processing for the multiple-channel cochlear implant, the average speech perception results were equivalent to those for hearing aid users with an average threshold of 78 dB HL for the frequencies of 0.5, 1.0 and 2.0 kHz (Blamey et al. 1998; Sarant et al. 2001). Furthermore, an analysis of data by the UK Cochlear Implant Study Group (2004) showed that deaf people could benefit from an implant even if they had significant aided word-in-sentence scores in the worse or operated ear. The implant is more cost-effective than a hearing aid for a severe-profound hearing loss, and there are considerable educational cost benefits (Osberger et al. 1993; Evans et al. 1995; Summerfield & Marshall 1995; Summerfield et al. 1995; Wyatt et al. 1996). Furthermore, the cost of achieving a Quality Adjusted Life Year shows the implant to be comparable to a coronary angioplasty or an implantable defibrillator.

2. The coding of sound frequencies with electrical stimulation of the auditory nerve (1967–1975)

Research was first undertaken to answer the question: could electrical stimulation reproduce the frequencies of speech through temporal coding?

The temporal code for frequency is illustrated in figure 4a. Note that the cells from the cat auditory brainstem fire in phase with the sound waves at a low frequency, but not necessarily on every cycle. The phase locking of the neurons can be seen in a histogram of the inter-spike intervals (figure 4b). Furthermore, with sound there is a stochastic distribution of intervals around the peaks, and this is primarily due to the probabilistic nature of the hair cell/neural interface (Siebert 1970; Burkitt & Clark 2001; Paolini et al. 2001). The phase locking decreases for higher frequencies, and ceases at 4.0–5.0 kHz (Rose et al. 1967).
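
The behaviour in figure 4 can be imitated numerically. In this minimal sketch (firing probability and jitter are assumed values, not fitted to the cat data), a model fibre fires near a fixed phase of a 0.416 kHz tone but skips cycles at random, so the inter-spike interval histogram has peaks at integer multiples of the stimulus period, with a stochastic spread around each peak.

```python
import numpy as np

rng = np.random.default_rng(0)
freq = 416.0                  # stimulus frequency (Hz), as in figure 4b
period = 1.0 / freq
p_fire = 0.4                  # probability of firing on any cycle (assumed)
jitter_sd = 0.1e-3            # jitter of spike times (s, assumed)

cycles = np.arange(2000)
fired = cycles[rng.random(cycles.size) < p_fire]   # phase-locked, cycle-skipping
spike_times = fired * period + rng.normal(0.0, jitter_sd, fired.size)

isi = np.diff(np.sort(spike_times))
hist, edges = np.histogram(isi, bins=np.arange(0.0, 10 * period, 0.2e-3))
# Peaks of `hist` fall at 1, 2, 3, ... stimulus periods, with a stochastic
# distribution of intervals around each peak, as in figure 4b.
```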

Figure 4

(a) The phase locking of action potentials in a group of nerve fibres. (b) An inter-spike interval histogram of the neural responses to an acoustic stimulus of 0.416 kHz.

If the temporal coding of speech frequencies was important, and could be mimicked with electrical stimulation, only a simple single-channel implant might have been required. Single-channel implants have either transmitted the amplitude variations of the speech wave (Michelson 1971; Hochmair 1980; Danley & Fretz 1982) or the voicing frequency (Fourcin et al. 1979), to a single electrode.

Research (Clark 1969b,c) compared the responses of cells in the superior olive (brainstem) of the experimental animal to acoustic and electrical stimulation, as prior behavioural studies (Butler & Neff 1950; Butler et al. 1957; Goldberg & Neff 1961) in hearing animals had found that frequency discrimination occurred in brainstem nuclei. Furthermore, it was shown by Greenwood & Maruyama (1965), Paolini & Clark (1999), and others that there was an interplay between excitation and suppression (presumed inhibition) in the brainstem nuclei for processing auditory information, and electrical stimulation could alter the balance between excitation and inhibition. Thus, the physiological findings of Clark (1969b,c) from an auditory brainstem nucleus more closely reflected the perceptual effects of the electrical stimuli than recordings from the auditory nerve, where inhibition is not present. The specific intent of this research was to understand how to electrically stimulate the auditory central nervous system for speech perception.

In studying whether electrical stimulation could reproduce the temporal coding of speech frequencies, it was found that cells in the superior olive did not phase lock to stimulus rates above 200–500 pulses s−1 (Clark 1969b,c), which is much less than the 4.0 kHz required for speech perception. The time course of the suppression limited phase-locking, and as it was longer than the refractory period of the neurons, this suggested inhibitory mechanisms were also involved.

In addition, Merzenich et al. (1973) reported that sinusoidal electrical stimuli produced phase locking of responses in the inferior colliculus (another brainstem nucleus) for rates up to 400–600 pulses s−1. Snyder et al. (1995) and Shepherd et al. (1999) found 120–300 pulses s−1 to be the upper limit. These data were consistent with the conclusions of Clark (1969b,c).

In the study by Clark (1969b,c) it was also found that the cell responses were deterministic (i.e. tightly phase-locked to the stimulus). In generating a nerve action potential, although there is a probabilistic opening and closing of the membrane ion channels, there are many channels and this ensures the response is effectively deterministic. This was described for neural excitation by the deterministic differential equations of Hodgkin & Huxley (1952).
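
The deterministic description referred to is the Hodgkin–Huxley (1952) system. As a reminder of its form (standard notation, not reproduced from this paper), for membrane potential V with gating variables m, h and n:

$$C_m \frac{\mathrm{d}V}{\mathrm{d}t} = I_{\text{ext}} - \bar{g}_{\text{Na}} m^{3} h (V - E_{\text{Na}}) - \bar{g}_{\text{K}} n^{4} (V - E_{\text{K}}) - \bar{g}_{\text{L}} (V - E_{\text{L}})$$

$$\frac{\mathrm{d}x}{\mathrm{d}t} = \alpha_x(V)(1 - x) - \beta_x(V)\,x, \qquad x \in \{m, h, n\}$$

Although each ion channel opens and closes probabilistically, the large number of channels means the summed conductances follow these deterministic equations, consistent with the tightly phase-locked responses observed.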

To establish that the findings on individual cells applied to a wider population, the field potentials for electrical stimulation were recorded, as they are due to summed cellular activity (Clark 1969a,c). There was a reduction in their amplitudes towards the baseline for rates greater than 300 pulses s−1 (Clark 1969a,c, 1970a,b), as illustrated in figure 5. This confirmed the findings from cells.

Figure 5

Field potentials from the superior olivary complex in the auditory brainstem of the cat for 1 and 300 pulses s−1 rates of stimulation of the auditory nerve in the cochlea (Clark 1969b).

To be sure that these discoveries from individual cells and field potentials were more applicable to the patient, a series of behavioural studies was undertaken on alert experimental animals from 1971 to 1975 (Clark et al. 1972a,b, 1973; Williams et al. 1976). The study by Clark et al. (1972a,b) showed that the abilities of cats to discriminate stimulus rates of 100 pulses s−1 at the basal (high frequency) end or apical (low frequency) end of the cochlea were statistically the same. On the other hand, discrimination was significantly better at the apical end for a rate of 200 pulses s−1. The upper limit on the rate that could be discriminated was found to be 600–800 pulses s−1 (Clark et al. 1973; Williams et al. 1976). This was consistent with the cell and field potential findings. It was also an important discovery that low rates of excitation could be discriminated at the high frequency region of the cochlea, as this suggested that the brain could process information for rate separately from that for place of stimulation. In addition, the behavioural studies (Clark et al. 1973) showed that the detection of low rates of modulation of electrical stimuli was the same as that of frequency glides of acoustical stimuli at the same low rates. But at high modulation rates, detection was much poorer for electrical than for acoustic stimuli (Clark et al. 1973). This indicated that place rather than rate of stimulation would be required to reproduce the rapid frequency glides or transitions that are important for the coding of consonants.

In addition to the studies by Clark (1969a,c) on the brainstem nuclei, Moxon (1971) examined the ability of auditory nerve fibres (not subject to inhibition) to follow rate of stimulation, to elucidate the duration of their refractory state after excitation. It was demonstrated that they responded initially at rates as high as 900 pulses s−1, but that the rate soon fell to 500 pulses s−1. It was later found from the globular ‘bushy’ cells in the anteroventral cochlear nucleus (Paolini & Clark 1997; Paolini et al. 2001), using intracellular recordings to minimize electrical artefact, that phase locking occurred up to 1200 pulses s−1, after which there was only a Poisson distribution of inter-spike intervals (figure 6). The analysis was relevant, as the temporal responses of these cells are similar to those of the incoming auditory nerve fibres (Lavine 1971).

Figure 6

Intracellular inter-spike interval histograms from globular bushy cells in the anteroventral cochlear nucleus of the rat for increasingly higher rates of electrical stimulation (200, 800, 1200 and 1800 pulses s−1) (Paolini & Clark 1997). Reprinted from Clark (2003) with kind permission of Springer Science and Business Media.

The above research indicated that single-channel stimulation, which would need to rely on temporal coding, would be inadequate for speech understanding.

In addition, the study by Clark (1969b,c) aimed to see whether electrical stimulation could reproduce not only the phase locking seen with sound, but also the stochastic nature of the response (figure 4b; figure 6), even though it was not clear whether stochastic firing was an epiphenomenon or fundamental to coding. It was presumed that electrical sine waves would produce more stochastic firing than rectangular pulses due to their slower rise times, but no differences were observed, and this was confirmed by Hartmann et al. (1984).

Subsequent research has examined whether stochastic firing can benefit the temporal processing of signals. As discussed by Zeng et al. (2000), Burkitt & Clark (2001) and Hohn & Burkitt (2001), the addition of noise can improve signal detection in a nonlinear system (stochastic resonance). In particular, the addition of noise enabled the detection of periodicity in a sub-threshold stimulus. This principle was applied by Hong & Rubinstein (2003) to cochlear implant patients, and four out of five subjects had an increased dynamic range. Thus stochastic resonance could be used to improve the temporal coding of frequencies, particularly at stimulus levels near threshold.
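
The effect is easy to reproduce. In the sketch below (all values assumed), a sinusoid that never reaches a detector's threshold produces threshold crossings once noise is added, and those crossings carry the periodicity of the sub-threshold signal.

```python
import numpy as np

rng = np.random.default_rng(1)
fs = 10_000                                  # sampling rate (Hz, assumed)
t = np.arange(fs) / fs                       # one second of signal
signal = 0.8 * np.sin(2 * np.pi * 100 * t)   # sub-threshold 100 Hz input
threshold = 1.0                              # detector threshold (assumed)

def threshold_crossings(noise_sd):
    noisy = signal + rng.normal(0.0, noise_sd, t.size)
    return int(((noisy[:-1] < threshold) & (noisy[1:] >= threshold)).sum())

print(threshold_crossings(0.0))   # 0: the signal alone never fires the detector
print(threshold_crossings(0.3))   # > 0: crossings cluster at the signal peaks
```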

The second question to be answered was: could electrical current be localized to separate groups of auditory nerve fibres in the cochlea to allow the important mid-high frequencies of speech to be coded as place or site of stimulation rather than through temporal coding?

With place coding, sound is filtered by the basilar membrane with the maximal displacement for high frequencies in the basal turn moving to the apical turn for low frequencies. Physiological research had shown that place coding is preserved throughout the auditory system via connections between the cochlea and each of the centres in the central auditory pathways, so that an orderly frequency scale is preserved (Rose et al. 1959; Kiang et al. 1965). The pitch perceived then depends on the site of excitation in the brain.

A model of the electrical resistances of tissues within the cochlea was developed to allow the current flowing through the compartments of the inner ear to be calculated (Black & Clark 1977, 1978, 1980), and so determine the best sites for electrodes to effectively localize current for the place coding of frequency. The resistances had been previously measured by von Békésy (1951) and Johnstone et al. (1966). The model demonstrated that a current passing through the organ of Corti (likely to excite peripheral auditory nerve fibres) would be better localized for monopolar stimulation (figure 7) between an electrode in the scala tympani 16 mm from the round window and a distant ground, compared to currents passing between the compartments. The current flow for bipolar stimulation (current passing between two neighbouring electrodes, as shown in figure 7), was not evaluated with the model.

Figure 7

(a) A diagram of the voltage field for bipolar stimulation; (b) ‘pseudo-bipolar’ (common ground) stimulation; (c) monopolar stimulation.

The validity of the model findings was then determined in the experimental animal by comparing current distributions in the terminal auditory nerve fibres for the different modes of stimulation, using a methodology similar to that of Merzenich & Reid (1974). The mean length constants for current spread in the wide first part of the basal turn were 13 mm (monopolar stimulation), 2–4 mm (bipolar stimulation) and 7.5 mm (between the scala tympani and vestibuli) (Black & Clark 1980; Clark 2003). The length constant is the distance over which the voltage falls to 1/e of its value; with measurements taken 1 mm apart, it is the inverse of the natural logarithm of the ratio of the voltage at the recording site to the voltage 1 mm away, and the higher the value the greater the current spread. However, at a point 14 mm from the round window, where the cross-sectional area of the cochlea is reduced, the length constant for monopolar stimulation in the scala tympani was 4 mm. The localization of current with monopolar stimulation was later demonstrated to be effective in implantees (Busby et al. 1994). Initially, to minimize overlap in electrical fields, the prototype receiver–stimulator was designed to provide ‘pseudo-bipolar’ (common ground) stimulation. With this stimulus mode, current passed from each active electrode in the scala tympani to 10 others in the same scala, all connected together as the grounding path (figure 7). Studies in the first patient showed that current could be localized with pseudo-bipolar stimulation for the place coding of frequency (Black et al. 1981; Tong et al. 1982; Tong & Clark 1983).
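
The relation underlying these measurements is an exponential decay of the scala voltage with distance. As a worked restatement (standard cable-theory form, with x in mm):

$$V(x) = V_0 \, e^{-x/\lambda}, \qquad \lambda = \frac{x}{\ln\big(V_0/V(x)\big)}$$

so with probe separations of 1 mm, λ = 1/ln(V0/V1) mm. A monopolar length constant of 13 mm thus corresponds to the voltage falling only about 7% per millimetre, whereas the bipolar values of 2–4 mm correspond to falls of roughly 22–39% per millimetre, i.e. far better localized current.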

3. The safe and effective intra-cochlear multiple-channel electrical stimulation of the auditory nerves (1972–1992)

The research described above suggested that the place coding of frequency would be best achieved with electrodes placed inside the scala tympani. However, initial studies on patients by some groups were primarily with electrodes outside the cochlea, as the cochlea was considered by some to be too delicate for surgical implantation (Legouix 1957, cited by Simmons 1966).

Research commencing in 1972 showed that when multiple electrodes were inserted into the scala tympani of the cochlea, through a number of holes drilled in the overlying bone, there was marked damage to all structures, and associated loss of the auditory nerve fibres (Clark et al. 1975b). However, it was found (Clark et al. 1975b; Clark 1977) that a free-fitting electrode carrier could be passed through the round window and along the scala tympani with only mild histopathological changes in the cochleae. The study also emphasized that the cochlea needed to be protected from middle ear infection extending around the electrode, which could also lead to meningitis.

The studies to determine how to achieve an adequate insertion depth and the stiffness and extensibility of the materials required were undertaken on human temporal bones and moulds of the cochleae (Clark et al. 1975a). The electrode carriers were found to pass only 10 mm upwards into the tightening spiral of the scala tympani of the basal turn, but as with the experimental animal, they passed downwards easily into the widening spiral of the cochlea. It was also demonstrated that an electrode bundle inserted upwards along the scala tympani would lie at the periphery of the spiral, and that its upward progress in the basal turn was impeded through frictional forces against the outer wall (Clark et al. 1975a). However, it was discovered in 1977 that an appropriate insertion depth could be achieved if the electrode bundle became increasingly stiff towards its proximal end, as well as having a flexible tip (Clark 2003). This was made possible by the incremental addition of electrode wires that progressively stiffened the array from the tip to the base. This electrode with increasing stiffness could be passed around the basal turn to lie opposite the speech frequency region, as shown in figure 8.

Figure 8

A banded, free-fitting, smooth, tapered, electrode array with graded stiffness that has passed into the scala tympani of the basal turn of the human cochlea. M, modiolus; RW, round window or entry point to the basal turn of the cochlea; BT, basal turn of the cochlea. Reprinted from Clark (2003) with permission of Springer Science and Business Media.

In the late 1970s and early 1980s, a series of studies was undertaken to ensure that the materials for the electrode array, as well as the receiver–stimulator package, were biocompatible and non-toxic before they were used in a clinical trial for the FDA (Clark et al. 1983; Clark 1987). The procedures were appropriate modifications of those outlined in the US Pharmacopeia (1980) and exceeded the recommendations. An initial report (Clark et al. 1983) and a more detailed study (Clark 1987) established that Silastic MDX-4-4210, Silastic medical adhesive type A, and platinum were biocompatible and produced little fibrous tissue reaction in the subcutaneous and muscle tissue in the rat and cat, and Silastic MDX-4-4210 and FEP (fluoroethylene propylene) produced little reaction in the cat cochlea. Candidate materials were further tested for the FDA for cytopathic effects against embryonic cells, for systemic toxicity by intravenous and intraperitoneal injection in mice, intracutaneous irritation in rabbits, and for tissue reactions to subcutaneous and intramuscular implantation after 90 days. The assembled units were evaluated by implanting them intramuscularly for four weeks and examining the tissue response, as the manufacturing process and working of materials could change their biocompatibility.

It was found that circumferential platinum band electrodes had the required smooth surface to facilitate the insertion of a free-fitting array with graded stiffness (Clark et al. 1979a) (figure 8). The bands lay flush with the surface of the array to minimize resistance when inserted. The wires passed centrally along the tube to emerge through openings where they were welded to the bands. The outside diameter of the array varied from 0.56 to 0.64 mm, to ensure that it would fit freely into the scala tympani, and the electrodes had a width of 0.3 mm and inter-electrode spacing of 0.45 mm. This enabled 20 electrodes to lie along the first 15 mm of the array where they would be opposite the speech frequencies if the whole array were inserted 20–25 mm. With the circumferential electrodes, the array was tolerant of lateral displacement or rotation as a result of the insertion.
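
The stated geometry is self-consistent, as a short check shows: 20 bands of 0.3 mm width separated by 0.45 mm gaps occupy 20 × 0.3 + 19 × 0.45 = 14.55 mm, i.e. the first ~15 mm of the array. A sketch using only the figures given in the text:

```python
band_width = 0.30   # electrode band width (mm)
gap = 0.45          # inter-electrode spacing (mm)
n_bands = 20

span = n_bands * band_width + (n_bands - 1) * gap
centres = [i * (band_width + gap) + band_width / 2 for i in range(n_bands)]
print(span)   # 14.55 mm: the 20 bands lie along the first ~15 mm of the array
```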

To ensure the array did not lead to significant trauma, investigations were carried out on human temporal bones (Shepherd et al. 1985). A localized tear of the basilar membrane and fracture of the spiral lamina were seen in two of nine bones when the insertion had continued beyond the point of first resistance. This would have led to a restricted loss of neurons (Simmons 1967; Axelsson & Hallen 1973).

Furthermore, the research by Clark et al. (1987c) in the experimental animal indicated that the banded electrode array was not held tightly by a fibrous tissue sheath after long-term implantation, and could be easily removed and another one reinserted at a later stage if replacement was required. This meant that the future implantable receiver–stimulators did not require a connector, and this made them smaller and able to be implanted in young children.

Excessive charge density, charge per phase, total charge, and direct current (DC) lead to neuronal damage through their effects on the ability of cellular metabolism to maintain homeostasis (McCreery & Agnew 1983). Electrical current can also produce an electrolytic reaction at the electrode–tissue interface with the release of toxic platinum ions (Agnew et al. 1977).

Safe current and charge densities for stimulating the auditory nerves were determined for the bands. Long-term stimulation in the experimental animal was undertaken with these banded arrays using current levels and charge densities at the top of the range required to produce maximum loudness in the first patients. The pulses were biphasic, with a negative and a positive phase. The charge per phase was balanced so there was no residual charge to produce a build-up of damaging DC.

An initial study (Shepherd et al. 1982) and a more detailed one (Shepherd et al. 1983) showed that charge densities less than 32 μC cm−2 geom./phase and DC levels less than 0.3 μA had no adverse effects on neurons in the cochlea, and did not produce new bone growth when stimulation was carried out continuously for up to 2000 h. (Charge density may be expressed per unit of real, electrochemically reactive, surface area or, as here, per unit of geometric area.) This became the upper allowable charge density for use in patients.
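
As a worked example of how this limit translates into stimulus parameters (the band dimensions are those given earlier in this section; the 0.9 mA current and 200 μs phase duration are assumed values for illustration):

```python
import math

band_width_cm = 0.03    # 0.3 mm band width (section 3)
diameter_cm = 0.06      # ~0.6 mm array diameter (mid-range of 0.56-0.64 mm)
area_geom = math.pi * diameter_cm * band_width_cm    # ~5.7e-3 cm^2

limit = 32.0                              # uC cm-2 geom./phase (Shepherd et al. 1983)
max_charge_per_phase = limit * area_geom  # ~0.18 uC per phase

current_A = 0.9e-3                        # stimulus current (assumed)
phase_s = 200e-6                          # phase duration (assumed)
charge_per_phase = current_A * phase_s * 1e6          # I x t = 0.18 uC

# Biphasic, charge-balanced pulses: the second phase carries exactly the
# opposite charge, so no residual DC accumulates.
print(charge_per_phase <= max_charge_per_phase)       # True: within the limit
```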

Scanning electron microscopy also demonstrated no corrosion of the banded electrodes taken from the experimental animals (Shepherd et al. 1984), although corrosion was seen for electrodes in saline with the same charge densities. This confirmed the protective effect of protein.

The response of the human cochlea and brainstem to implantation and electrical stimulation with the banded array was subsequently studied (Clark et al. 1988). This analysis enabled the response of the cochlea to be compared with that of the experimental animal. Stimulation for 10 000 h in the human did not lead to any observed effect on the auditory spiral ganglion cells in the cochlea or the higher brain centres.

The same stimulus parameters were also considered safe for children aged 2 years and above. But prior to implanting children under 2 years, studies were undertaken in the experimental animal to ensure the parameters had no adverse effect on the immature cochlea or central nervous system (Ni et al. 1992; Burton et al. 1996).

As high stimulus rates (800–2000 pulses s−1) were later shown to improve speech processing, their effect on neuronal survival was examined in the experimental animal. Long-term stimulation at rates up to 2000 pulses s−1 and DC levels below 0.3 μA produced no significant loss of auditory ganglion cells (Tykocinski et al. 1995; Xu et al. 1997). Thus, high rates of stimulation were safe if the above parameters were used.

Initially, it was shown in the experimental animal (Clark et al. 1975b; Clark 1977; Shepherd et al. 1983) that spontaneous middle ear infection could spread around the electrode at its entry to the cochlea, and result in severe infection with marked loss of the auditory neurons (spiral ganglion cells) with the risk of device-related meningitis. With these electrode insertions there had been no attempt to facilitate a seal. As a result, in 1977 further studies were commenced to determine how to seal the entry point (Clark et al. 1984).

Firstly, foreign material was glued as discs or sleeves around the electrode at the opening into the cochlea to encourage the ingrowth of fibrous tissue into the material, and so increase the path length for bacteria (Clark et al. 1984). The materials were tested in the presence of experimentally induced Staphylococcus aureus and Streptococcus pyogenes infections of the middle ear. To compare the findings, an effective system was developed for classifying cochlear infection. This was based on the severity of the acute inflammation, the degree of healing, and the extent of the spread within the cochlea (Clark et al. 1984).

It was found that a muscle autograft around the electrode, or a Teflon felt disc, prevented a Staphylococcus aureus infection in the middle ear from extending to the cochlea. In addition, a fascial graft around the array prevented Streptococcus pyogenes, a more invasive organism, from spreading to the basal turn of the cochlea. On the other hand, Dacron mesh with an overlying fascial graft was associated with a strong inflammatory response. This facilitated the spread of infection to the basal turn of the cochlea and along the cochlear aqueduct towards the meninges. It was thus not recommended as a round window seal for patients.

Studies were undertaken to find out how the tissue around the electrode entry healed, to understand how to prevent the extension of infection to the cochlea. Research (Franz et al. 1984) on the penetration of horseradish peroxidase into tissues showed an increased permeability of the surrounding tissue and round window membrane over a period of approximately two weeks. Thereafter the round window membrane barrier returned to normal in a further two weeks. Healing could thus be a vulnerable stage for the spread of infection and the development of meningitis.

In the presence of middle ear infection, the round window membrane demonstrated a more marked proliferation of the connective tissue and the formation of protuberances of the mucous membrane, as well as mucous cell proliferation around the electrode. This was part of the body's first line of defence against bacteria, as mucus is bacteriostatic. In these round windows, although the permeability was increased, the penetration of horseradish peroxidase into the scala tympani was limited. Horseradish peroxidase always passed through the gap between the membrane and the prosthesis. However, particles were taken up by a connective tissue envelope that formed around the prosthesis after about one week. These data emphasized that, in the early healing phase, infection could extend to the inner ear, but they also demonstrated the importance of the early development of a connective tissue sheath around the electrode array. This sheath provided an effective barrier against the spread of infection (figure 9), in part by allowing the second and third lines of defence to operate. With the formation of a sheath, capillaries brought phagocytic white cells to the tissue surrounding the electrode and the space between the electrode and sheath, to engulf the bacteria (second line of defence). The same mechanism allowed lymphocytes to penetrate the tissue and space next to the electrode, and provide antibodies against the invading organisms (third line of defence) (Cranswick et al. 1987).

Figure 9

A photomicrograph of an implanted cat cochlea. M, infection in the middle ear with proliferation of the mucous membrane; R, thickened round window extending into a well-formed electrode sheath. There is no extension of the infection to the cochlea. Reprinted from Clark et al. (1990) with permission from Elsevier.

Additional evidence for the importance of a sheath around the electrode was obtained from the study by Cranswick et al. (1987). The entry point through the round window was not sealed with tissue, and it was found there were two types of histological response in the cochlea, one with loose connective tissue in the basal turn, and another with an extensive sheath around the electrode array. There was a trend for infection to extend into the cochlea when there was localized loose fibrous tissue rather than a complete sheath.

These studies thus stressed that infection could more easily spread to the cochlea in the postoperative period, and emphasized the need for strict aseptic measures before, during and after surgery.

Later, when it was found that cochlear implantation was best carried out in children under two years of age, research was undertaken to ensure that middle ear infections with Streptococcus pneumoniae, very common in this age group, could be prevented from extending to the inner ear and thus leading to meningitis. Would sealing the electrode entry point with a fascial graft also be effective against Streptococcus pneumoniae, which has a different pathogenicity from the other organisms tested?

As a preliminary study (Berkowitz et al. 1987) indicated that sealing to prevent the ingress of pneumococcal infection would be important, research was undertaken on 21 kittens to compare different sealing techniques following the induction of pneumococcal otitis media (Dahm et al. 1994). The results indicated that cochlear implantation did not increase the risk of labyrinthitis following pneumococcal otitis media, but there was a reduced incidence of infection when the entry point was grafted. Therefore, for safety, it is important to place a fascial graft around the electrode where it enters the cochlea.

The response of the human cochlea to implantation and a fascial graft was studied by Clark et al. (1988) and Dahm et al. (2000). There was a good seal around the electrode at the entry through the round window and a well developed fibrous sheath. Both would have been an effective barrier against the spread of infection from the middle ear.

The above experimental results apply only to a single-component free-fitting array, not to a two-component array. A space between two components is a conduit for infection and a niche where pathogens can multiply, as well as a site that increases the pathogenicity of the organisms and reduces the ingress of antibodies and antibiotics (Clark 2003).

A detailed analysis of the growth of different parts of human temporal bones from birth to adulthood was made to determine the growth changes (Dahm et al. 1993). Key findings were that: (i) the distance between the sino-dural angle (the site for the implant) and the round window (near the site for the electrode insertion into the cochlea) increased on average by 12 mm from birth to adulthood, with a standard deviation of 5 mm, so a paediatric cochlear implant should allow up to 25 mm of lead wire lengthening, consistent with the conclusions of O'Donoghue et al. (1986) from radiographic studies; and (ii) there was no increase in the distance between the round window and the fossa incudis (floor of the mastoid antrum) with age, indicating that fixation of the lead wire to the fossa would be desirable, as any growth changes would be transmitted to this point rather than pulling the electrode from the round window (Dahm et al. 1993). Studies also showed that a redundant loop for the lead wires could lengthen even if surrounded with fibrous tissue, and furthermore it would not need a protective sleeve (Burton et al. 1994).

4. Development of the fully implantable receiver–stimulator

An implantable receiver–stimulator with power and data transmitted through the intact skin was developed for multiple-channel stimulation of the auditory nerve (figure 10), as infection occurred between a socket and the skin edges in experimental animals (Minas 1972). The physiological design specifications were provided from animal studies (Clark et al. 1977a).

Figure 10

The University of Melbourne's prototype receiver–stimulator ready for implantation in the second profoundly deaf patient in 1979.

There were a number of questions to be answered in engineering the receiver–stimulator. (i) Should information be transmitted through the skin to the implant by infrared light shining through the eardrum, ultrasonic vibrations to the scalp, or radio waves through the overlying tissues? Studies established that an electromagnetic link was the most efficient method of data transfer (Clark et al. 1977a; Forster 1978). (ii) Should the implant have its own batteries or receive power from outside? As batteries would increase the size of the implant and have to be replaced periodically, radio signals were the better alternative. (iii) Where should the implant be sited? It was considered best placed in the mastoid bone behind the ear, so there would be a short lead to the electrodes in the cochlea to minimize fractures from body movements. This placement also meant that a directional microphone could be more conveniently located above the ear to enable the person to receive a maximal speech signal while also lip-reading a person face-to-face. (iv) What would be the design of the receiver–stimulator electronics to allow suitable stimulus patterns to be tested? As the stimulus data had to be transmitted through the intact skin as coded pulses on a radio wave, there was less flexibility in the choice of test stimuli than with a percutaneous plug and socket. Physiological, biophysical, and biological data helped set the design limits. (v) How should the electronics be packaged? The body is a very corrosive environment, and salts and enzymes could find their way through very fine cracks and cause electronic failure. This was especially likely where the wires leave the package to join the electrode bundle passing to the inner ear. A titanium/ceramic seal, as had been pioneered by the pacemaker firm Telectronics, would be needed. (vi) What were the optimal dimensions of the package? This was established on the basis of anatomical dissections. (vii) Did the implant package need a connector in case it failed and another one had to be re-attached to the electrodes in the inner ear, and how should it be designed? Initially, a connector with compression pads was used, but later biological studies showed that it was not needed, as electrode arrays could be easily removed and reinserted along the preformed electrode track (Clark 2003).

The principles underlying the electronic design of the Melbourne receiver–stimulator were elaborated in a paper (Clark et al. 1977c) and a provisional patent filed in 1976 (Forster et al. 1976). The amplitude, rate and timing of biphasic pulses on each of 10–15 channels were independently controlled to stimulate an electrode array passed around the basal turn of the cochlea. This enabled each frequency band to stimulate the appropriate electrode, providing 10–15 channels of stimulation for the place coding of frequency. The Melbourne hearing prosthesis was developed clinically and was very different from a design proposed in a patent by Kissiah (1977).
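
The control scheme described, with amplitude, rate and timing independently set per channel, can be pictured as a stream of per-channel stimulus commands. The data structure below is a hypothetical illustration of the information each command would need to carry; it is not the encoding of the 1976 design.

```python
from dataclasses import dataclass

@dataclass
class StimulusCommand:
    """One stimulus command, as might be sent over the transcutaneous
    radio link (field names and ranges are hypothetical)."""
    electrode: int          # which of the 10-15 intracochlear channels
    mode: str               # e.g. 'common_ground' (pseudo-bipolar) in the prototype
    amplitude_step: int     # current step within that electrode's dynamic range
    phase_width_us: float   # duration of each phase of the biphasic pulse
    onset_us: float         # onset time of the pulse within the stimulus cycle

cmd = StimulusCommand(electrode=7, mode="common_ground",
                      amplitude_step=12, phase_width_us=200.0, onset_us=2500.0)
```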

5. Pre-operative selection, asepsis, and surgery

In 1975, the preoperative selection procedures were developed; these included electrical stimulation of the auditory nerve with an electrode placed in the middle ear. From the quality of the sounds heard it was considered possible to determine whether the auditory nerve was intact (Clark et al. 1977c).

As the animal experimental studies had shown that spontaneous middle ear infection could extend around the cochlear implant electrode into the inner ear and lead to labyrinthitis and even meningitis, especially during healing in the first few weeks postoperatively, a protocol was developed to minimize the risk of introducing infection at the time of surgery (Clark et al. 1977b).

The development of surgical procedures and instruments for implanting the multiple-channel cochlear implant commenced in 1975, and these procedures were used on the first Melbourne and subsequent patients. The work included the development of appropriate surgical instruments such as the micro-claw (Clark et al. 1979b).

The University of Melbourne's prototype receiver–stimulator and electrode array were implanted in 1978 in the first adult, who had had hearing before going deaf.

6. The perception of frequency and intensity codes using electrical stimulation (1978–1979)

Rate of stimulation was perceived as a true pitch sensation for both apical and basal electrodes. Pitch at low rates of 50 pulses s−1 corresponded to the acoustic sensation, but at rates above approximately 200 pulses s−1 it rose dramatically to over 1.0 kHz, and the discrimination was poor (Clark et al. 1978; Tong et al. 1979a). This result was similar to that of Mladejovsky et al. (1975). The poor ability to discriminate stimulus rates above 200 pulses s−1 was consistent with the acute physiological and behavioural animal experimental data of Clark (1969b,c), Clark et al. (1972a,b, 1973) and Williams et al. (1976), and confirmed the need for multiple-channel stimulation to convey the important mid-high speech frequencies. The perception of a low pitch when stimulating a high frequency site was also consistent with the animal data, and suggested that temporal and place pitch were processed separately.

The limitation of coding pitch with rate of stimulation was later seen with the identification of melodies (Pijl & Schwarz 1995; Pijl 1997) where some recognition was possible up to 800 pulses s−1. This was emphasized by McKay et al. (2000) and McDermott (2004) who showed that the speech processing strategies that provided voicing as rate of stimulation (F0F1/F2 and Multipeak) were the most effective in conveying melody.

It was discovered that the timbre of sound rather than pitch per se varied according to the site of stimulation (Clark et al. 1978; Tong et al. 1979a). Timbre relates to the quality of the sound, and is the auditory sensation that enables a listener to judge that two sounds similarly presented with the same loudness and pitch are dissimilar, e.g. different instruments or voices. The quality of the sound varied systematically from sharp at the high frequency end to dull at the low frequency region of the cochlea. When the sound quality on each of 10 electrodes was compared, the electrode with the duller sound was more apical, except for one lying outside the cochlea, which probably excited low frequency fibres around the auditory nerve. Even adjacent electrodes could be ranked correctly. Thus, there were at least 10 channels for the place coding of speech frequencies.

Furthermore, it was observed that although rate of stimulation was perceived as pitch and place of stimulation as timbre, they both had pitch-like qualities when combined, and one could influence the other (Tong et al. 1979a). Thus, a lower rate of stimulation on an electrode with a higher pitch was reported as the same as a higher rate on an electrode with a lower pitch. This interaction of rate and place of stimulation in the perception of pitch was confirmed by Blamey et al. (1996b). On the other hand, the differences in the quality of sound for rate and place of stimulation (pitch versus timbre) initially reported by Tong et al. (1979a) were further established (Tong et al. 1983a; McKay et al. 2000). Blamey et al. (1996b) explained the early results of Tong et al. (1983a) as showing that there were separate perceptual qualities for rate and place of stimulation (e.g. pitch and timbre), as well as two pitch components (e.g. a tone complex with two components).

The operating range for electrical stimulation of the auditory nerve was found to be much smaller (5–10 dB) than the 30–40 dB range for speech or the 120 dB range for sound generally (Clark et al. 1978; Tong et al. 1979a). This difficulty was partly overcome because the number of discriminable steps in intensity was similar for electrical and acoustic stimulation (Clark 2003). There was thus a usable number of discriminable steps (7–45) over the narrower dynamic range for electrical stimulation.
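
The compression this implies can be made concrete in a small sketch; the 30 dB input range and the choice of 20 steps (within the 7–45 reported) are assumed values.

```python
import numpy as np

acoustic_range_db = 30.0    # useful range of speech levels (assumed scale)
n_steps = 20                # discriminable intensity steps (7-45 reported)

def acoustic_db_to_step(level_db):
    """Quantize a level 0-30 dB above threshold into one of the
    discriminable electrical intensity steps."""
    frac = np.clip(level_db / acoustic_range_db, 0.0, 1.0)
    return int(round(float(frac) * (n_steps - 1)))

print(acoustic_db_to_step(15.0))   # a mid-range speech level -> a mid step
```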

7. Speech processing strategy for open-set, connected speech understanding in late-deafened adults (1978–1983)

Initially, Simmons et al. (1965) coded speech by separating the signal into frequency bands using fixed filters, the outputs of which stimulated different electrodes in the auditory nerve, but speech understanding was not achieved. It was shown in 1978 by Eddington (1980) that, with four fixed filters and the outputs stimulating widely separated electrodes in the cochlea on a place coding basis, some vowels and consonants could be recognized. Later, it was reported that a fixed-filter strategy provided limited understanding of connected speech (Eddington 1983).

In 1978, a physiologically based coding strategy was developed (Clark 1987; Clark et al. 1987b). It had 10 filters, as 10 stimulus channels are needed for effective communication systems (Hill et al. 1968). The filter bandwidths were similar to the frequency response characteristics of the cochlear basilar membrane. The strategy also introduced the time delays for each frequency to reach its site of maximum vibration along the basilar membrane, and it produced jitter in the stimuli to mirror the stochastic responses of brain cells to sound. This strategy provided only limited speech understanding, because the electrical currents representing the sound energy in each frequency band overlapped, and this produced unpredictable variations in loudness. This discovery led to the important principle for all speech processing strategies that only non-simultaneous stimulation should be used (Clark 1987). Separating the stimuli on each channel by a short interval in time avoided the interaction of the electrical fields on each electrode, but neural integration over time could still take place.

The limitations in coding speech using the physiological model meant an alternative, more effective strategy was needed. The idea of coding only the speech information conveying the greatest intelligibility was discussed by Clark (1969b, 1995), who stated that: ‘The final criterion of success will be whether the patient can hear, and understand speech. If pure tone reproduction is not perfect, meaningful speech may still be perceived if speech can be analysed into its important components, and these used for electrical stimulation.’ This idea was developed further when the first implanted patient referred to the sensation on each electrode not only as timbre, but also as a vowel. It was then discovered that the vowels corresponded with those a normal hearing person would perceive if approximately similar frequency regions in the inner ear were excited by single speech formants (Clark et al. 1978; Tong et al. 1979a,b; Tong & Clark 1980).

Formant frequencies are the result of resonances in the vocal tract that amplify particular speech frequencies. It was also noted that the vowel heard could be changed by lengthening or shortening the duration of the stimulus, as seen for vowels presented to normal hearing subjects (Clark et al. 1978; Tong & Clark 1980). If pairs of electrodes were stimulated together at a constant stimulus rate, the vowel perceived was different from that of a single-electrode stimulus, but depended on the relative amplitude of the two, suggesting the central nervous system was using an averaging process for the two neural populations excited. In addition, formants are important for the recognition of consonants, as illustrated in figure 11 for voiced plosives.

Figure 11

The first (F1) and second (F2) formant frequencies for the plosives /b/, /d/, /g/, and the burst of noise produced when the sound is released after the vocal tract has been closed. VOT, voice onset time, one of the cues for voicing.

The above findings (Clark et al. 1978; Tong et al. 1979a; Tong & Clark 1980) were the clue to developing a formant extracting speech processing strategy for electrical stimulation. The F2 frequency (700–2300 Hz range) was coded as place of stimulation, as rate of stimulation could not be discriminated at these higher frequencies. On the other hand, the fundamental (F0) or voicing frequency, which is also very important for speech understanding, for example in distinguishing between /b/ which is voiced and /p/ which is unvoiced, was coded as rate of stimulation proportional to F0. This was appropriate as voicing is low in frequency (120–225 Hz), and as discovered, electrical stimuli could not be discriminated at higher rates (Clark et al. 1978; Tong et al. 1979b, 1980; Tong & Clark 1980). With unvoiced sounds a randomized pulse rate was used, this stimulus being identified by the patient as noise-like. Rate of stimulation was retained across each of the electrodes stimulated with F2, and this was consistent with the experimental animal behavioural studies and later psychophysical findings (Clark et al. 1972b, 1973, 1978), which indicated that temporal and place information were processed separately. The implementation of this strategy in a hard-wired speech processor is outlined in the patent lodged in 1979 (Tong et al. 1979b).
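
The decision rules of the F0/F2 strategy can be summarized in pseudocode. This is an illustrative reconstruction from the description above, not the hard-wired processor's circuitry: the linear F2-to-electrode mapping and the randomized rate range for unvoiced sounds are hypothetical placeholders.

```python
import random

N_ELECTRODES = 10   # electrode 0 basal (high frequency), 9 apical (assumed order)

def f2_to_electrode(f2_hz):
    """Map the second formant (700-2300 Hz) to a place of stimulation:
    higher F2 -> more basal electrode (hypothetical linear mapping)."""
    frac = min(max((f2_hz - 700.0) / (2300.0 - 700.0), 0.0), 1.0)
    return int(round((1.0 - frac) * (N_ELECTRODES - 1)))

def f0f2_frame(f0_hz, f2_hz, voiced):
    electrode = f2_to_electrode(f2_hz)       # F2 -> place of stimulation
    if voiced:
        rate = f0_hz                         # voicing (120-225 Hz) -> pulse rate
    else:
        rate = random.uniform(150.0, 300.0)  # randomized rate, heard as noise-like
    return electrode, rate                   # amplitude set from F2 band energy

print(f0f2_frame(120.0, 1500.0, voiced=True))
```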

This F0/F2 strategy was first evaluated in 1978 as a software-based system on the first Melbourne patient, who had hearing before going deaf. It was found that for a closed-set of six vowels the score was 77%, and for eight consonants 37% (Tong et al. 1980; Clark et al. 1981d; Clark & Tong 1982). The poorer consonant score was consistent with the fact that acoustic features other than formants are required for their recognition, and needed to be transmitted. Most importantly, however, it was discovered in 1978–1979 that the subject had a marked improvement in understanding running speech when it was used in combination with lip-reading, compared with lip-reading alone (386% improvement), and he was also able to understand some running speech using electrical stimulation alone (14%) (Clark et al. 1981b). This was demonstrated using open-sets of word and word-in-sentence tests (Clark et al. 1981b,c). With open-sets of words and words-in-sentences the material must be age-appropriate, in everyday use, and not practised. The results more closely predict the ability of a person to communicate in everyday situations than closed-sets of speech material. Similar improvements in speech perception were found for a second late-deafened adult operated on in 1979 (Clark et al. 1981b,c). He, like later patients, retained his memory for speech sounds over a period of many years, and this helped retraining in the use of the electrically coded speech information (Clark et al. 1981b,c).

It was found too that for closed-set word tests the mean scores for the F0/F2 strategy were significantly better than those for a single-channel strategy (Clark et al. 1981a). Single-channel strategies showed small improvements in speech perception when electrical stimulation was used as a lip-reading aid, but no speech understanding for electrical stimulation alone.

The above sets of results (Tong & Clark 1980; Tong et al. 1980; Clark et al. 1981b,c), obtained under standardized conditions, were the first clear and important demonstration that the multiple-channel cochlear implant could significantly help profoundly deaf people to understand running speech and communicate with people with hearing. These data were subsequently upheld when the speech processing strategy and implant were evaluated on patients in clinics internationally in a trial for the US FDA.

As the research showed that the formant-based strategy was effective for understanding everyday speech on the first two patients using Australian English, this suggested it should apply to every other language as they are all formant-based. This was confirmed on people from different language backgrounds in Melbourne, including those using tonal languages such as Mandarin or Cantonese (Xu et al. 1987), where variations in pitch as well as formants provide meaning.

The F0/F2 strategy was implemented by industry (Cochlear Pty Limited) as a smaller speech processor. The engineering improved the sampling time of the signal. It was trialled at nine centres in the US, Germany and Australia for the FDA. This established that the original data from the University of Melbourne gave a conservative assessment of the performance of this strategy (Dowell et al. 1986). Three months post-implantation, 40 patients had obtained a mean word-in-sentence score of 87% (range 45–100%) for lip-reading plus electrical stimulation, compared to a score of 52% (range 15–85%) for lip reading alone. It was also found that marked improvements occurred in the first post-operative year. In a subgroup of 23 patients the mean open-set word-in-sentence scores for electrical stimulation alone rose from 16% (range 0–58%) at three months post-implantation to 40% (range 0–86%) at 12 months (Dowell et al. 1986). This indicated that a number of people were reaching a level of speech perception that allowed effective use of the device without lip-reading.

When the clinical trial was completed in 1985 and the F0/F2 strategy for multiple-channel stimulation was approved by the FDA, this established it as the first implant to safely restore speech understanding in adults who had hearing before going deaf.

8. The perception of complex electrical stimuli of importance for speech understanding (1980–1985)

Research was undertaken to discover why the cochlear implant F0/F2 strategy was effective, as this would lead to new knowledge of how the brain processed speech information and other complex sounds, and also lead to improvements in speech processing.

Research on the perception of complex patterns of electrical stimulation showed that a varying stimulus rate could only be recognized over the longer duration required to convey prosody (the stress, tempo, rhythm, and intonation of speech) rather than over the shorter durations of consonants, as illustrated in figure 12. This supported the rationale of using rate of stimulation for differentiating between, for example questions and statements. It was also supported by the discovery that rate of stimulation was not only perceived as pitch, but in a speech context it was recognized as voicing (Tong et al. 1982). On the other hand, a change in site of stimulation could be recognized over the shorter durations of consonants (figure 12), and was thus appropriate for transmitting the frequency shifts seen in the consonants characterized as plosives (/b/, /d/, /g/, /p/, /t/, /k/) and nasals (/m/, /n/, /ŋ/).

Figure 12

Rate and place discrimination versus duration. The percentage of judgements called ‘different’ for shifts in rate and electrode site, compared with a standard stimulus, is shown for pulse rate and stimulus place trajectories of 25, 50 and 100 ms duration. (a) The initial pulse rate of the trajectory was varied from 240, 210 or 180 pulses s−1 to the baseline of 150 pulses s−1. (b) The initial electrode of the trajectory was varied from electrode 4, 3 or 2 to the baseline, electrode 1. Reprinted from Tong et al. (1982) with permission.

The research also demonstrated that questions and statements were still recognized when variations in rate of stimulation were superimposed on the stimulus electrodes that transmitted shifts in formant frequencies. In addition, it was established that the hearing sensations produced by steady-state stimuli differing in electrode position and repetition rate were characterized by two pitch components, one for each of these two parameters (Tong et al. 1983a; McKay et al. 2000).

These studies thus confirmed that the central auditory nervous system processes timing and spatial excitation through separate systems with different time courses. They also demonstrated that information presented at the same time to both processing systems could be integrated into a single perceived speech sound, while retaining individual pitch components (Tong et al. 1983a; Blamey et al. 1995). This processing capacity enabled voicing to be transmitted as a temporal code separately from the speech formants or spectral frequencies, and suggested that the F0/F2 speech processing strategy was coding information through both the spatial and the temporal pitch systems in the brain.

The recognition of vowels by normal-hearing listeners depends in particular on the perception of the first (F1) and second (F2) formant frequencies, and thus on spatial excitation at two sites in the central nervous system. Research showed that if two electrodes at separate sites along the cochlea were stimulated non-simultaneously, a sensation with two components was experienced (Tong & Clark 1983; Tong et al. 1983b), equivalent to the F1 and F2 frequencies in vowels. This supported the development of a speech processing strategy that presented the F1 as well as the F2 frequencies as site of stimulation, while still coding the voicing (F0) as rate of stimulation. Although two pitch components could be distinguished, the stimulus nevertheless had a single speech-like quality, as discussed above (Tong et al. 1979a; Blamey et al. 1996b).

Loudness is related to the intensity of the stimulus (Moore 2003), which in turn depends on the amount of electrical charge passing through the nerve membrane and on the population of neurons excited. Tong et al. (1983a) first demonstrated the importance of total charge in coding intensity (Clark 2003). Loudness was also related to the number of neurons excited: it summed as the spatial separation between electrodes increased, reaching a plateau at 3 mm (Tong & Clark 1986), while McKay et al. (1995) found that loudness summation was unaffected by electrode separations greater than 0.75 mm. Both studies were consistent with the critical band concept for sound, in which loudness is summed over a 0.89–1.0 mm segment of the cochlea (Clark 2003).
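
The charge relation is simple arithmetic: the charge per phase of a biphasic pulse is the product of current amplitude and phase duration. The following minimal Python sketch makes this concrete; the amplitude and phase width are hypothetical values for illustration, not parameters from the cited studies.

```python
# Charge per phase of a biphasic current pulse, the quantity linked to
# loudness above. Both stimulus values below are assumptions.
current_amplitude_a = 1.0e-3   # current per phase: 1 mA (assumed)
phase_width_s = 100e-6         # phase duration: 100 us (assumed)

charge_per_phase_c = current_amplitude_a * phase_width_s
print(f"Charge per phase: {charge_per_phase_c * 1e9:.0f} nC")  # -> 100 nC
```

Doubling either the current or the phase width doubles the charge, so in principle a processor can trade the two off when mapping signal amplitude to loudness.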

Before a strategy that coded F1 as well as F2 frequencies through multiple-channel stimulation was evaluated, it was modelled in normal-hearing subjects using sounds that simulated electrical stimulation. The model used seven bands of pseudo-random white noise, with centre frequencies corresponding to the electrode sites, to represent different speech frequencies. The acoustic model results for basic perceptual tasks were similar to those for electrical stimuli.
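
A minimal sketch of such an acoustic model is given below (Python; the band edges, filter order and use of the Hilbert envelope are illustrative assumptions, not the original model's parameters). Each band's envelope, which carries the amplitude modulation at the voicing frequency, modulates a band of pseudo-random noise standing in for an electrode site.

```python
import numpy as np
from scipy.signal import butter, sosfilt, hilbert

FS = 16000  # sampling rate in Hz (assumed)

def bandpass(signal, lo_hz, hi_hz):
    """Fourth-order Butterworth band-pass filter (order is an assumption)."""
    sos = butter(4, [lo_hz, hi_hz], btype="band", fs=FS, output="sos")
    return sosfilt(sos, signal)

def acoustic_model(speech, band_edges):
    """Replace each speech band with envelope-modulated band-limited noise."""
    noise = np.random.randn(len(speech))  # pseudo-random white noise carrier
    output = np.zeros_like(speech)
    for lo_hz, hi_hz in band_edges:
        envelope = np.abs(hilbert(bandpass(speech, lo_hz, hi_hz)))
        carrier = bandpass(noise, lo_hz, hi_hz)  # noise band at the 'electrode site'
        output += envelope * carrier
    return output

# Seven illustrative bands spanning the speech range (edges are assumptions).
band_edges = [(100, 300), (300, 550), (550, 900), (900, 1400),
              (1400, 2200), (2200, 3500), (3500, 5500)]

t = np.arange(FS) / FS  # one second of a toy, voiced-like test signal
speech = np.sin(2 * np.pi * 800 * t) * (1 + np.sin(2 * np.pi * 120 * t)) / 2
simulated = acoustic_model(speech, band_edges)
```

Because the carriers are noise rather than sine waves, only the coarse envelope modulation survives, which is precisely the property the comparison with electrical stimulation exploited.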

The research next compared the recognition of voicing between the acoustic model and electrical stimulation, to determine whether electrical stimulation provided fine as well as coarse temporal frequency information. Because bands of noise contain no sine-wave components at each frequency, voicing in the model was simply the amplitude modulation of the noise burst at the voicing frequency. The results for voicing with the acoustic model were comparable to those obtained with electrical stimulation in the first Melbourne implant patients (Blamey et al. 1984). The data thus suggested that for temporal pitch, electrical stimulation was transmitting only coarse temporal information, and not the fine temporo-spatial responses induced by sine waves that give good quality sound (figure 13). With coarse temporal information, phase differences between the responses of individual neighbouring neurons are minimal, whereas they occur with fine temporo-spatial information (Clark 1996, 2003). In addition, the acoustic model reproduced well the speech results obtained with the F0/F2 speech processing strategy in implanted patients, and predicted improved speech perception for a strategy that presented F1 as well as F2 frequencies, with F0 as rate of stimulation across the stimulated electrodes (F0/F1/F2).

Figure 13

A diagram of the processing of frequency information and the perception of pitch through central auditory spatial, coarse temporal and fine temporo-spatial perceptual systems.

9. Improved speech processing strategies (1981–1999)

In implementing a strategy that presented the F1 as well as the F2 frequencies (F0/F1/F2), the F1 and F2 frequencies needed to be presented non-simultaneously along spatial processing channels, with F0 conveyed as coarse temporal information across the channels.
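
How one analysis frame of such a scheme might be coded can be sketched as follows (Python). The 20-electrode map, the electrode groups, the log-frequency place mapping and the unvoiced behaviour are all assumptions for illustration; they are not the published Melbourne parameters.

```python
import math
import random

# Hypothetical 20-electrode map: electrode 1 most basal (high frequency),
# electrode 20 most apical (low frequency). Band limits are assumptions.
F1_BAND = (250.0, 1000.0)   # Hz, mapped to an apical electrode group
F2_BAND = (800.0, 4000.0)   # Hz, mapped to a basal electrode group

def freq_to_electrode(freq_hz, band, first_el, last_el):
    """Map a frequency to an electrode number on a log-frequency scale."""
    lo, hi = band
    frac = (math.log(freq_hz) - math.log(lo)) / (math.log(hi) - math.log(lo))
    frac = min(max(frac, 0.0), 1.0)
    return round(first_el + frac * (last_el - first_el))

def f0f1f2_frame(f0_hz, f1_hz, f2_hz, voiced):
    """One frame: two non-simultaneous pulses (F1 electrode, then F2
    electrode), with the frame repeated at the F0 rate when voiced and at
    a pseudo-random low rate when unvoiced (that range is an assumption)."""
    f1_electrode = freq_to_electrode(f1_hz, F1_BAND, first_el=20, last_el=11)
    f2_electrode = freq_to_electrode(f2_hz, F2_BAND, first_el=10, last_el=1)
    rate_pps = f0_hz if voiced else random.uniform(150.0, 250.0)
    return {"pulse_order": (f1_electrode, f2_electrode), "frame_rate_pps": rate_pps}

# A voiced vowel-like frame: F0 = 120 Hz, F1 = 500 Hz, F2 = 1500 Hz.
print(f0f1f2_frame(120.0, 500.0, 1500.0, voiced=True))
```

The non-simultaneous pulse order reflects the finding above that sequential stimulation of two sites yields a single speech-like sensation with two pitch components.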

With this strategy, the mean open-set CID word-in-sentence scores using electrical stimulation alone rose from 16% (n=13) (F0/F2) to 35% (n=9) (F0/F1/F2) (Dowell et al. 1987) (figure 14). The speech perception results and the transmission of vowel and consonant information with the F0/F1/F2 processor were similar to those obtained for the acoustic model of the strategy, confirming the model's predictive value. The F0/F1/F2 speech processor was approved by the US FDA in May 1986 for providing speech understanding in post-linguistically deaf adults, those who had hearing before going deaf.

Figure 14

Open-set word and sentence scores for the F0/F1/F2, Multipeak, SPEAK and ACE strategies. All frequencies were coded on a place basis.

Consonants whose basic acoustic cues carry high-frequency information, for example the fricatives /s/ and /∫/ (Clark 1987), had poor scores with the F0/F1/F2 strategy. Because of their importance for speech understanding, a strategy ('Multipeak') was developed that presented the energy from high-frequency bands in the third formant frequency region, as well as the F1 and F2 formants, as site of stimulation. With this strategy the energy from fixed filters with bandwidths of 2.0–2.8, 2.8–4.0 and greater than 4.0 kHz was extracted, in addition to the F1 and F2 peaks.
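
The extra Multipeak analysis step can be sketched in the same illustrative style (Python). Only the three band limits come from the text; the FFT framing, window and energy measure are assumptions.

```python
import numpy as np

FS = 16000  # sampling rate in Hz (assumed)

def multipeak_high_band_energies(frame):
    """Energy in the three fixed high-frequency bands named in the text
    (2.0-2.8 kHz, 2.8-4.0 kHz and above 4.0 kHz) for one analysis frame.
    These energies would drive fixed basal electrodes, alongside the
    place-coded F1 and F2 peaks."""
    windowed = frame * np.hanning(len(frame))
    spectrum = np.abs(np.fft.rfft(windowed)) ** 2
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / FS)
    bands = [(2000.0, 2800.0), (2800.0, 4000.0), (4000.0, FS / 2)]
    return [float(spectrum[(freqs >= lo) & (freqs < hi)].sum())
            for lo, hi in bands]

frame = np.random.randn(256)  # stand-in for a 16 ms speech frame
print(multipeak_high_band_energies(frame))
```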

When the F0/F1/F2 and Multipeak strategies were compared (Dowell et al. 1990), there was a small improvement in vowel scores with Multipeak and a larger increase for consonants. For open-sets of words-in-sentences there was a significant improvement with Multipeak, from 38% (n=32) (F0/F1/F2) to 59.1% (n=27) (Multipeak) (Hollow et al. 1995). Furthermore, a randomized prospective study by Cohen et al. (1993) showed that speech perception scores were substantially better with the Multipeak strategy than with a six fixed-filter analogue scheme developed commercially as 'Ineraid' (Eddington 1983). The Multipeak system was approved by the FDA on 11 October 1989 for speech understanding in deaf adults who previously had hearing.

A further improvement in speech processing came with an alternative strategy that selected the six maximal outputs from a bank of 16 filters (McDermott et al. 1992) and presented the information to the brain on a place-coding basis. The outputs of these filters sampled the distribution of energy across the speech frequency spectrum, with more outputs selected where the energy is concentrated, as in the F1 region, as illustrated by the speech spectrogram and electrodogram in figure 15. It thus provided a better spatial resolution of the spectrogram than Multipeak. Furthermore, the output of each filter was presented to its electrode at a constant rate, as the study by Tong et al. (1990) had shown that with a constant stimulus rate the amplitude variations in speech could transmit the coarse temporal information required for voicing. This avoided the difficulty of extracting the voicing frequency in noise to determine the rate of stimulation.
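
The core of such a 'maxima selection' scheme is small enough to sketch directly (Python). The 16-filter, six-maxima figures come from the text; the random envelope stand-ins and the ordering of the result are assumptions.

```python
import numpy as np

def select_maxima(filter_envelopes, n_maxima=6):
    """Return the n largest filter-bank outputs for one analysis frame as
    (channel, amplitude) pairs. Only these channels are stimulated, each at
    a constant per-channel rate, so the amplitude variations carry the
    coarse temporal (voicing) information."""
    largest = np.argsort(filter_envelopes)[-n_maxima:]
    return sorted((int(ch), float(filter_envelopes[ch])) for ch in largest)

envelopes = np.abs(np.random.randn(16))  # stand-in for 16 filter envelopes
print(select_maxima(envelopes))
```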

Figure 15

(a) Spectrogram for the word ‘choice’ with the intensity at each frequency indicated by brightness. (b) The electrode representations (electrodogram) for ‘choice’ using the F1, F2, and high spectral frequency strategy (Multipeak). (c) The electrodogram for the SPEAK strategy.

This maximal filter output strategy, developed by Cochlear Limited as SPEAK with 20 filters and a constant stimulation rate of approximately 180–300 pulses s−1 per channel, gave better open-set word-in-sentence results in quiet and in noise than the Multipeak strategy; in quiet, scores rose from 67% (Multipeak) to 76% (SPEAK) (Skinner et al. 1994).

An alternative to the above strategies began as one with four to six fixed filters (Eddington et al. 1978), evolved into a strategy with interleaved pulses (IP) to avoid channel interaction, and was then modified to provide high stimulus rates of 833–1111 pulses s−1 per channel for better sampling of voicing, referred to as the continuous interleaved sampling (CIS) strategy (Wilson et al. 1992). The results obtained with the six fixed-filter CIS strategy at 833 pulses s−1 (Wilson et al. 1992; Kessler et al. 1995; Schindler et al. 1995) were comparable to those of SPEAK (Clark 2003). If it is assumed that the higher stimulus rate of CIS helped transmit more information, the data show the importance of selecting the maximal outputs of a bank of 20 filters rather than using six fixed filters.
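
The timing that distinguishes interleaved strategies can be sketched as follows (Python). The six channels and 833 pulses s−1 match the figures quoted above; the even staggering of channels is an assumption for illustration.

```python
def cis_pulse_schedule(n_channels=6, rate_pps=833, window_ms=10.0):
    """List (time_ms, electrode) pulse events for a short window. Each
    channel fires at rate_pps, offset from its neighbours so that no two
    pulses overlap: the channel interaction caused by simultaneous
    stimulation is thereby avoided."""
    period_ms = 1000.0 / rate_pps         # interval between pulses on one channel
    stagger_ms = period_ms / n_channels   # offset between adjacent channels
    events = []
    for channel in range(n_channels):
        t = channel * stagger_ms
        while t < window_ms:
            events.append((round(t, 3), channel + 1))
            t += period_ms
    return sorted(events)

for t_ms, electrode in cis_pulse_schedule()[:8]:
    print(f"{t_ms:6.3f} ms  electrode {electrode}")
```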

In view of the possible advantages of a high stimulus rate for sampling voicing and for reproducing noisy signals, the SPEAK strategy was implemented at high rates of 800–1600 pulses s−1 and referred to as ACE. Before these high rates were used clinically, a series of studies in experimental animals was undertaken to ensure they would be safe (Tykocinski et al. 1995; Xu et al. 1997). An improvement in speech perception was seen in some patients when high rates (800–1600 pulses s−1), rather than 250 pulses s−1, were used, and the patients described the signals as sounding more natural (Vandali et al. 1995).

Furthermore, research was undertaken to develop speech processing for bilateral implants and bimodal hearing (an implant in one ear and a hearing aid in the other) to reproduce the benefits of two ears. These benefits are: (i) the ability to localize the direction of a sound, due to differences in the intensity as well as the time of arrival and phase of the sound at each ear; (ii) hearing speech on one side with competing noise in the opposite ear (the head shadow effect); (iii) improved understanding of speech in noise, due to central neural mechanisms that compare the signals at each ear and partially remove the noise but not the signal (the 'squelch' effect, or binaural release from masking); and (iv) improved audibility of soft sounds, due to loudness summation. The two interaural cues underlying benefit (i) are sketched below.
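
As a simple illustration of those cues, the following Python sketch estimates the interaural level and time differences from a pair of ear signals. The cross-correlation estimator and the toy delayed tone are assumptions for illustration, not a description of any implant's actual processing.

```python
import numpy as np

def interaural_cues(left, right, fs_hz):
    """Estimate the interaural level difference (dB) and interaural time
    difference (s) between two ear signals. A positive ITD here means the
    left signal lags (arrives later than) the right."""
    ild_db = 10.0 * np.log10(np.sum(left ** 2) / np.sum(right ** 2))
    xcorr = np.correlate(left, right, mode="full")
    lag_samples = int(np.argmax(xcorr)) - (len(right) - 1)
    return ild_db, lag_samples / fs_hz

fs_hz = 44100
t = np.arange(fs_hz // 100) / fs_hz               # 10 ms test tone
tone = np.sin(2.0 * np.pi * 500.0 * t)
left = np.concatenate([tone, np.zeros(2)])        # left ear leads...
right = np.concatenate([np.zeros(2), tone])       # ...right delayed by 2 samples
ild, itd = interaural_cues(left, right, fs_hz)
print(f"ILD = {ild:.2f} dB, ITD = {itd * 1e6:.0f} us")  # ~0 dB, about -45 us
```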

Psychophysical studies on bilateral implant subjects, with electrodes stimulated in each ear, demonstrated that the threshold for perceiving interaural intensity differences was comparable to that for sound, at less than 0.17 dB. For the detection of interaural temporal differences, although results varied considerably, sensitivity was of the order of 100–150 μs, which is poorer than the 20 μs seen in normal-hearing listeners (van Hoesel et al. 1990, 1993; van Hoesel & Clark 1995, 1997; van Hoesel & Tyler 2003).

When data from a study on four subjects were analysed, there was a highly significant head shadow effect of 4–5 dB, while a squelch effect was marginally significant at 1–2 dB (van Hoesel & Tyler 2003). Other research (Muller et al. 2002; Schleich et al. 2004) has confirmed the importance of the head shadow effect and found a squelch effect similar to that of van Hoesel & Tyler (2003).

With bimodal stimulation it was shown that a number of patients could fuse acoustic stimulation in one ear with electrical stimulation in the other, with an overall improvement in speech understanding in both quiet and noise (Dooley et al. 1993). To develop a bimodal speech processing strategy, similar frequency regions in each auditory nerve and central auditory pathway had to be excited, as was the case for bilateral implants (Blamey et al. 1996b; van Hoesel et al. 1993). By comparing the pitches of electrical and acoustic stimulation it was found that electrical stimulation excited nerves transmitting lower frequencies than expected from their location in the cochlea, presumed to be due to the passage of fibres from a more apical region beneath the stimulated area on their way to the auditory brainstem (Blamey et al. 1995, 1996b). Loudness summation occurred, as also seen with bilateral implants. Significant benefits were also reported by Ching et al. (2004) for sound localization and for hearing speech in the presence of spatially separated noise, as seen with bilateral implants.

10. Speech perception in early deafened children (1985–2003)

With children, the main question to be answered was whether those born deaf or deafened early in life could process speech using strategies developed for deaf adults, whose brain connectivity would have been optimized by prior exposure to sound.

Initial studies on two early deafened adults who used the sign language of the deaf (a system of signs that has its own grammar and is not related to spoken English) found that they had poor speech understanding with electrical stimulation. This was associated with an inability to discriminate rate and site of stimulation. They did, however, discriminate current level as well as adults who had hearing before going deaf (Tong et al. 1988).

In 1985 and 1986, three progressively younger children (14, 10 and 5 years of age) received a multiple-channel cochlear implant (Clark et al. 1987a,b) and obtained some speech understanding. After a speech and language training regime was developed, it was reported (Dawson et al. 1989, 1992; Dowell et al. 1991) that five of a group of nine children from the University of Melbourne's clinic (aged 6–14 years) had substantial open-set speech recognition for monosyllabic words scored as phonemes (range 30–72%) and for sentences scored as key words (range 26–74%).

These were the first reports that severely and profoundly deaf children, born deaf or deafened early in life, could achieve open-set speech understanding and then spoken language, using electrical stimulation alone (Clark et al. 1987b; Dawson et al. 1989, 1991). They used the F0/F1/F2 and Multipeak speech processing strategies.

The results from the Melbourne clinic (Dawson et al. 1989, 1992; Dowell et al. 1991) led to a clinical trial in the US for the FDA, which commenced in 1987 (Staller et al. 1991). The results for 80 children from the international trial were presented to the FDA, and on 27 June 1990 the implant was approved as safe and effective for children aged two years and above, making it the first cochlear implant for children to be approved by any international health regulatory body. It thus became the first major advance in helping severely-profoundly deaf children communicate since sign language was developed at the Paris Deaf School more than 200 years ago. See the supplementary online material for examples of spoken language achieved by deaf children diagnosed and implanted at a young age, trained to develop listening skills and language, and educated with an emphasis on auditory–verbal communication.

Psychophysical research was also undertaken to help determine whether there was a critical period for developing the place and temporal coding of sound (Blamey et al. 1992, 1996a), and whether their development correlated with speech perception results.

Busby & Clark (2000) discovered, in a study of 16 early deafened children, that the discrimination of electrode place was poorer if there had been a long period of hearing loss or the child had been implanted at an older age. The data suggested that exposure to sound or electrical stimulation during the 'plastic' phase of brain development is required for this perceptual skill to develop. In addition, the better the discrimination of electrode place in the apical turn, the better the closed-set speech perception score. This indicated that discrimination of electrode place was important, but might not be the only factor in good speech perception.

The ability of children to rank electrodes according to whether pitch changed monotonically with electrode place, rather than simply to discriminate electrode place as discussed above, was compared with their speech perception scores, as shown in figure 16. Of the 16 children, 75% could rank electrodes monotonically; however, only 58% of those with good electrode-ranking ability had satisfactory speech perception of 30% or more. This suggested that the effect of 'developmental plasticity' on the neural connectivity required for place discrimination was not the only factor in learning speech; other factors could include temporal processing and the development of language (Busby & Clark 2000; Clark 2002).

Figure 16

Electrode ranking, according to whether the pitch changed monotonically with the electrode, versus word scores for the Bench–Kowal–Bamford open-set sentences for electrical stimulation alone in 16 children using cochlear implants (Busby & Clark 2000; Clark 2002). Reprinted from Clark (2002) with permission.

11. Conclusion

Not only has the multiple-channel cochlear implant enabled severely-profoundly deaf people to understand speech, but among children educated with the auditory–verbal and auditory–oral methods, 40% have achieved normal spoken language (M. Dann 2005, personal communication). Further advances in speech perception, especially in noise, and in music quality are likely with better reproduction of the temporo-spatial coding of frequency through an advanced interface with the nervous system.

Acknowledgements

I would especially like to thank my research students and staff for their commitment, our benefactors for financial support, and Cochlear Limited for the industrial development. I also wish to thank Dr David Lawrence for all his help in preparing this manuscript. The research for all studies was undertaken under the supervision of the Royal Victorian Eye and Ear Hospital's Animal and Human Research and Ethics Committees.

Footnotes

  • Received December 3, 2004.
  • Accepted August 30, 2005.

References
