The theory seems to make sense of 'seeing ghosts', and we can, I think, readily grasp a failure of complete mourning as the reason for the 'negative perception' of the deceased beloved. It ties the visual cases nicely to the haptic or vestibular cases (that odd jolt on stepping onto a stationary escalator): our lived body expected movement but met with none, giving rise to a kind of reverse experience of movement because of its own now otiose compensation.
But what about 'voices' (auditory verbal hallucinations, AVHs)? Can 'spirit beings' that one 'hears' be naturalistically understood to arise in the same manner as 'spirit beings' that one 'sees'? The theory will have to be something like this: the patient is in some latent manner primed to receive auditory stimulation, yet receives none whilst that priming remains uncancelled, and so 'hears' a 'negative' of what they latently expected. But why would they be primed to receive something, fail to receive it, not have this non-reception cancel the priming, and so inverse-hear what they were primed for? And what on earth is inverse-hearing?
Well, consider the commonest AVH: hearing your own name being called. Aren't we all maximally subconsciously primed for encountering our own name being called? This call on our being, this fundamental human address - isn't it inscribed in the latent body of our subjectivity? (NB I'm not talking here just of our disposition to mis-hear ('mis-interpret' as we incautiously say) other sounds as our own name, or as our infant's cry, though presumably that too is a function of the same priming. I'm simply talking about the priming.)
We can apply to hearing the general sensory formula: personal-level hearing equals subpersonal-level expectation minus subpersonal-level stimulation. One of the most significant sources of expectation will be the sensory changes due to bodily movements and sounds. (Remember that at the subpersonal level we don't do well to talk of 'self-generated movements and sounds'. Selfhood and perception are co-constituted, equiprimordial personal-level phenomena. Cognitive models typically disrespect these distinctions and presuppose rather than derive selfhood in their accounts. Not that selfhood is to be explicated experientially - quite the opposite in fact. Transcendental selfhood is to be explicated precisely in terms of that which renders experience possible yet is itself not experienced. It is the co-constituted 'from where', not the 'object', of experience.) The other source will be past sensory stimulation in the environment in question. (These two come together when the bodily movements in question are the efferent stimulations by the brain of the outer hair cells in the cochlea.)
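Set out schematically, the formula looks like this. It is only a notational sketch: the symbols H, E and S are labels I introduce here for illustration, not measured quantities, and the subtraction is as metaphorical as the rest.

```latex
% H: personal-level hearing
% E: subpersonal-level expectation (anticipation)
% S: subpersonal-level stimulation
% Illustrative labels only; nothing here is a fitted or measured model.
\[
  H = E - S
\]
```

Read this way, when S fully cancels E nothing is left over, and when S fails to arrive at all H is simply E.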
Having no sensory stimulation (e.g. when dropping off to sleep) will, then, lead to an experience of hallucinating one's own name being called - if, that is, the underlying readiness for the sensory stimulation met with when someone calls one's name is not cancelled by afferent sensory stimulation. Experiencing silence is an achievement, if you like: it's the result of the successful cancelling of sensory anticipations by sensory input.
So, mightn't it be something like this (excuse the anthropomorphism; it's a metaphor, ok, I'm not setting out to commit the mereological fallacy!): one part of the brain is all excited, thinking 'ooh, maybe I'll hear my name called!', then the null sensory input comes in and tells that part 'calm down dude, nobody's interested in you'. The result is silence. (The deaf person does not live in silence.) And what makes it possible for me to hear meaningful speech is that such parts of the brain are all excited and kinda expecting it. Only if a spotlight is dynamically swooping back and forth across the courtyard can the absence of intruders be registered; a stable stasis presupposes an underlying medium in a dynamic equilibrium. These anticipations are the shaped holes in the mind, all prepared for the distinctively shaped sensory inputs. When the null input is received, the anticipatory firings get cancelled. But when the null input is not received, the anticipatory firings result in what I'll call an anti-sound. Such anti-sounds, I suggest, are AVHs.
We often talk to ourselves in foro interno. We say something and perhaps even respond to it. There are those who, in my view convincingly, think that much of what we mean by the act of 'thinking' (if not by the logical category of 'thought') is to be understood thus. This involves activation of parts of the cortex also involved in speech production. Thereby there is also, I imagine (yes, this is all 'armchair neuroscience' - or better, it is preliminary reflection on the form best taken for interpreting the deliverances of actual neuroscience), a readiness generated for the sensory stimulation arising from actual vocalisation; the readiness may extend all the way out to the cochlea, or may remain within the cortex. Yet there is, in the normal case, also a cancelling of the neural activation subtending such readiness - through the null sensory input, or through an absence of the typical feedforward from the motor cortex, etc.
World-disengagement (i.e. schizophrenic autism) is particularly important for voice-hallucinating. Voice 'hearers' hear voices far more in silence - whether we have to do with hypnagogic or hypnopompic hallucinations, situations of sensory deprivation, or schizophrenic voices. In world-engagement we have in place the range of sensori-motor feedback loops which maintain the normal regime of updating anticipations, along with those fulfilments of anticipation we call 'perceptual experiences'. Abstract oneself from this maximal grip, lose 'reality contact' or 'fonction du réel', and the conditions are perfect for the neurological activation underlying sensory anticipation to come adrift and give rise to auditory ghosts.
We may contrast the above theory with the cognitivist account. I find it impossible to state the latter without committing the mereological fallacy, so built into the explanatory framework is this assumption which yet vitiates it, but here goes: inner soliloquy is generated, there is no feedforward so it is not inwardly expected, yet nevertheless it is inwardly encountered, and so is taken as ego-alien. My phenomenological theory, by contrast, simply references the uncancelled subpersonal anticipations of sensory voice input, which themselves constitute the personal-level AVH.