I’m sure you have experienced the frustration of not recalling a person you meet unexpectedly and who obviously knows you quite well. The name lingers at the tip of your tongue, but is impossible to retrieve from memory, and you struggle not to give away that you are lost. Then the person may say something that gives you a clue, and suddenly associations and connected memories rush through your brain. Small pieces of the puzzle fall into place, and finally you recall the name, just seconds before it starts getting embarrassing. You elegantly and subtly verify that you know who you’re talking to and the crisis has been avoided.
Sound familiar? I’ve been there! But why was it so hard to look up information I obviously had saved on my hard disk? The answer is that our minds have no page index or table of contents like a regular book. Thinking is based on the principle of association. The next thought follows the previous one, and to recall a memory from the library, we need a cue from which we can walk into the memory by following a path of associations. In statistical terms, thinking is a stochastic process, a ‘random walk’ which literally reflects a random walk of signals between neurons in the brain.
Random walk? Are my thoughts just balderdash? Of course not. The word ‘random’ has an everyday meaning which is different from the statistical one. In statistics, random just means that something is not completely deterministic. Some random outputs from the stochastic process may still be much more likely than others. I think a certain cognitive randomness is a necessary condition for the existence of free will, but that is another subject. (For my statistical view on whether we have free will, see my previous post The statistics of free will.)
Let’s return to the associations. If I ask you: What is your first association to the word ‘yellow’?
Maybe you were thinking of the sun, a banana or perhaps a submarine, whereas ‘car’ would probably be less likely (unless you happen to own a yellow car).
The probabilities of moving from the current thought to each possible next thought are called transition probabilities in statistics. My personal transition probability from yellow to submarine is quite high since I’m old enough to remember the Beatles. After thinking ‘submarine’ my continuing random walk of thought could be: Beatles-Lennon-Shot-Dead. Those were in fact my immediate associations. Your thoughts would probably take another random walk.
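To make the idea of transition probabilities concrete, here is a minimal sketch of such a walk in Python. The thoughts and all the numbers in the table are invented for illustration; they are assumptions, not measurements of anyone’s mind.

```python
import random

# Hypothetical transition probabilities between a handful of thoughts.
# Every state and every number here is made up for illustration.
TRANSITIONS = {
    "yellow":    {"sun": 0.4, "banana": 0.3, "submarine": 0.25, "car": 0.05},
    "sun":       {"yellow": 0.5, "banana": 0.2, "car": 0.3},
    "banana":    {"yellow": 0.6, "sun": 0.4},
    "submarine": {"Beatles": 0.8, "yellow": 0.2},
    "car":       {"yellow": 0.3, "sun": 0.7},
    "Beatles":   {"Lennon": 0.7, "submarine": 0.3},
    "Lennon":    {"Beatles": 0.6, "yellow": 0.4},
}

def next_thought(current, rng):
    """Sample the next thought given only the current one."""
    options = TRANSITIONS[current]
    return rng.choices(list(options), weights=list(options.values()), k=1)[0]

def random_walk(start, n_steps, rng):
    """Return a list of n_steps + 1 thoughts, beginning with start."""
    path = [start]
    for _ in range(n_steps):
        path.append(next_thought(path[-1], rng))
    return path

rng = random.Random(1)  # fixed seed so the walk is reproducible
walk = random_walk("yellow", 5, rng)
print(" -> ".join(walk))
```

Changing the seed gives a different walk, just as your associations would differ from mine.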
Stochastic processes are well studied in statistics, and it may be worthwhile to draw some connections between what we have learned from statistical research and cognitive processes like thinking and conversation. Such a comparison may give us new meta-cognitive perspectives on thinking, conversation, personality and psychopathological conditions like obsessive-compulsive disorder (OCD), attention deficit hyperactivity disorder (ADHD) and Alzheimer’s disease (AD). In this blog post I will look at the properties of some specific stochastic processes known as Markov processes and hidden Markov processes in this cognitive context.
Let’s start out by assuming that at any given (and conscious) moment our thoughts are sampled from a fixed repertoire of potential thoughts and memories and that we are not influenced by external factors. We may call the thought repertoire the ‘state space’ of thought. The likelihood of the various cognitive outcomes from this state space depends on our history of experiences, the situation we’re in at the moment (the context), our interests and values, the focus level (high or low focus) and our personality traits. But it also depends on the current thought as a primer for the next. Together, these factors define a distribution of transition probabilities over the state space of thought. And from this distribution we sample what to think next.
The random sampling process of thoughts is very much like the random sampling of parameters from a candidate distribution as used in Markov Chain Monte Carlo (MCMC) estimation in Bayesian statistical inference. In MCMC a random walk process is initiated and sampling is run for a long time in order to estimate the unknown probability distribution from the sampled parameter values. The cognitive translation of this is as follows: By monitoring the random walk of thought of a person for a long time and recording the thoughts, we could get an estimate of the likelihood of all thoughts in the state space. If the random process behaves properly, the estimate would be independent of the starting thought and context. We might also get estimates of other general (yet personal) things like the most typical thought and the variability of thoughts (some people are more narrow-minded than others). Of course, such thought-monitoring is not possible unless you are monitoring yourself. The thinking process of any other person is in that respect an example of what in statistics is known as a ‘hidden stochastic process’. An output from this hidden process is only observed now and then, as this person speaks. I will come back to this later.
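The idea of estimating the thought distribution from a long recording can be sketched as follows. This is a plain simulation of a three-state chain, not a full MCMC sampler, and the three thoughts and their transition probabilities are assumptions chosen only to keep the example small.

```python
import random
from collections import Counter

STATES = ["sun", "banana", "submarine"]

# Hypothetical transition probabilities; each row lists the chances of
# moving to "sun", "banana" and "submarine" in that order.
P = {
    "sun":       [0.5, 0.3, 0.2],
    "banana":    [0.2, 0.6, 0.2],
    "submarine": [0.3, 0.3, 0.4],
}

def visit_frequencies(start, n_steps, rng):
    """Run the chain for n_steps and return the share of time in each state."""
    counts = Counter()
    state = start
    for _ in range(n_steps):
        state = rng.choices(STATES, weights=P[state], k=1)[0]
        counts[state] += 1
    return {s: counts[s] / n_steps for s in STATES}

rng = random.Random(42)
from_sun = visit_frequencies("sun", 100_000, rng)
from_submarine = visit_frequencies("submarine", 100_000, rng)
print(from_sun)
print(from_submarine)
```

For this particular matrix the exact long-run shares are 9/28, 3/7 and 1/4, and both recordings should land close to them regardless of the starting thought.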
A Markov process (after the Russian mathematician Andrey Markov) is a stochastic process where the probability of entering the next state given the entire history of previous states is the same as the probability of entering it given only the current state. This is the Markov property. If we assume a cognitive Markov process, this means that the probabilities of my next thoughts only depend on the current thought, not on how I got there from previous thoughts. That is, if I for some reason got to think of the Beatles by another route than via yellow submarines, I would still be likely to think Lennon-Shot-Dead as the next sequence.
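In symbols, the Markov property can be written as below. The notation is introduced here for illustration and is an assumption of this sketch: X_n stands for the thought at step n, and i_0, …, i_{n-1}, i, j are particular thoughts in the state space.

```latex
\[
P\left(X_{n+1} = j \mid X_n = i,\ X_{n-1} = i_{n-1},\ \dots,\ X_0 = i_0\right)
  = P\left(X_{n+1} = j \mid X_n = i\right)
\]
```

With i = Beatles and j = Lennon, the left-hand side conditions on the whole route that led to the Beatles, while the right-hand side only conditions on being there.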
Whether cognitive processes satisfy the Markov property is perhaps questionable, but let’s stick to this assumption for simplicity since Markov processes and MCMC methods in statistical inference have many interesting properties which I think are relevant also for thinking, learning and neurological disorders.
So let us have a cognitive look at some of these properties.
Priming – Initial value dependence
In order to set up a Markov process an initial value must be given. This initial value is the ‘primer’ or anchor for the next thought. The effect of priming is well known and has been examined in many psychological studies. Priming describes how thoughts and decisions may be biased towards input cues. The cues may be more or less random or given deliberately to manipulate the cognitive response of the receiver. Priming is a widely used technique in commercial marketing, where subtle messages are given to bias our opinions about products and increase the likelihood of us buying them, and social media marketing now delivers highly personal primers based on the information we provide online. In teaching, such priming of thoughts through so-called flagging of headlines is a recommended trick to prepare the minds of the listeners before serving the details. A Markov chain will forget its initial value after some time, and the effect of priming in psychology is similarly of limited duration.
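How a chain forgets its initial value can be shown without any simulation, by pushing a starting distribution through the transition matrix step by step. The three-state matrix below is a hypothetical example, and the function names are my own.

```python
# Hypothetical 3-state transition matrix; row i holds the probabilities
# of moving from state i to states 0, 1 and 2.
P = [
    [0.5, 0.3, 0.2],
    [0.2, 0.6, 0.2],
    [0.3, 0.3, 0.4],
]

def step(dist, matrix):
    """Advance a probability distribution over states by one step."""
    n = len(matrix)
    return [sum(dist[i] * matrix[i][j] for i in range(n)) for j in range(n)]

def distribution_after(start_state, n_steps, matrix):
    """Distribution over states n_steps after starting in start_state."""
    dist = [0.0] * len(matrix)
    dist[start_state] = 1.0
    for _ in range(n_steps):
        dist = step(dist, matrix)
    return dist

for n_steps in (1, 5, 30):
    a = distribution_after(0, n_steps, P)
    b = distribution_after(2, n_steps, P)
    gap = max(abs(x - y) for x, y in zip(a, b))
    print(n_steps, "steps: largest difference between the two primers =", gap)
```

After one step the two primers still give clearly different distributions; after a few dozen steps the difference is negligible, which is the sense in which the primer has worn off.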
Focus level – Random walk step length and mixing level
For random walk processes the center of the distribution is typically the current value, but another important factor is the step length, or variance, of the distribution. If step lengths are short, the process moves very slowly across the state space, only entering closely connected states. Furthermore, the series of visited states will show a high level of autocorrelation, which in the cognitive setting means that thoughts tend to be similar and related. One might characterize a person with highly autocorrelated thinking as narrow-minded, but we all tend to be narrow-minded every time we focus strongly on solving some difficult task or concentrate on learning some new skill. Neurologically, strong focus is induced by activation of inhibitory neurons through increased release of the neurotransmitter GABA (gamma-aminobutyric acid), which reduces transition probabilities for long step transitions to irrelevant thoughts.
The problem with a slowly moving cognitive chain like this is the high likelihood of missing out on creative solutions to problems. If step lengths are allowed to increase (by reduction of inhibitory neuron activity), a more diffuse state of mind is induced that facilitates creative thinking. However, too long step lengths may increase the risk of very remote ideas popping up, only to be rejected as irrelevant in the current context. For long step lengths autocorrelation may be very low, and thoughts appear to be disconnected. Some persons suffering from attention deficit hyperactivity disorder (ADHD) may lack the ability to retain focus over a long time because their random walks of thought have too long steps. In statistical inference and MCMC estimation of an unknown probability distribution, a so-called well-mixing process is desirable, where the chain moves across the state space in intermediate step lengths, avoiding being either too narrow-minded or too diffuse. Such well-mixing processes have the largest probability of covering the state space in sufficient proportions within a limited time span. For cognitive processes the definition of good mixing will of course depend on the context, and on whether focusing or defocusing is most beneficial.
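The effect of step length can be tried out with a small random-walk Metropolis sampler, one common MCMC method. The sketch below is an illustration under my own assumptions: the target is a standard normal distribution and the three step lengths are arbitrary choices.

```python
import math
import random

def metropolis(step, n_samples, rng):
    """Random-walk Metropolis sampler targeting a standard normal.

    Returns the chain and the fraction of proposals that were accepted.
    """
    x = 0.0
    chain = []
    accepted = 0
    for _ in range(n_samples):
        proposal = x + rng.gauss(0.0, step)
        # Log acceptance ratio for a density proportional to exp(-x^2 / 2).
        log_ratio = 0.5 * (x * x - proposal * proposal)
        if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
            x = proposal
            accepted += 1
        chain.append(x)
    return chain, accepted / n_samples

def lag1_autocorrelation(xs):
    """Correlation between each value and the one that follows it."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs)
    cov = sum((xs[i] - mean) * (xs[i + 1] - mean) for i in range(n - 1))
    return cov / var

rng = random.Random(7)
results = {}
for label, step in [("short", 0.1), ("medium", 2.5), ("long", 50.0)]:
    chain, acceptance = metropolis(step, 20_000, rng)
    results[label] = (acceptance, lag1_autocorrelation(chain))
    print(label, "acceptance:", acceptance, "lag-1 autocorrelation:", results[label][1])
```

Short steps are nearly always accepted but give a strongly autocorrelated chain, while very long steps propose remote values that are mostly rejected. Note that in this particular sampler a rejected proposal leaves the chain where it was, so it is the proposals, not the recorded chain, that become disconnected when steps are too long.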
Thoughts suppression and Alzheimer’s – State space reducibility
A state space is irreducible if all states are accessible from any other state within a limited number of steps. If we for simplicity assume a static state space of thoughts and memory, this will be irreducible if any thought or memory can be reached by means of association from any other thought or memory. Of course our cognitive state space is not static, but reducibility of mind may occur in cases where memories are unconsciously suppressed and are never reached consciously (functional reducibility), or if connections to memories are lost due to damaged synapses (structural reducibility), as may happen in Alzheimer’s disease.
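Irreducibility can be checked mechanically by asking whether every state can be reached from every other state. The sketch below does this with a breadth-first search over a tiny, made-up association network; the links are assumptions for illustration only.

```python
from collections import deque

def reachable(graph, start):
    """Set of states that can be reached from start by following links."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

def is_irreducible(graph):
    """True if every state can be reached from every other state."""
    states = set(graph)
    return all(reachable(graph, s) == states for s in states)

# Hypothetical association network: each thought lists the thoughts
# it can lead to with non-zero probability.
intact = {
    "yellow": ["submarine"],
    "submarine": ["Beatles"],
    "Beatles": ["Lennon", "yellow"],
    "Lennon": ["Beatles"],
}

# The same network after the only link into "Lennon" has been lost.
damaged = dict(intact, Beatles=["yellow"])

print(is_irreducible(intact))
print(is_irreducible(damaged))
```

In the damaged network the memory is still stored, but no path of associations leads to it any more, which is the structural kind of reducibility described above.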
Obsessive disorders – Periodicity of random walk
A Markov chain is periodic with period k if any return to a given state must occur in a multiple of k steps. A chain is said to be aperiodic if k=1. Aperiodic chains are usually desirable in MCMC, but in natural processes periodic behavior may occur. It is reasonable to assume that cognitive processes are aperiodic, although some cognitive impairments like obsessive-compulsive disorder (OCD) may show temporary periodicity, where the patient does not seem to be able to snap out of a circular chain of thoughts.
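The period of a chain can be computed from its graph of possible transitions. The sketch below assumes the chain is irreducible and uses a standard breadth-first-search trick; the three-thought loop is a deliberately simplified, hypothetical example.

```python
from collections import deque
from math import gcd

def period(graph, root):
    """Period of an irreducible chain whose possible moves are in graph.

    For every link u -> v, dist[u] + 1 - dist[v] is a multiple of the
    period, and the greatest common divisor of these values is the period.
    """
    dist = {root: 0}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for nxt in graph[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    g = 0
    for u in dist:
        for v in graph[u]:
            g = gcd(g, dist[u] + 1 - dist[v])
    return g

# A closed loop of three thoughts: every return takes a multiple of 3 steps.
loop = {"worry": ["check"], "check": ["relief"], "relief": ["worry"]}

# The same loop with one extra link, which breaks the strict rhythm.
with_shortcut = {"worry": ["check"], "check": ["relief", "worry"], "relief": ["worry"]}

print(period(loop, "worry"))
print(period(with_shortcut, "worry"))
```

A single additional association is enough to make the chain aperiodic, since returns can then take either two or three steps.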
Context dependent associations – Time-inhomogeneity
There might be times at which “submarine” would be an even more probable association from “yellow” than at other times, for instance immediately after hearing the Beatles’ tune on the radio. This means that the transition probabilities are so-called time-inhomogeneous. Time-inhomogeneous Markov chains are often used in MCMC estimation when step lengths (focus) are allowed to vary over time to optimize chain mixing. The inhomogeneity in cognitive processes is not only time dependent as a means to adjust focus level (mixing); the transition probabilities will also depend on context, place, and mood.
Plasticity and learning – Non-stationarity
So far we have for simplicity assumed that there is a fixed probability distribution across the state space of thought for an individual, and that the state space itself is static. This is characteristic of a stationary distribution in stochastic process theory. There is, however, no reason to believe that the cognitive state space is static, nor that the thought distribution is stationary. We are all expanding and altering the state space through learning, and the brain is continuously changing, both functionally and structurally. The Markov chains of thinking actually change the probability distribution over the state space as they move. This is because repeatedly running chains of association is a key part of learning, by which the transition probabilities are altered through changes in the synaptic strengths of the association networks. Furthermore, new and previously unvisited thoughts occur during the random walk as a result of creative thinking or learning from external input. Finally, non-visited parts of the state space may be eliminated (forgotten) through pruning of synaptic connections. The brain is very plastic, and hence so is the state space of thought. The very fact that the process visits a thought increases the likelihood of a later revisit. Hence, the random walk of learning and creative thinking may be considered a non-stationary stochastic process. If you think about it, this should be obvious. During our lifetime our interests, our values and the context we’re part of change, and this is certainly reflected in our thought processes.
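One simple way to picture how visiting a thought raises the chance of revisiting it is a reinforcement scheme where each rehearsal adds weight to a link. This is a toy model of my own choosing (similar in spirit to an urn scheme), not a claim about how synapses actually update, and all weights are invented.

```python
def probabilities(weights):
    """Turn non-negative link weights into transition probabilities."""
    total = sum(weights.values())
    return {thought: w / total for thought, w in weights.items()}

def rehearse(weights, thought, times=1, boost=1.0):
    """Return new weights after strengthening one link `times` times."""
    updated = dict(weights)
    updated[thought] = updated.get(thought, 0.0) + times * boost
    return updated

# Hypothetical association strengths out of the thought "yellow".
from_yellow = {"sun": 4.0, "banana": 3.0, "submarine": 2.0, "car": 1.0}

before = probabilities(from_yellow)
after = probabilities(rehearse(from_yellow, "submarine", times=5))

print("submarine before:", before["submarine"], "after:", after["submarine"])
print("car before:", before["car"], "after:", after["car"])
```

Rehearsing one association also lowers the relative probability of the others, so the whole distribution out of “yellow” shifts, which is why the process is no longer stationary.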
Thinking and conversation – Hidden Markov Chains
As mentioned previously, to an outsider my thought process is hidden. In statistical inference hidden Markov chains are used to model data where it is reasonable to assume an underlying stochastic process that generates occasional observable output. My thoughts are occasionally observable whenever I speak. Hidden Markov chains are defined not only by transition probabilities for the hidden state space, but also by state-dependent probabilities for generating an observable output. Again, these output probabilities may depend on, for instance, the context and the personality. If I am on unfamiliar ground, either literally or cognitively, I am less likely to express my thoughts. Furthermore, I am an introvert, and introverts are less expressive than extroverts. The cognitive process of an introvert is generally more hidden, with smaller probabilities of generating output than is the case for an extrovert. This also implies that the outputs of an introvert may seem more disconnected, with small autocorrelations. Extroverts’ statements may on the other hand appear to have higher autocorrelation than those of introverts, and the latter group may easily get annoyed by extroverts saying the obvious and sticking too long with a topic.
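The difference between a talkative and a quiet speaker can be simulated with a very small hidden chain. In this sketch the topics, the probability of staying on a topic and the two speaking probabilities are all assumptions, and the output is simplified so that a spoken statement reveals the hidden topic directly.

```python
import random

TOPICS = ["work", "music", "food"]
STAY = 0.9  # hypothetical probability of staying on the current topic

def spoken_topics(p_speak, n_steps, rng):
    """Run the hidden topic chain and return only the topics voiced aloud."""
    topic = rng.choice(TOPICS)
    spoken = []
    for _ in range(n_steps):
        if rng.random() >= STAY:
            # Move to one of the other topics, chosen uniformly.
            topic = rng.choice([t for t in TOPICS if t != topic])
        if rng.random() < p_speak:
            spoken.append(topic)
    return spoken

def same_topic_rate(spoken):
    """Share of consecutive statements that are on the same topic."""
    pairs = list(zip(spoken, spoken[1:]))
    return sum(a == b for a, b in pairs) / len(pairs)

rng = random.Random(3)
extrovert = spoken_topics(p_speak=0.8, n_steps=50_000, rng=rng)
introvert = spoken_topics(p_speak=0.1, n_steps=50_000, rng=rng)

print("extrovert statements:", len(extrovert), "same-topic rate:", same_topic_rate(extrovert))
print("introvert statements:", len(introvert), "same-topic rate:", same_topic_rate(introvert))
```

Both speakers have exactly the same hidden process; only the chance of voicing a thought differs. The quiet speaker’s statements are further apart in time, so consecutive statements are less often on the same topic.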
Collaboration – Parallel chains
A common trick for monitoring whether Markov chains have converged to their stationary distributions, are mixing well and have forgotten their initial values, is to initiate several parallel chains with different initial values spread out across the state space. Comparing within-chain and between-chain variability indicates whether the mixing works properly and convergence has been reached. Further, parallel chains may cover the state space faster, and integrating the information from all chains yields quicker estimates of the properties of the state space distribution.
Parallel and hidden Markov chains interact in the context of a conversation at the lunch table, or during group-based learning. Flipped classroom learning is an example of group learning in which students watch lecture videos at home to prepare for group-based learning and problem solving at school. The teacher operates more like a guide and discussant than a lecturer as he or she visits the groups. The homework prepares the student for the group learning process, and each group member joins the collaboration with their own hidden thought process with individual initial values. In addition, varying experience and knowledge levels, interests, values and personalities yield individual cognitive state space distributions. During the group process the parallel hidden random walks of thought evolve jointly towards a better understanding of the subject to be learned. Through conversation, associations are exchanged which may lead to jumps in the hidden processes. These jumps can result in a better coverage of the state space and faster learning for each individual group member. The integration of information from multiple hidden parallel chains becomes effective through conversation and collaboration. At this point students’ personalities may influence the effectiveness of the learning process. Introverts have, as discussed above, smaller probabilities of generating outputs from the hidden chains compared to extroverts. Extroverts may therefore more quickly get a correction of direction through the interaction with other extroverts, and this in turn may lead to faster convergence of thoughts than is the case for introverts, who may get stuck sub-sampling limited thought regions for a long time.
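One widely used way to compare within-chain and between-chain variability is the potential scale reduction factor, often written R-hat (the Gelman-Rubin diagnostic). The sketch below is a basic version under my own assumptions: four random-walk Metropolis chains targeting a standard normal, started at arbitrary dispersed values, with the first half of each chain discarded.

```python
import math
import random

def metropolis_chain(start, n_steps, rng, step=1.0):
    """Random-walk Metropolis chain targeting a standard normal."""
    x = start
    chain = []
    for _ in range(n_steps):
        proposal = x + rng.gauss(0.0, step)
        log_ratio = 0.5 * (x * x - proposal * proposal)
        if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
            x = proposal
        chain.append(x)
    return chain

def sample_variance(xs):
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / (len(xs) - 1)

def r_hat(chains):
    """Potential scale reduction factor for equally long chains."""
    n = len(chains[0])
    means = [sum(c) / n for c in chains]
    within = sum(sample_variance(c) for c in chains) / len(chains)
    between = n * sample_variance(means)
    pooled = (n - 1) / n * within + between / n
    return math.sqrt(pooled / within)

def diagnose(n_steps, seed):
    """Run four chains from dispersed starts and keep the second half of each."""
    rng = random.Random(seed)
    starts = [-10.0, -3.0, 3.0, 10.0]
    chains = [metropolis_chain(s, n_steps, rng)[n_steps // 2:] for s in starts]
    return r_hat(chains)

short_run = diagnose(20, seed=0)
long_run = diagnose(5000, seed=0)
print("R-hat after a short run:", short_run)
print("R-hat after a long run:", long_run)
```

A value close to 1 suggests that the chains have forgotten their starting points and agree with each other; a clearly larger value means they are still exploring different regions.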
Creativity – The parallel hidden chains of unconscious associations
Earlier I wrote that long step lengths of associations increase creativity, but the truly creative driver is probably the hidden and parallel processes of the unconscious mind. There is neuronal activity even in a resting brain and in brain regions that are not monitored by our consciousness, not only the sub-cortical regions and the cerebellum, but also cortical regions which are outside the focus area of our consciousness (see my previous post …). Even if the signaling processes in these regions are hidden to us, they are likely to walk along the paths of highest transition probabilities. Furthermore, the unconscious random walks of associations do not need to be followed by our conscious attention (which is univariate). Hence, there may be multiple parallel hidden chains running in the unconscious. This may explain why the unconscious is such an effective problem solver and generator of creative thought. Sometimes the hidden processes produce coherent sampling which triggers conscious attention by generating an output that is observable to our consciousness, an a-ha moment (this very idea was in fact served to me by my unconscious right before going to sleep after writing the introduction to this post). How the unconscious processes contribute to conscious experience and attention was among the topics of my previous post The statistics of effective learning.
In this post I have presented some similarities between statistics and cognition, and once more it seems like nature thought of it first. However, statistical knowledge may give some new insight and understanding of cognitive processes, as discussed here.