Metacognition – https://blogg.nmbu.no/solvesabo
Our brain is a statistical machine, and understanding how it works may help us use it better.

Artificial Creativity – A new dimension in AI?
Mon, 26 Dec 2022

The year 2022 will perhaps be remembered as the year when artificial intelligence (AI) truly reached the general public. Many have already tested programs such as DALL-E, Midjourney, Stable Diffusion, Galactica and ChatGPT. These are examples of two main types of programs that have attracted a lot of attention recently, and both are variants of what we can call generative artificial intelligence.

The first type of program creates new images or illustrations from a text prompt, while the second correspondingly generates new text, such as an answer to a question or a continuation of an introductory passage. What is special is that both the generated images and texts are entirely new and not direct copies of other images or texts.

The AI algorithms are trained by being fed large amounts of images and text that the developers have access to, often through their own web platforms, such as Google, Facebook and Instagram. The algorithms learn to respond to textual input by combining elements, such as motifs and words, that often appear together in the training examples. The algorithms are very good at recognizing statistical patterns in large amounts of data and exploit this when new images or texts are to be generated.

The new programs have been received with everything from great enthusiasm by some to great concern and skepticism by others. The strong emotions may reflect a belief that we are now facing a new dimension in AI, namely a kind of Artificial Creativity. If so, this would be a radical step forward for AI. Could that be the case? Headlines like CNN’s «AI won an art contest, and artists are furious» show that this is the impression out there, and that this technology has the potential to change the creative professions.

Creativity, no longer exclusive to humans?

Right from the start of the second industrial revolution around 1870, through the transition to the digital society around 1970, and into the last decade’s strong rise of artificial intelligence, the fear that machines will take our jobs has always been present. And many tasks have indeed disappeared through automation of the most routine jobs in this period. Through all these phases of industrial and technological development, there have nevertheless been human qualities that almost no one thought machines could replace, and one of these is our ability to be creative. Creativity has, in a sense, been humanity’s safe haven against the systematic operationality of machines and algorithms.

Creativity is also an important component of many types of work: in art, music and poetry, of course, but also in marketing, design, innovation and research. It is precisely this certainty about the limitations of machines in the creative field that is now being shaken by the new algorithms.

And this concern is now being expressed on many fronts. Artists and illustrators worldwide are voicing great skepticism. One issue is that the algorithms create wonderful images and illustrations on the basis of the artists’ own work, without credit or compensation. Another is that anyone can now create high-quality illustrations for their own use almost free of charge, without engaging graphic artists and illustrators. The fear for one’s own livelihood and the feeling of having one’s art exploited recently led to a “NO TO AI GENERATED IMAGES” movement among artists on the online community ArtStation. They demand that their art not be included in the training of the AI algorithms.

Protection of one’s own intellectual property was recently also the basis for a lawsuit in a quite different field, namely computer programming. In November, a group of programmers filed a lawsuit against the companies GitHub, Microsoft and OpenAI for the way their algorithms exploit their and other people’s software to train coding assistants. These algorithms can generate ready-made program code to solve a desired programming task. These co-pilots, as the AI programs are called, have thus been trained on large amounts of openly available program code created by contributors worldwide and used without their consent. Here, too, we see how people’s creative abilities are threatened.

Research is also a creative profession. The search for new knowledge requires new thinking and exploration of unknown territory. The company Meta, which owns Facebook, recently released the service Galactica, an AI generator intended to help researchers summarize available research in a field or generate text for scientific publications. The texts the program generated appeared impressive, but received massive criticism from researchers worldwide. It turned out that Galactica, to a large extent and with the greatest conviction, “hallucinated” both scientific results and references to other research. Galactica was taken offline after three days, at least temporarily. Such hallucinatory properties also characterize the latest development in text generation, namely ChatGPT, which many have played with in recent weeks and which is already creating challenges within the education system.

The example from research is important in terms of answering the question of whether we have now witnessed the rise of artificial creativity. But, what does it really mean to be creative? Is it sufficient to simply be able to create something new? 

As we saw from Galactica’s troublesome launch, the answer is probably no. Something more is required.

There are typically two elements included in different definitions of creativity. A creative act must carry an element of being “new”, but at the same time be considered “useful”. It was on this last point that Meta’s product did not quite measure up.

The texts that Galactica created were not considered useful enough. On the contrary, Meta received massive criticism for facilitating the potential mass dissemination of “fake science”. Almost anyone could now easily create convincing «research articles» without substance and with erroneous conclusions, and spread them widely, for instance through open archives.

 

AI generated image
“A photo of Wall-E painting a self-portrait in the style of Vincent van Gogh”
(An illustration cherry-picked by the author from images generated by DALL-E)

The importance of biases

All the programs that have been mentioned here will to some extent fulfill the requirement of creating something new, but to varying degrees they fulfill the second requirement of creating something useful.

In the latest of my series of blog posts on creativity (Bending, blending and breaking biases) I argue that a key to human creativity lies in our cognitive biases. This is so because every new thought aspiring to be deemed creative must be evaluated with regard to its usefulness within some reference frame or bias structure. It is common to use the phrase “thinking outside the box” to describe the process of coming up with new and unconventional thoughts that may lead to creative ideas, artwork or scientific discoveries. However, I believe this is turning creativity inside-out, literally. Creativity is always to think inside some box, because without a box (a bias, or a frame of reference) the usefulness of a novel thought cannot be evaluated!

In the same blog post I therefore claim that: the true act of creativity is the restructuring of intrinsic structures, biases or reference frames,  with the purpose of bringing meaning to novelty.

This restructuring or change of biases is a top-down cognitive process by which we change the way we perceive the world. Basically we apply or create another «box» for our thinking.

Why is this so important in relation to artificial creativity? Well, none of the usual algorithms used in artificial intelligence today are able to make such a top-down assessment of the usefulness of novelties, be they incoming perceptions or self-generated creations. In short, the algorithms have no idea what they are seeing or creating. They have no top-down bias structure providing the tools to assess what is normal or acceptable within old or new reference frames.

They are also unable to understand causal relationships, and thus cannot carry out thought experiments, which are important for assessing how a new idea or product would work if it were realized. Likewise, Galactica could not assess whether the text, theories, formulas or references it generated for scientific purposes were actually in line with previous knowledge or basic axioms within a field. (See this post by G. Marcus for more discussion of the limitations of the type of models underlying ChatGPT and Galactica.)

Generative AI and types of creativity

So does this mean that artificial intelligence is in no way creative? In order to get closer to an answer to this question we need to view the generative algorithms in light of different types of creativity. 

In his research together with professor and composer Anthony Brandt at Rice University, brain researcher David Eagleman at Stanford University has found that most creative expressions created by humans can be placed in one of three main categories. They argue that human creativity can be described by the way it springs from existing ideas or products. It can happen either by bending, blending or breaking. Here I build further on their work as it can be useful when we have to decide on the existence of artificial creativity.

  • In the case of bending, creativity is unleashed through small adjustments to ideas or products that are already considered useful. A great deal of what is created in the world ends up in this category. Everything created by bending is a variation of what is already accepted. AI can create new things by bending existing ideas and products, and these will partly be considered creative since they will very likely end up within what is already considered useful. Still, AI itself cannot evaluate this.
  • Creativity may alternatively find its outlet by mixing old ideas, that is, by blending, which takes advantage of transferring concepts from one area to another. Old recipes can be used as solutions to new problems. Within research, this form is probably widespread, in that methods and ideas can be applied to new problems or subject areas. This form of creativity requires a more thorough assessment of what can be considered useful or “correct”, since one moves between domains that are already accepted. The generative AI algorithms, such as DALL-E and Stable Diffusion, create their illustrations and images precisely through blending of the images they have learned from. However, they still cannot judge their own creations as beautiful, harmonious, or useful in any sense whatsoever.
  • Thirdly, creativity through breaking often leads to revolutionary new designs through the disruption of established structures and the building of a new order. Since one here moves completely outside of what was previously generally valid and accepted, it can be difficult and time-consuming to establish usefulness. Those who succeed with this type of creativity often appear pioneering or brilliant in retrospect. Examples are Einstein, whose theories of relativity violated previous “truths” in science, and Picasso, who introduced innovative stylistic directions in painting. As of today, there are probably no AI algorithms capable of creating such new structures or orders that break radically with the input they have been trained on, and of course they are also unable to assess the usefulness of such creations.

We can thus summarize the creative characteristics of AI algorithms in 2022 as follows:

Type of creativity | Can generate novelty | Expected to be useful
Bending            | Yes                  | Yes, but unintentionally
Blending           | Yes                  | No
Breaking           | No                   | No

The conclusion that I draw from this is that we have yet to see algorithms that exhibit artificial creativity. Algorithms can in no way replace human creative competence as of today.

However, the new AI algorithms can be considered useful tools that can help increase the exploration of new territory and come up with suggestions for solutions within the categories of bending and blending. For ground-breaking creativity, for instance in research, AI falls short in every respect. Whether the final creation, be it art, music, innovation or research, can be considered creative will for now have to be left entirely to the human being who uses these tools.

Epilogue

I started this blog series on creativity asking: Will Apple Siri ever shout Eureka? It is time to return to this question. 

Through five blog posts I have built up the argument that AI is so far, to some extent, capable of generating novelty, and this is of course a necessary element of any Eureka moment. However, I have also argued that today’s algorithms lack the top-down ability to evaluate their own creations.

Many AI developers are trying to figure out how the algorithms can become more flexible in order to handle multi-modal inputs and shifting environments. Some are also working towards creating AI with some kind of human «common sense» and the ability to infer causality from perceptions. We are probably far from achieving this today. I’m convinced that any future solution must include a dynamic model with a stronger emphasis on top-down bias feedback loops to balance bottom-up perception. Here one should seek inspiration from human cognition and predictive inference theory.

Nevertheless, if the efforts to develop AI capable of top-down assessment of the usefulness of novelties become reality, it would be a small add-on simply to instruct the program to flag creative discoveries with a «Eureka!» exclamation.

So, yes, Siri, or more likely, her successor, will probably shout Eureka! some day, if we want her to.

However, this would not be the kind of Eureka moment that Archimedes experienced as he jumped out of his bath. The Eureka moment, or the moment of imaginative insight, as I discussed in this blog post, is the moment of sudden conscious awareness of an unconscious creation. The momentary experience of qualia that accompanies the aha-feeling is still lacking.

Hence, Apple Siri of the future would only shout Eureka! and really «mean it» (!) if she was to become conscious of her own unconsciousness. 

But that is another story….

Bending, blending and breaking biases
Sun, 12 Sep 2021

It’s a paradox that AI developers are striving to make AI unbiased, when the key to human intelligence and creativity lies right there, in our biases. In this blog post I will show you why biases are so important for creative exploration and learning.

In this series on human and artificial creativity I use a common definition of creativity as novelty that works. In the previous blog post, The stochastics of divergent thinking, we explored properties of the stochastic process of divergent thinking, the first stage of a creative process in which candidate novelties are suggested. In this blog post I will turn to how our biases put constraints on divergent thinking, but also how we use biases to evaluate the usefulness of a divergent thought. Does it work? Is it meaningful? With regard to the potential of artificial creativity, the aspects of human creativity discussed in this blog post will most likely pose a much greater challenge to the artificial implementation of creativity than the stochastic processes discussed previously. Nevertheless, we need to delve into the role of biases in human creativity before we can understand whether human creativity can be replicated or simulated in computers.

In order to be creative we have to deal with the constraints of biases, and the title of this blog post is inspired by the findings of Eagleman and Brandt who found that most creative acts can be categorized into one of three main ways of dealing with biases. 

Either you bend them, or you blend them, or you break them! 

But before we turn to the importance of biases in the creative process, I will focus a bit on how our biases affect both our perception and our attention. When I use the term ‘bias’ here, I have a wide interpretation in mind, representing some cognitive structure, either inherent or learned. I will occasionally also use other terms used in psychology, neuroscience or statistics, like, ‘reference frame’, ‘belief’, ‘prior’ and ‘intrinsic model’. Common to all is that they represent some subjective structure of knowledge or belief that we already possess. It might be anything from knowing how to discern apples from pears, to more abstract opinions about the existence of free will or God. 

 

Biases are attention filters

The human brain is both conservative and novelty seeking. It is well known from psychology that we are biased towards trying to confirm what we believe to be true, our opinions and prejudices. This is known as confirmation bias. Luckily, this is balanced by an urge to seek novelty. The salience hypothesis in psychology addresses the question of what guides our attention, and it states that our attention tends to seek novelties or things that “stick out” in our environment. Our senses receive a tremendous amount of information every waking second, and the brain somehow has to filter out most of this information as it seeks a point of attention. It makes sense to think that we should attend to things that change or differ from some background.

Surely, this is a good thing, for instance, if we drive along the street and a cat suddenly jumps into the road. The cat, representing a sudden change or surprise, attracts our immediate attention and gives us a chance to hit the brakes. However, a bird flying by in front of the car is less likely to trigger the same reaction, unless you are an ornithologist, perhaps. This example indicates that our attention is not drawn towards novelty or surprise alone. This was demonstrated in a study by Henderson et al. They showed in experiments that visual attention is, in fact, also drawn towards meaning, and not surprise or novelty alone. This is contrary to the salience hypothesis, which has been the dominant view in recent years. Human attention is thus guided by top-down intrinsic bias, an inner motivation, guided by meaning, interest, values or feelings. We might say that we are drawn towards surprises that are meaningful to us. Without this top-down evaluation and filtering of novelty, we would be swamped by all the unexplainable noise that surrounds us everywhere! Hence, we are born to be creative in the way we naturally seek novelties that work!

 

Biases and prediction errors

Besides serving as filters for attention, biases may have an even more direct influence on perception, for instance on what we see or hear. A recently developed theory of human cognition is the theory of predictive coding, which states that we all, in our daily lives, learn about and adapt to the environment by making predictions based on internal, cognitive models. We sense the world through our beliefs, and these are either retained or adjusted according to prediction errors. These errors are the differences between what we observe and our prior expectations based on the intrinsic models of our surroundings. So we may say that we have internal, cognitive models of how things around us are expected to be and how they are predicted to change. In the extreme we could say that what we actually perceive are prediction errors! Hence, the intrinsic models, or biases, also influence how we perceive the world. The act of learning is an ongoing Bayesian process of model updates in light of our prediction errors. It is therefore crucial for creative learning that we dare, and are allowed, to make errors to learn from!
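To make the idea concrete, here is a minimal numerical sketch of prediction-error-driven updating (my own illustration; the quantities, observations and learning rate are hypothetical, and real predictive coding involves precision-weighted, hierarchical updates):

```python
# Minimal sketch of prediction-error-driven belief updating (illustrative only).
# The "belief" is a single expected value; each observation yields a prediction
# error, and the belief moves a fraction of that error.

def update_belief(belief, observation, learning_rate=0.2):
    prediction_error = observation - belief   # what, in the extreme, we "perceive"
    return belief + learning_rate * prediction_error

belief = 10.0                                 # prior expectation (hypothetical units)
for obs in [12.0, 11.5, 13.0, 12.5]:          # a stream of sensory observations
    belief = update_belief(belief, obs)
    print(f"observation={obs:4.1f}  updated belief={belief:5.2f}")
```

Each pass nudges the internal model towards what was actually observed, which is the sense in which learning is driven by prediction errors.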

Thus, biases and internal models of our surroundings are important mental structures, essential for survival and well-being. Throughout life we learn to categorize sensory inputs and ideas to predict future outcomes in a random world. We build or adapt opinions, about values, interests, culture and ethical standards, all being mental reference frames helping us make decisions and bring meaning to our lives. Thus, through learning we build order in chaos, a hierarchical structure of biases, of boxes, to put life into. Some boxes are wide and big, like ‘beauty’ or ‘symmetry’, others are narrow and perhaps contained in others, like ‘carrot’, and ‘vegetable’. Many boxes we learn or adopt from others, some we creatively discover ourselves, and some we are even born into life with. 

 

Biases limiting divergent thinking

As discussed in the previous blog post, The stochastics of divergent thinking, the associative thought process can statistically be compared to a non-stationary stochastic process, where the transition probabilities over the flexible state space of thought change over time and with context. It is very plausible to assume that the transition probabilities, defining which thoughts are more likely to be associated together, depend on our biases; our learned structures of interests, values, opinions, prejudices and so forth. Hence, if we have strong biases, they are also likely to put limits on the process of divergent thinking, making it less likely to generate atypical ideas. If a person has very strong opinions and beliefs, and perhaps scores low on the personality trait openness to experience (from the Big Five inventory), he or she may be very limited with regard to divergent thinking.

During our lifetime we are constantly expanding our level of knowledge, building order in chaos, a complex bias structure, which may become increasingly rigid and inflexible. This may put limits on creativity and may explain why children seem to be more open to new ideas and more creative than older people. Ronald A. Havens summarizes the insightful thinking of the famous psychotherapist Milton Erickson about this paradox, that learning limits new learning, this way:

At first the ordinary person’s mind is relatively unstructured, objective, flexible and open to new learnings. Over time, however, it naturally becomes increasingly rigid, biased, idiosyncratic, and unable to accept perceptions, learnings, or responses that cannot be accommodated by its previously adopted structure.

And he continues:

Eventually the entire conscious awareness of the individual may become restrictively governed or dictated by the very structure that originally developed to allow an increased freedom of response.

In his book On creativity David Bohm describes this as self-sustained confusion that can arise when a person’s mental frames have become so rigid and structured that any divergent thoughts challenging this mental structure become conflicting and painful. 

Sometimes this conflict is an inner conflict, but sadly it may also be induced by the environment. The most popular TED talk of all is the hilarious talk by Sir Ken Robinson with the slightly provocative title “Do schools kill creativity?” Robinson makes a convincing argument that we are all born creative, but that school has the unfortunate effect of making us suppress this inborn skill, in the way our educational system is dominated by supervision, rewards conformity and punishes divergent thinking. Being “wrong” is not accepted, and the result seems to be surrender to self-sustained confusion. This is sad considering that prediction errors are such a rich source of learning.

 

Biases and creativity

I will now briefly point back to the previous blog post, where a two-step procedure for data simulation using Markov Chain Monte Carlo methods was used to exemplify the creative process. A dependent chain of values is iteratively generated from some target probability distribution through the two steps:

  1. A new candidate value, typically depending on the current, is drawn from a proposal distribution (a random step). 
  2. The candidate is accepted or rejected as a new value of the chain in light of the target distribution (an evaluation step).

Inside or outside the box?

In the previous blog post we focused on the relevance of step 1 (divergent thinking) as part of creative processes, but for a divergent thought to be accepted as creative it has to be evaluated with regard to its usefulness. Every new thought aspiring to be deemed creative must be evaluated with regard to its usefulness within some reference frame or bias structure. It is common to use the phrase thinking outside the box to describe the process of coming up with new and unconventional thoughts that may lead to creative ideas, artwork or scientific discoveries. However, I believe this is turning creativity inside-out, literally. Creativity is always to think inside some box, because without a box (a bias, or a frame of reference) the usefulness of a novel thought cannot be evaluated! 

Our biases and reference frames may statistically be seen as priors in a Bayesian belief update process. Given our biases and mental models about how the world is, we evaluate the likelihood of new observations. If new observations seem trustworthy, but in conflict with our beliefs, we may “choose” to change our beliefs. On the other hand, if biases are strong and data are ambiguous, we may stick to our beliefs. This balance between the likelihood of new observations and our prior beliefs is expressed by Bayes rule, which we quite heuristically can write like this:

Posterior (belief) ∝ Prior (belief) · Likelihood (new data | belief)

Here the symbol “∝” means “proportional to”. In an ongoing learning process, prior beliefs are updated (or not) into posterior beliefs in light of new observations. The posterior belief may later serve as a new prior belief as part of continued learning. However, if Bayes’ rule were the only way our beliefs could change, we would not be very creative! Any novel idea would receive a low likelihood given our biases, and the result would be that we stay stuck in our priors. The most radical way to be creative is to change our prior beliefs, to transform our biases.
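To make the update concrete, here is a minimal numerical sketch with invented numbers: a strong prior belief meets an observation that fits the alternative better, and the posterior is weakened but not overturned.

```python
# Minimal sketch of Bayes' rule for a single binary belief (numbers are invented).
prior = 0.9                      # strong prior belief (a "strong bias")
p_obs_given_belief = 0.2         # the new observation fits the old belief poorly
p_obs_given_alternative = 0.8    # ...and fits the alternative well

evidence = prior * p_obs_given_belief + (1 - prior) * p_obs_given_alternative
posterior = prior * p_obs_given_belief / evidence
print(f"posterior belief = {posterior:.2f}")   # about 0.69: weakened, not abandoned
```

Even though the data clearly favor the alternative, the strong prior keeps the posterior above 0.5, which is exactly the sense in which strong biases make us stick to our beliefs.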

Hence, creativity is mostly about challenging our old boxes, about bending, blending or breaking our hierarchical bias structures and reference frames. A divergent thought may be rejected as creative according to one reference frame, but can, in fact, be accepted within a wider or transformed reference frame. After reflecting upon and studying the creative process for a long time, and being particularly inspired by Bohm’s stages of imaginative and rational insight and fancy, I have come to a conclusion regarding the very essence of creativity.

The true act of creativity is the restructuring of intrinsic structures, biases or reference frames,  with the purpose of bringing meaning to novelty. 

Hence, the actual painting, a sculpture, a new piece of music or even a new scientific theory are not creative in themselves, and neither is the craftsmanship or the deductive reasoning that produced the sculpture or the new theoretical lemmas. Such manifestations of the creative insight into something observable are merely a matter of skillful production or deduction (Bohm’s fancy). It is the imaginative and new insight of a new order, a new concept, or a new reference frame, which is the true leap of creativity. However, both insight and fancy are necessary stages in a creative process in the way they work interchangeably and stimulate one another, according to Bohm. This is also well known from design thinking principles. On the other hand, one may question whether a person, or a computer (as we will return to in the next blog post), is creative if he/she/it is only involved in the deductive reasoning step of the process.

David Bohm uses the story of Helen Keller to illustrate the creative transformation or discovery of structure or order in chaos happening in the stage of insight. Helen became deaf and blind after catching a fever when she was less than two years old. Without language she spent her first years more or less isolated from the outer world until her parents hired a private teacher, Anne Sullivan. Sullivan came to Helen’s rescue and managed to reach into her mind through clever conduct and patience. She did so by exposing Helen over and over again to the substance of water under different conditions, like ice, liquid water and steam, each time writing «water» in the palm of Helen’s hand. At first the sensory input signals Helen received must have felt rather chaotic, but in a moment of sudden awakening, Helen realized that all the sensations could be classified together, everything being water. Bohm points to Helen’s sudden realization of a higher order structure, the concept of water, as an example of high-level creativity. Although Sullivan tried to help her come to this realization, the giant leap of discovering order in chaos had to be made by Helen herself.

In the following we will look closer into how biases are used and transformed in different types of creative processes. This will make it easier to discuss the potentials of artificial creativity later.

 

Supervision (Bottom-up processing)

Supervised learning may serve as a baseline with which to compare creative processes. If, for instance, a kid or a student acquires all knowledge and mental structures from external sources, like a parent or a teacher, in an entirely bottom-up type of learning process, we may refer to the learning process as being supervised. Bottom-up learning means that information flows from input perceptions from our senses to form knowledge, but in a non-creative way. In the extreme, all order, structures and biases are directly transferred from a supervisor to the learner. Without reflection or critical thinking this is purely an information transfer process. A bottom-up learner would always follow instructions and never raise a critical question about the knowledge structures. Hence, pure bottom-up learning does not nurture creativity.

 

Imagination (Top-down processing)

Imagination is a requirement for all types of creativity, and especially for Bohm’s fancy stage of a creative process. Neurologically it is a top-down process, where the top-level biases are fed back to the sensory cortices to either alter factual perception of the world, or to (re)play 100% imaginative experiences, like we do in dreams. If I ask you to close your eyes and imagine the face of a close relative, then recent research supports the notion that you are in fact running your neural system backwards, from your higher-order structures defining your relative, down to the mid-layers of your visual cortex, to create a mental picture of your relative’s face. Hence, imagination is the reverse application of cognitive biases through a top-down information flow. This gives humans the ability to imagine possible futures or to perform imagined experiments, like Einstein’s «Gedankenexperimente», which is an important part of Bohm’s fancy. Neurological studies support this. For instance, Kounios and Beeman found through EEG studies that people who appeared in tests to be more imaginative than others had higher resting-state activity in the visual cortex.

 As mentioned above, the title of this blog post is inspired by the findings of Eagleman and Brandt who observed that most acts of creativity may be described as a result of three different ways of challenging old biases, namely by “bending”, “blending” or “breaking”.

 

Bending

Relatively novel and useful creations may come about as small adjustments to already accepted ideas, theories or structures. A small change to established ideas is typically easy to suggest and readily acknowledged. Probably, most published research can be described as “bending” of previous work, making incremental steps towards a more complete knowledge structure of a given field within an accepted paradigm. Also in art, bending is common, but whenever the work of a scientist or the creation of an artist is regarded as highly creative, bending is probably not the type of creativity involved. In terms of Bohm’s insight and fancy, I would characterize bending as primarily applying imaginative and rational fancy to already accepted bias structures and hence, the level of creativity is low, and the manifestations can be characterized as variations of a common theme.

 

Blending

“Blending” is a good characterization of creativity that is a result of applying previously learned structures to new areas. In this way order is extended by recognizing that old familiar structures may be “blended” into the new and unexplored variability. Returning to the story of Helen Keller, in addition to having a sudden insight about water, she also had an immediate realization of concepts as a general structure which could be applied to (or blended into) other experiences beyond water. This learned structure opened up endless possibilities for exploring new experiences for Helen through blending.

Blending is a powerful tool for learning in general, not only for creativity. In supervised learning, metaphors have been used for millennia to help students understand and learn new concepts. The Biblical parables are ancient examples of this, and so is Plato’s allegory of the cave. Similarly, blending may help the brain to see new order as part of a self-supervised creative process.

 

Breaking

The son of Pablo Picasso once said about his father that he had the habit of breaking everything, only to rebuild it in a novel and creative way. This is a perfect example of the last type of creativity defined by Eagleman and Brandt, namely «breaking», which is a kind of self-supervised revision of existing structures. This is likely the most challenging type of creativity, in two ways.

Firstly, as humans we tend to prefer to confirm old structures (confirmation bias). Breaking established order is, for most people, a stressful experience since the sense of stability is reduced or even destroyed. The scientist or the artist who questions structure must endure the emotional distress of the increased disorder and chaos that occurs before a new order is found. David Bohm writes about the self-sustained confusion maintained by those who cannot bear the distress of breaking old reference frames. This confusion may be sustained even if the old, and perhaps dear, biases clearly violate experience and perceptions. Rather than going through the pain of breaking and rebuilding biases, a person may confuse her-/himself by pointing to increasingly improbable explanations and exceptions. This may be a problem in society at large, but even more so in science. Scientists who have devoted their entire career to a scientific theory or paradigm may emotionally cling to the old biases and structures, rather than accepting that the contradictions between theory and data indicate the need for breaking and rebuilding theories.

“[on originality]…, he must be able to learn something new, even if this means that the ideas and notions that are comfortable or dear to him may be overturned.”

David Bohm

However, if the pain and confusion are overcome, and a new and more general order is found, the reward may overshadow the preceding distress. Apparently, some highly creative people, like Picasso, seem to handle this better than others and may even be curiously attracted to such a chaotic state. Also in scientific research it seems like some people can endure, and perhaps be more attracted to, unresolved mysteries than others. Dörfler et al. (2018) describe a state called negative capability that scientists need in order to cope in times of transforming theories and changes of paradigms:

Beyond the engagement with reality (and thus data), the negative capability is also important for an achievement of comprehension when we accept that the reality does not play by the management textbooks, and that researchers inevitably have to face a lack of internal consistency in their emerging understanding. Sometimes inconsistencies will disappear during the research project, but often they can persist for years. Thus, researchers need to develop an ability to cope with such a situation – and they need a framework in which a less than complete internal consistency can be accepted.” 

For this type of creativity, requiring the breaking of old reference frames, one may wonder how subjective usefulness is judged in the creator’s moment of insight. It is quite apparent that the too narrow biases (boxes) of the old structure must be ignored in favor of wider biases, or prior distributions of acceptance of candidate ideas, as we would put it in statistical terms. It is likely that highly creative people use high-level biases, such as the sensations of wholeness, beauty and harmony, as guidance for their openness to new ideas. David Bohm states that highly creative people, be they scientists or artists alike, seek a sense of wholeness, symmetry, harmony or beauty in their work:

«The new order leads eventually to the creation of new structures having the qualities of harmony and totality, and therefore the feeling of beauty».

David Bohm, in «On creativity».

 

This may also explain why simplicity, beauty, symmetry and totality tend to be more common measures of scientific validity of theories in some areas of mathematics and physics where there is a lack of experimental data for falsification studies, like for instance in cosmology.

Openness to new experience, a known trait of more creative personalities, comes at the cost of increased uncertainty about the actual usefulness of the ideas, which is only revealed through subsequent analysis and testing. Many have admired the ingenuity of Thomas Edison and his inventions, but he was also not afraid to try, and fail, repeatedly! And every now and then his insight was right.

Secondly, if an old, established structure is broken and a new order is suggested, it is very likely that the novel idea is met with scepticism, rejection, or even ridicule in the community that still lives under the old structure. A scientist suggesting a paradigm shift, or an artist creating a completely novel form of expression, risks that the difference in structural bias is too large for the new ideas to be accepted by the community. The burden of proving the usefulness of the new idea increases with the distance from the established biases.

 

The neurology of insight

Numerous neurological studies have been conducted using brain scans (e.g. fMRI) during creativity tests, and researchers have gained some knowledge about which parts or networks of the brain are active during various stages of creativity. Some studies have shown that divergent thinking and creativity are positively correlated with increased neural activity in the so-called default mode network (DMN) in the brain. This network is typically active when we do not attend strongly to the outer world, but instead focus on internal goals, memories or planning. It has also been shown to be active when we are preoccupied with cognitively non-demanding tasks involving low amounts of sensory input, like routine work, walking, showering or other moments of serendipity. Neuroscientific studies have also shown that other parts of the brain, making up the so-called executive control network (ECN), are more active when we attend to cognitively or emotionally challenging tasks, like problem solving and decision making. In parallel to David Bohm’s view that the creative process is an iterative process switching back and forth between the stages of insight and fancy, a creative person must possess enough flexible cognitive control to switch effectively between the DMN and the ECN (Zabelina and Robinson, 2010). Recent neuroimaging studies support this, showing that flexible and dynamic interactions between the DMN and ECN are key to creativity (Beaty et al. 2018). Apparently, a third cognitive network, the so-called salience network (SN), is also important here in controlling the interplay between the DMN and ECN.

Hence, the quest for finding the connection between creativity and brain activity has lately switched from a point of view of brain regions to considerations of network connectivity and flexibility. In the previous blog post, the stochastics of divergent thinking, we connected such flexible cognitive control to the ability to switch easily between short and long transitions in the random walk of associative thinking, effectively switching between focused and diffuse states of mind. Apparently this is not only a matter of switching step lengths, but also switching between networks.

So the interplay between the three networks, DMN, ECN and SN, seems to correlate with a (semi-)conscious form of the creative process, in line with the rational insight and rational fancy stages of David Bohm. When it comes to imaginative insight, the Eureka-moment type of creativity, a fourth brain network has gained increasing attention in the last couple of decades. This is the so-called cerebro-cerebellar pathway connecting the cerebral cortices with the cerebellum.

The importance of the cerebellum in fine-tuning and optimizing body movements (motor control) has been known for a long time. It is theorized that the cerebellum creates fluent and advanced motor control by using predictive models (the theory of predictive coding), combining automatized movements (basic motor constituents) and performing continuous adjustments based on prediction errors. However, the importance of this brain region in mental processes has been recognized much more recently. The role of the cerebellum in creative processes has in recent years been well elaborated and explored by Larry Vandervert, who relates the manipulations of this brain region to creativity by blending of thoughts: «In sum, the cerebellum appears to play a predominant role in the refinement and blending of virtually all repeated movements, thoughts, and emotions».

Although the cerebellum contains more than 75% of all neurons in the brain, its cognitive processes are hidden in the unconscious. It is likely that the tremendous unconscious source of creativity discussed in the stochastics of divergent thinking is due to the skillful manipulations in the cerebellum. Vandervert states that the cerebellum is unconsciously generalizing thoughts (as well as movements) using so-called inverse dynamic models: “The inverse dynamics model helps explain how generalizations can be formed outside a person’s conscious awareness. This is a major reason that intuition may seem to leap out of ‘‘nowhere.” The imaginative insights of Bohm may thus be the outcome of these cerebellar processes, and a possible explanation, paralleling the conscious and rational insight discussed above, is an interplay where the cerebellum is supporting, and partly replacing, the ECN in the creative interplay with SN and DMN. In this way conscious and serial manipulation of thoughts may hypothetically be replaced by unconscious, parallel manipulations controlled by the cerebellum. During the flexible switching of the creative process, the ECN feeds the cerebellum with intentions, interests (biases) and problems to be solved. Further, the DMN provides a rich variety of reshuffled thoughts, memories and ideas, which the cerebellum recombines to fit intrinsic goal-oriented models. This is occasionally fed back to the ECN to create imaginative and conscious insights, the eureka moments. This is somewhat in line with the ideas of Vandervert: “These refinements in skills and thought occur through the cerebellum, because cerebellar internal models unconsciously drive automaticity and error-correction toward optimization in skills and thought, which is then sent to and consciously experienced in the cerebral cortex.”

Just like for motor skills, like riding a bike or making the perfect golf swing, the cerebellum has automated primarily consciously controlled functions into unconscious routines. A question is how and when the imaginative insights are delivered to the cerebral cortex as conscious experiences of new ideas and orders. Obviously the news-content must be sufficiently important and relevant to a given context and problem to alert the conscious mind. We may imagine that sudden insight occurs when the unconsciousness finds a good posterior fit of (potentially reshuffled) new experiences into bias structures that we find particularly interesting. This may also partly explain why a-ha moments tend to occur during so-called incubation periods of serendipitous activity shortly after focusing hard on a given problem for some time. The focus period may simply increase the probabilistic value of the prior structures we find particularly interesting or promising for a given problem. This comes in addition to the fact that subsequent defocusing itself may help the signal reach the surface of consciousness easier, simply because the general attention level is lowered and competing focus points are few and weak (weak priors).

 

Invariance and fractals 

We have discussed how creativity can be understood as the discovery of new order or structure by transforming and reshaping biases, typically by bending, blending or breaking the old mental frames. Since the semi-conscious or unconscious processes seem to be such a rich source of creativity, there appear to be favorable conditions for creative processes when the mind is unfocused. We have touched upon several potential factors in this blog series that can partly explain this powerful property of the unconscious, like parallel processing, reshuffling of association chains, increased processing speed, and weakened biases. All these factors may together generate a favorable cognitive state for creativity: a state where the brain can exploit the fractal properties of biases due to invariance. This sounds mystical, but let me explain.

A fractal can be seen as a structure or pattern which repeats itself or is self-similar at different scales. This part of mathematics is perhaps most famous from the Mandelbrot set as exemplified in Figure 2 where we can see swirls repeat at different scales as we dive into the picture. 

Figure 2: The Mandelbrot set, with self-similar patterns repeating at different scales.
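For readers who want to see where such self-similar structure comes from, here is a minimal escape-time sketch of the set behind Figure 2 (the resolution, iteration limit and characters are arbitrary choices of mine):

```python
# Minimal escape-time sketch of the Mandelbrot set, rendered as text.
# A point c belongs to the set if z -> z*z + c stays bounded when iterated from z = 0.

def escapes(c, max_iter=50):
    z = 0
    for _ in range(max_iter):
        z = z * z + c
        if abs(z) > 2:          # once |z| exceeds 2 the iteration diverges
            return True
    return False

for y in range(10, -11, -2):                      # imaginary axis, 1.0 down to -1.0
    row = "".join(" " if escapes(complex(x / 30, y / 10)) else "#"
                  for x in range(-60, 21))        # real axis, -2.0 to about 0.67
    print(row)
```

Zooming in on the boundary of the printed shape (by shrinking the ranges and scaling factors) reveals the same kind of swirls at ever finer scales, which is the self-similarity the figure illustrates.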

Several researchers have discussed fractal properties also in the human brain, for instance, in the way that the branching of the neural networks repeats at different scales. However, also in its functioning the brain seems to be fractal, and this may explain how insight occurs, both consciously and unconsciously. The human brain is very good at finding similarities and familiar patterns everywhere. For instance, when we look up at the sky we may suddenly imagine the shape of a rabbit in a big cloud. Perhaps we even have to imagine the rabbit upside down or with abnormally large ears or legs, but still we see it. The human brain is an extremely effective pattern recognition machine, and it can easily bend, blend or “break-and-rebuild” structures by relocating, rescaling and/or rotating old, familiar patterns. Hence, these patterns may be regarded as fractals which can be applied at different scales and in different locations and rotations. During this imaginative process scales seem to be unimportant. In statistical terms we may use the term invariance to describe such a condition where location, scale or rotation does not matter. It may be hypothesized that when we are in a semi-conscious or unconscious state of mind, the brain enters an invariant and fractal mode where new order is more easily generated through bending, blending or breaking. Even time may become invariant in the realm of the unconscious mind; just remember how the speed of association processes is increased dramatically (shrinking time) and elements of thought chains are reshuffled (breaking chronology).

Liu et al. (2019), who found that the brain appears to be unconsciously reshuffling the order of sequences in an attempt to fit new experiences into existing orders and structures, state that this “Generalization of learned structures to new experiences may be facilitated by representing structural information in a format that is independent from its sensory consequences,…” Furthermore, they write that “Keeping structural knowledge independent of particular sensory objects is a form of factorization. Factorization (also called ‘‘disentangling’’; Bengio et al., 2013; Higgins et al., 2017a) means that different coding units (e.g., neurons) represent different factors of variation in the world». Factorization, disentangling and invariance are here all terms describing an objective and somewhat unstructured state that may be envisioned in the unconscious mind, wherein the cerebellum is free to make goal-directed fractal compositions.

This state of mind, where order and structures are broken and transformed, resembles the phase transitions of matter in physics. For instance, when ice melts and forms liquid water, the tight order and structure of ice is broken, and a new structure is formed where water molecules move more freely. Recently a group of mathematicians have come close to proving that so-called conformal invariance is a necessary property of phase transitions, the critical state between two phases of matter. Conformal invariance is a more extensive invariance which comprises the three other invariances mentioned above: location (translational) invariance, scale invariance and rotation invariance. In our context it may be hypothesized that a state of conformal invariance is also a characteristic of the unconscious mind, and that the cognitive flexibility of insight and fancy, as described by David Bohm, can be compared to the phase transitions of matter. The creative process fluctuates between insight and fancy like physical matter moves in and out of a critical state. From the highly invariant unconscious mind, new structure may crystallize as novel and useful ideas popping into consciousness.

From human creativity I will switch to artificial creativity in the next blog post, and I will return to the notion expressed by McCarthy and colleagues in 1955 about the potential of implementing algorithms that at least can simulate human creativity. The last three posts on the stochastics of human creativity will serve as a reference frame for my discussion on artificial creativity. Stay tuned!

The stochastics of divergent thinking
Wed, 09 Jun 2021

(This is the third blog post in a series on the stochastics of human and artificial creativity. You can read the first two posts here and here)

Creativity is a random process that resides along the edge between chaos and control. This was the conclusion of the previous blog post, where we explored the ideas of David Bohm. In this blog post we will look closer into the chaotic, or stochastic, part of creativity.  Divergent thinking or thinking outside the box is often used to describe this somewhat unpredictable exploration of the unknown in the pursuit of potential new, creative ideas. If we seek to understand this random component of human creativity, a closer look into the statistics field of stochastic processes may be beneficial. This is also necessary if we want to follow up on the conjecture of McCarthy and colleagues from 1956 (see first blog post) that human creativity may, at least, be imitated by computers. 

Maybe you are familiar with the possibility of using statistical software to draw numbers as a random sample from some probability distribution, for instance from a normal distribution. Numerous methods have been developed for such random sampling, and one group of methods is known as Markov Chain Monte Carlo (MCMC) methods. These are commonly used in Bayesian modeling and statistical inference. They are computer-intensive methods where a so-called random walk process is initiated in order to simulate a dependent chain of observations from a given probability distribution, often called a target distribution. In a typical setup of MCMC, each successive step of the Markov chain depends on the previous one through a current value, which is either a starting point or the latest sampled value. Given the current value, the method contains two parts:

  1. A new candidate value, typically depending on the current, is drawn from a proposal distribution (a random step).
  2. The candidate is accepted or rejected as a new value of the chain in light of the target distribution (an evaluation step).

Stochastic processes

You may have seen the relevance to creative thinking already. This Markov chain resembles an associative chain of thoughts. The proposal distribution is largely responsible for the randomness of the creative process, whereas the target distribution (or, we could say, the bias) evaluates the usefulness. This is a statistical framework for setting up association processes which satisfy both requirements of a creative process, as discussed in the previous blog post in this series. What is needed is a sensible way to generate candidate ideas and a way to evaluate the usefulness of such ideas.
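As a minimal sketch of these two steps, here is a tiny Metropolis-Hastings sampler. The choices are mine for illustration: a standard normal target and a Gaussian random-walk proposal; since the proposal is symmetric, the acceptance rule reduces to a simple ratio of target densities.

```python
import math
import random

def target_density(x):
    return math.exp(-0.5 * x * x)      # unnormalised standard normal ("the bias")

def metropolis_hastings(n_steps=10_000, step_size=1.0, start=0.0):
    current = start
    chain = [current]
    for _ in range(n_steps):
        candidate = current + random.gauss(0.0, step_size)          # 1. random step
        accept_prob = min(1.0, target_density(candidate) / target_density(current))
        if random.random() < accept_prob:                           # 2. evaluation step
            current = candidate
        chain.append(current)
    return chain

chain = metropolis_hastings()
print(sum(chain) / len(chain))         # close to 0, the mean of the target
```

The proposal generates candidate "thoughts"; the target decides which of them are kept. That division of labour is the one I will lean on when returning to biases.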

The cognitive analogy of MCMC is perhaps that of thinking inside the box. It is like a situation where we sit and reflect, making a chain of associations where all thoughts are evaluated within the same measure of usefulness, which is our current personal cognitive bias. Usually we speak of creativity as thinking outside the box, though, but as we will see later, thinking outside the box is all about challenging our old boxes, that is, bending, blending or breaking the biases we use to judge our thoughts. I will return to the importance of biases in the next blog post. Now I will use the MCMC example and the statistical framework of Markov models to dig deeper into the stochastic properties of creativity.

Introverted thinking is a fully associative process. One thought leads to another. If we want to recall memories, we cannot look them up like we use the index pages of a dictionary. We need to get hold of them through associations. In neuroscience a common understanding is that thoughts and memories are stored in long-term memory as paths of strongly connected neurons, sometimes called «engrams», organized in a huge association network.

The psychologist Donald Hebb described learning at the neuronal level (in 1949) as: “neurons that fire together, wire together”. Hebbian theory describes how synapses, the neuronal connections, are strengthened in a learning process if two connected neurons are activated simultaneously or successively while experiencing or learning something. However, the reverse phrase, “neurons that wire together, fire together”, is perhaps more appropriate to describe the replay processes we perform whenever we are recalling memories and learned facts. Random signal processes will tend to follow well-established routes, either as a conscious chain of thoughts or in the realm of our unconsciousness. In fact, chains of connected memories are replayed even when we sleep, as found in rodents by Wilson and McNaughton (1994).
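As a toy illustration of the Hebbian rule (my own sketch; the activity values and learning rate are invented, and real synaptic plasticity is far richer):

```python
# Minimal sketch of Hebbian strengthening between two "neurons".
# Each co-activation increases the connection weight: fire together, wire together.

def hebbian_update(weight, pre_activity, post_activity, rate=0.1):
    return weight + rate * pre_activity * post_activity

weight = 0.05
for pre, post in [(1, 1), (1, 1), (0, 1), (1, 0), (1, 1)]:   # hypothetical activity pairs
    weight = hebbian_update(weight, pre, post)
print(f"final connection weight: {weight:.2f}")   # only the three co-activations strengthened it
```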

Let’s pause a bit and check out some of your wirings. What is your first association with the word ‘yellow’? Maybe you were thinking ‘sun’, ‘banana’ or perhaps ‘submarine’, whereas ‘car’ would probably be less likely (unless you happen to own a yellow car). In statistical language, the probabilities of moving to other thoughts from the current thought are called transition probabilities. My personal transition probability from ‘yellow’ to ‘submarine’ is relatively high since I remember well the Beatles’ hit ‘Yellow Submarine’. After thinking ‘submarine’ my next immediate associations were ‘Beatles – Lennon – Shot – Dead’. Your association chain would probably follow another random-walk route, and so would perhaps mine in other circumstances.

Let’s first assume that at this very moment your thoughts are sampled from a fixed repertoire of potential thoughts and memories. We may call this thought repertoire of yours the state space of thought. The probability of the various cognitive outcomes from this state space depends on your entire history of experiences, and together these probabilities define the distribution of transition probabilities over your personal state space of thought. For simplicity we first assume that the transition probabilities do not change over time. Some transitions are highly probable, others are quite unlikely. From this distribution you make a random choice of what to think next. As seen in our little statistical example above, Markov processes (named after the Russian mathematician Andrey Markov) can to some extent approximate cognitive processes. Markov processes have many interesting properties which are also cognitively relevant for creativity. I will highlight some of them here.
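
As a toy illustration of such a state space (the words and probabilities below are invented for the example, not measured from anyone's mind), an association chain can be written down and sampled from directly:

```python
# A toy association chain: each thought is drawn from a distribution
# conditioned only on the current thought (the Markov property).
import random

transition_probs = {            # invented transition probabilities
    'yellow':    {'sun': 0.4, 'banana': 0.3, 'submarine': 0.25, 'car': 0.05},
    'sun':       {'summer': 0.6, 'yellow': 0.4},
    'banana':    {'monkey': 0.7, 'yellow': 0.3},
    'submarine': {'Beatles': 0.8, 'ocean': 0.2},
    'Beatles':   {'Lennon': 0.9, 'yellow': 0.1},
    'Lennon':    {'shot': 0.5, 'imagine': 0.5},
    # remaining words loop back to 'yellow' just to keep the toy chain going
}

def next_thought(current):
    options = transition_probs.get(current, {'yellow': 1.0})
    words, probs = zip(*options.items())
    return random.choices(words, weights=probs)[0]

chain = ['yellow']
for _ in range(6):
    chain.append(next_thought(chain[-1]))
print(' -> '.join(chain))
```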

The primer effect

The first property is the ‘primer effect’. A random walk process must start somewhere, hence it requires a starting value, or a so-called primer. From the primer the process ‘walks’ from one value to another. Accordingly, for a cognitive process the current thought is very often the primer for the next thought. 

The effect of priming is well known and has been documented in many psychological studies. Priming describes how thoughts and decisions may be biased towards input cues. The cues may be more or less random, or given deliberately to manipulate the cognitive response of the receiver. Priming is, for instance, a widely used technique in consumer marketing, where subtle messages are given to influence our opinions about products and increase the likelihood of us buying them. Another example is social media marketing, which serves highly personal primers based on the information we provide online.

The priming effect is also important in the context of creativity. In the absence of external stimuli, chains of thoughts are likely to be rather introverted and mainly traverse highly familiar ground in our state space of thought. In a sense we are then not primed to be novel. Larger jumps may be primed if we are open to our surroundings or even seek out new environments and perspectives, for instance by traveling to explore new cultures, taking another route to work, or reading a random book from the library. This may trigger chains of thoughts that explore paths less travelled in your state space of thought. In their book “The Eureka Factor: Aha Moments, Creative Insight, and the Brain”, Kounios and Beeman discuss multiple factors that can prime insightfulness.

Focus level

Another important parameter in random walk processes is the step length of the walk. This is typically controlled by the proposal distribution. We can think of this as the focus parameter of the process. If step lengths are short, the process moves very slowly across the state space, only entering closely connected states within long time spans. Furthermore, the series of visited states will show a high level of so-called auto-correlation, which in the cognitive setting means that thoughts tend to be similar and related over time. One might characterize a person with highly auto-correlated thinking as narrow-minded. Short steps plus intentional priming are how recommendation services like Netflix or social media feeds manage to keep you continuously satisfied. Unfortunately, this may also have the effect of limiting the range of impulses and perspectives. However, being narrow-minded is also necessary, for instance when we have to focus strongly on solving some difficult task or concentrate on learning a new skill.

Neurologically, strong focus is induced in the brain by activation of inhibitory neurons through increased release of the neurotransmitter GABA (gamma-aminobutyric acid), which reduces the transition probabilities for long-step transitions to seemingly irrelevant or distracting thoughts. GABA has the effect of shutting out external and distracting signals. This is the case during stages of fancy, as discussed by David Bohm, especially during what he referred to as rational fancy (see the previous blog post in this series). If step lengths are allowed to increase (by a reduction of GABA), focus is lowered and a more diffuse state of mind is induced, as in Bohm's stages of insight.

The problem with the slowly moving cognitive chain of the focused mind is the high probability of missing out on creative candidate solutions to problems. It simply takes too long to traverse the relevant areas of the state space of thought! On the other hand, too long steps increase the risk that very remote ideas pop up, only to be rejected as irrelevant in the current context. In the case of long step lengths, auto-correlation may be very low, and thoughts appear to be disconnected. In a previous blog post, Be creative – use your noisy brain, I have written about how spontaneous firing in the resting brain can be a source of creativity by nurturing long and disconnected jumps in the state space of thought. This noisy feature of the brain is a powerful source of divergent thinking.
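
Both extremes can be seen in a small simulation. The target distribution and the step sizes below are arbitrary illustrative choices: very short and very long proposal steps both leave the chain highly auto-correlated, either because it creeps along or because most remote candidates are rejected, while a moderate step length mixes best.

```python
# Metropolis chains over a standard normal 'state space', run with short,
# moderate and long proposal steps; lag-one auto-correlation measures how
# similar successive states (thoughts) are.
import numpy as np

rng = np.random.default_rng(1)

def metropolis_chain(step_size, n=20_000):
    x, chain = 0.0, np.empty(n)
    for i in range(n):
        candidate = x + rng.normal(0.0, step_size)
        if rng.random() < min(1.0, np.exp(0.5 * (x**2 - candidate**2))):
            x = candidate
        chain[i] = x
    return chain

for step_size in (0.1, 2.5, 50.0):
    chain = metropolis_chain(step_size)
    autocorr = np.corrcoef(chain[:-1], chain[1:])[0, 1]
    print(f"step size {step_size:>5}: lag-1 autocorrelation = {autocorr:.3f}")
```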

Cognitive flexibility

In Bayesian inference, so-called well-mixing Markov processes are desirable, wherein the chain moves across the state space with optimal step lengths, avoiding being either too narrow-minded or too diffuse. Such well-mixing processes have the largest probability of covering a relevant state space in sufficient proportions within a limited time span. A special kind of Markov chain is the time-inhomogeneous chain, where step lengths (focus) are allowed to vary over time to optimize chain mixing.
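
One way to picture a time-inhomogeneous chain is a sampler whose step length follows a schedule over time, alternating between defocused exploration and focused refinement. The schedule and all numbers below are only illustrative:

```python
# A time-inhomogeneous random-walk proposal: the step length (focus level)
# is itself a function of time, alternating between wide, exploratory moves
# and short, focused ones.
import numpy as np

rng = np.random.default_rng(7)

def step_size(t, period=200):
    """Defocused (large steps) for one period, focused (small steps) for the next."""
    return 3.0 if (t // period) % 2 == 0 else 0.2

x, trace = 0.0, []
for t in range(2000):
    candidate = x + rng.normal(0.0, step_size(t))
    if rng.random() < min(1.0, np.exp(0.5 * (x**2 - candidate**2))):
        x = candidate
    trace.append(x)

print("mean absolute move, exploratory phase:", np.mean(np.abs(np.diff(trace[:200]))).round(2))
print("mean absolute move, focused phase:    ", np.mean(np.abs(np.diff(trace[200:400]))).round(2))
```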

For cognitive processes the definition of good mixing will, of course, depend on the context, whether focusing or defocusing is most beneficial. Later we will see that the ability to be flexible and switch easily between short and long transitions may be beneficial for the creative process. When David Bohm in his book On creativity describes the creative process as an iterative process between stages of insight and stages of fancy, he really describes a time-inhomogeneous stochastic process of thinking. This flexibility is also a key component in a popular work mode in design and innovation processes known as design thinking.

Plasticity and learning

Thus far we have for simplicity assumed that there is a fixed probability distribution across the state space of thought for an individual, and that the state space itself is static. This is characteristic of a stationary distribution in stochastic process theory. There is, however, no reason to believe that the cognitive state space is static, nor that the thought distribution is stationary. This is because we are all expanding and altering the state space through learning, and the brain is continuously changing, both functionally and structurally. The associative chain of thoughts actually changes the probability distribution over the state space as it moves. This is because repetitively running chains of association is a key part of learning, and the recall process changes the synaptic strength between the neurons. The very fact that the process visits certain thoughts or memories increases the probability of a later revisit. Furthermore, new and previously unvisited thoughts occur during the random walk as a result of creative thinking or learning from external input. On the other hand, parts of the state space may also be almost or entirely eliminated (forgotten) through loss of synaptic connections, for instance by rarely being revisited or by brain lesions. The brain is very plastic and changeable, and hence, so is the state space of thought. Therefore, the random walk of learning and creative thinking may be considered a non-stationary stochastic process. If you think about it, this should be obvious. During our lifetime our interests, values and the contexts we are part of change, and this is certainly reflected in our thought processes. It is reasonable to assume that divergent thinking, seeking new perspectives and expanding one's field of knowledge have a self-enhancing effect on creativity due to the inhomogeneity and non-stationarity of the associative thinking process.
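
To make the non-stationarity concrete, here is a toy chain with invented states in which every visit strengthens the weight of the visited thought, so the sampling distribution itself drifts as the walk proceeds, a crude stand-in for Hebbian reinforcement (for brevity the draw depends only on these weights, not on the current state):

```python
# A toy non-stationary association process: each visit to a state reinforces
# the weight of that state, so frequently visited thoughts become even more
# likely to be revisited (a crude stand-in for Hebbian strengthening).
import random
from collections import Counter

states = ['music', 'work', 'family', 'statistics', 'travel']
weights = {s: 1.0 for s in states}          # initial, uniform 'synaptic' weights

def next_state():
    return random.choices(states, weights=[weights[s] for s in states])[0]

visits = Counter()
for _ in range(5000):
    s = next_state()
    visits[s] += 1
    weights[s] += 0.05                      # learning: the visited path is reinforced

print(visits.most_common())                 # a few states come to dominate the walk
```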

Dialogue

Multiple Markov chains running in parallel may cover a state space faster, and integrating the information from many chains may be beneficial for many purposes in a statistical setting. Parallel cognitive processes are also integrated in the context of a conversation, for instance, at the lunch table, by the coffee machine, or during group based learning. Different people have different experiences and knowledge levels, fields of interests, values and personalities. Each person even has his or her individual cognitive state space and transition distribution. Through dialogue, parallel random walks of thought may evolve jointly towards a better understanding of some subject to be learned. 
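
The benefit of several walkers can be sketched with an arbitrary toy state space: a handful of short, slowly mixing chains started from different points will typically visit more distinct states than one long chain using the same total number of steps.

```python
# Several short, slowly mixing random walks started from different points,
# compared with one long walk, by counting how many of the 200 states each
# strategy manages to visit (steps of +/-1 make the mixing deliberately slow).
import random

N_STATES = 200

def walk(start, n_steps):
    x, visited = start, {start}
    for _ in range(n_steps):
        x = min(max(x + random.choice([-1, 1]), 0), N_STATES - 1)
        visited.add(x)
    return visited

random.seed(3)
single = walk(start=100, n_steps=4000)                                        # one long chain
parallel = set().union(*(walk(s, n_steps=1000) for s in (20, 70, 130, 180)))  # four short chains

print("states visited by one long chain   :", len(single))
print("states visited by four short chains:", len(parallel))
```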

A dialogue – Credits: Håkon Sparre/NMBU

However, in a dialogue the thought processes themselves are not actually interacting, because the participants do not observe one another's thought processes. A special kind of Markov process, called the Hidden Markov Model (HMM), can give insight into the dialogue in that respect. In statistical inference HMMs are used to model processes where it is reasonable to assume that there is an underlying, hidden process that occasionally gives rise to observable output. Our cognitively relevant example of an HMM is your indirect observation of my thought process. My conscious thoughts are available to me, but only occasionally and approximately observable to you, namely whenever I express my thoughts out loud.

In the brain, the process of transferring conscious thoughts from the prefrontal cortex to the cortical areas responsible for speech (Broca's area) is itself a stochastic process that adds noise to the output. This manifests itself in the fact that sometimes it is difficult to express exactly what you are thinking. HMMs are similarly defined not only by transition probabilities for the hidden state space, but also by state-dependent probabilities for generating observable and noisy output. Hence, some thoughts are more likely to be expressed than others, and with higher or lower accuracy. Furthermore, it is reasonable to assume that the output probabilities are inhomogeneous, meaning that various contextual factors and conditions influence the probability that thoughts are articulated. The output probabilities may, for instance, depend on the context of the conversation or the personalities of the participants in a dialogue. For instance, if I am on unfamiliar ground, either literally or cognitively, I am less likely to express my thoughts. Furthermore, I am an introverted person who is typically less expressive than my extraverted peers in dialogues. The cognitive processes of an introvert are generally more hidden, having smaller probabilities of generating output than is the case for an extravert. To others, my expressed thoughts as an introvert may therefore seem less connected (having lower auto-correlation) compared to those of an extraverted and more talkative person.
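
A tiny hidden Markov model along these lines might look as follows. The states, transition probabilities and output probabilities are all invented for illustration: the hidden chain of thoughts moves at every step, but each state only sometimes emits an observable utterance.

```python
# A toy hidden Markov model of a dialogue partner: the hidden chain of
# thoughts always moves, but each hidden state only sometimes produces an
# observable utterance (state-dependent output probabilities).
import random

random.seed(11)

transitions = {                     # invented transition probabilities
    'familiar topic':   {'familiar topic': 0.7, 'unfamiliar topic': 0.3},
    'unfamiliar topic': {'familiar topic': 0.4, 'unfamiliar topic': 0.6},
}
speak_prob = {'familiar topic': 0.6, 'unfamiliar topic': 0.1}   # introvert-ish outputs

state, spoken = 'familiar topic', []
for _ in range(30):
    probs = transitions[state]
    state = random.choices(list(probs), weights=list(probs.values()))[0]
    if random.random() < speak_prob[state]:     # noisy, occasional output
        spoken.append(state)

print(f"{len(spoken)} utterances observed from 30 hidden thought steps")
```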

Creative team processes, involving dialogue and conversations, may be explored further in the context of parallel HMMs, but a main point in this context is that through conversation, associations are exchanged, which may lead to jumps in the thought processes of the participants. These jumps can result in better coverage of the state space and faster idea generation and learning for each individual group member. The integration of information from multiple parallel processes becomes most effective through open and empathetic dialogue where the processes have equal opportunity to influence one another. Once again David Bohm is relevant. The open, non-judgemental dialogue as an arena for creativity is well explored in his book On dialogue.

The powerful dialogue of the unconscious brain

Parallel associative processes also occur in another cognitive domain, namely the unconscious mind. The brain is never at rest. Even when we are in dreamless sleep, the neurons fire and pass signals to each other. The aforementioned replay of thoughts along probable neuronal paths runs in our unconscious, visiting thoughts, memories and ideas, perhaps related to some unresolved issues or problems you have been focusing on lately. This resembles divergent conscious thinking, but recent studies indicate that the unconscious is much more powerful than the conscious mind as a provider of novel thoughts and ideas. There are several properties that make the unconscious so effective for divergent thinking, for instance:

  • Parallel processes  As discussed above under dialogue, parallel neuronal processes may cover a state space of thought faster than a single (conscious) process. Since the unconscious mind is not restricted to attending to a single thought process, as the conscious mind is, it is reasonable to believe that there are multiple parallel cognitive processes running beneath the surface of awareness. The psychologist and Nobel laureate Daniel Kahneman gives support to this in his book Thinking, Fast and Slow, where he describes the fast-thinking, automated and unconscious ‘system 1’ in the brain as a parallel processing system. The slow-thinking and conscious ‘system 2’, he states, is a serial system. This multivariate thinking process of the unconscious mind may be highly beneficial if it is coupled with some information integration system. We will return to the latter aspect later.
  • Reduced bias  As I will return to in the next blog post, our biases may have a direct influence on the random walk process of divergent, conscious thinking, but it is reasonable to believe that biases are less influential on unconscious thoughts. I guess most of us have experienced divergent thinking during dream sleep that goes far beyond what we would perhaps allow ourselves when we are awake. The famous psychiatrist and psychologist Milton Erickson, who specialized in medical hypnosis, said that biases are the province of the conscious mind, and that the unconscious level of awareness is characterized by the absence of the influence of biases. Further, he declared that “the unconscious typically involves a more objective and less distorted awareness of reality than the conscious”.
  • Spontaneous firing  As mentioned above under the focus level property of stochastic processes, spontaneous firing may cause longer jumps between conscious thoughts, especially when we lower our focus level. For the unconscious brain, it is reasonable to believe that spontaneous firing may be an even richer source of divergent thinking, since the unconscious is likely to be less affected by the focus level and the biases of the conscious mind.
  • Increased speed  Experiments with rodents have shown that the replay of memories during sleep is speeded up several-fold. It appears that unconscious thought processes may be accelerated in time during so-called sharp-wave ripples in the hippocampus (Liu et al., 2019). These are high-frequency bursts of waves travelling through the neural network of the hippocampus and cortical regions. Ivry (1997) discusses the role of the cerebellum as a timing device which may control the speed of the replay. This small but complex brain region is known to play an important role in controlling and coordinating bodily motor skills, but recent research points to the possibility that the cerebellum also plays an important role in creative processes. I will return to this in the next blog post.
  • Reshuffling and randomization  A recent study by Stella et al. (2019) on rats indicates that thoughts and memories are not only replayed during sleep, but that sequences of memories may be randomized during replay. Liu et al. (2019) also found that humans unconsciously replay, reshuffle and reorganize experiences. This randomized replay may be an important source of unconscious divergent thinking in which new combinations of thoughts and ideas are tested (a small sketch of such reshuffled replay follows right below).
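
A crude sketch of such reshuffled replay could look like this (the 'experience' sequence is of course invented): fragments of a learned sequence are replayed in a new order, producing combinations that never occurred during the original experience.

```python
# A crude sketch of randomized replay: a learned sequence is cut into
# fragments that are replayed in a shuffled order, producing combinations
# of elements that never occurred together in the original experience.
import random

random.seed(5)

experience = ['wake', 'coffee', 'bike', 'lecture', 'lunch', 'meeting', 'gym', 'dinner']

def reshuffled_replay(sequence, fragment_length=2):
    fragments = [sequence[i:i + fragment_length]
                 for i in range(0, len(sequence), fragment_length)]
    random.shuffle(fragments)                 # reorder the fragments
    return [item for frag in fragments for item in frag]

for night in range(3):
    print(f"replay {night + 1}:", reshuffled_replay(experience))
```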

So we might say that the brain is running an open and objective dialogue, or even a multilogue, with itself below the surface of consciousness. This is probably also as close as we can get to an internal Bohm Dialogue. The philosopher and scientist David Bohm was a strong advocate of the open dialogue as a playground for creativity. In this way your unconscious brain is a highly advanced and multivariate statistical processor that may provide a wide variety of candidate thoughts and ideas through recombination and reshuffling of learned sequences, and at high speed. The only drawback is that most of this dialogue happens beyond our reach, in the unconscious. Nevertheless, every now and then this powerful statistical machinery serves some insight, a eureka moment, to the surface of attention. This is the imaginative insight described by Bohm. However, in order to do so, the unconscious cannot merely bring up candidate ideas; it must also evaluate, at least to some extent, the usefulness of these candidates before any insight unfolds before the mind's eye. It is here that our biases, internal models and structures come into play.

The stochastic process properties discussed here are highly relevant for implementing virtual association processes on the path towards AC. Inspiration from human divergent thinking, both conscious (serial) and unconscious (parallel), could lead to efficient and fast idea generation. With regard to the conjecture from the Dartmouth proposal stating that it should be possible to imitate human creativity in silico, this is perhaps the easy part. Computers are really good at providing random (or at least pseudo-random) numbers as a basis for divergent stochastic processes. The tricky part of creativity simulation will be covered in the next blog post, where we will look into biases: how they work, how they are created, maintained and often dissolved throughout lifelong learning, and, of course, their important role in creative processes.

Insight and fancy – The legacy of David Bohm https://blogg.nmbu.no/solvesabo/2021/05/insight-and-fancy-the-legacy-of-david-bohm/ https://blogg.nmbu.no/solvesabo/2021/05/insight-and-fancy-the-legacy-of-david-bohm/#respond Thu, 20 May 2021 18:45:16 +0000 http://blogg.nmbu.no/solvesabo/?p=216 (This is the second blog post in a series on human and artificial creativity. You can read the first post here.) The “founders” of AI, John McCarthy and colleagues, described […]

(This is the second blog post in a series on human and artificial creativity. You can read the first post here.)

The “founders” of AI, John McCarthy and colleagues, described creativity in terms of “…randomness…guided by intuition…” in their proposal for the Dartmouth summer seminar on Artificial Intelligence in 1956. Their conjecture was that the creative process had to possess some kind of controlled randomness. This description captures two important and opposing aspects of creativity which I will return to in these blog posts on my way to explore Artificial Creativity, or AC.

Many artists, writers, composers, scientists and others have experienced the randomness of creativity, how inspiration comes and goes. Although this random nature of creativity is hard to understand, many have discovered, perhaps somewhat accidentally, that special conditions or external factors seem to trigger creativity. Take for example Pablo Picasso, who found inspiration in his muse, Marie-Thérèse Walter, or Albert Einstein, who discovered that campus park walks made his thoughts wander, often bringing new ideas to follow up in the office. Hence, there appear to be ways to stimulate, nurture or even control creativity despite its random nature, but it is still a bit like magic: sometimes it is there, sometimes it's lost.

And even more magical is the so-called eureka moment, the mysterious flash of enlightenment, named after Archimedes' legendary moment of insight in the bathtub. This is the sudden insight, seemingly appearing out of nowhere, where an imaginative picture of the solution to a long-nagging problem is flashed before the mind's eye.

Study after study has been conducted in order to unravel the secrets of creativity, to describe the circumstances under which it is at play and to figure out whether people have different potentials when it comes to creativity. Various tests, like the Torrance Tests of Creative Thinking, have been developed to measure factors that are believed to correlate with creativity, such as the level of divergent thinking, better known as the ability to think outside the box.

Scientists in modern neuroscience and psychology have been using brain scans like EEG and fMRI to shed light on the processes behind creativity and the eureka moment, and on which brain regions and neural networks are involved, but we still lack a deep understanding of the hidden cognitive processes behind creativity in general, as well as of the mysterious eureka moment.

Perhaps statistical theory may be a way forward to unravel some secrets of creativity? After all, the creative thought with its random appearance is obviously the result of stochastic (random) cognitive processes in the brain, and stochastic processes are well studied in the field of statistics, as part of stochastics theory. Statistics is the tool for dealing with and understanding random phenomena! Conscious and unconscious associative processes appear to be at the core of creativity in general, and even of the eureka moment. Well-known properties of stochastic processes and statistical theory may be useful for understanding the processes of creativity, especially if combined with knowledge from neuroscience and a portion of philosophy. Furthermore, a statistical understanding of creativity can even give insight into whether AC might become the next leap of Artificial Intelligence (AI).

However, we cannot discuss AC without some kind of initial understanding of the phenomenon of human creativity. For instance, how is human creativity defined in the arts, in philosophy, in psychology or even in science? As it turns out, the general understanding of creativity is largely overlapping across all these areas. Much has been written about the nature of creativity, but from a philosophical point of view, the thinking of the physicist and philosopher David Bohm is perhaps still among the most insightful reflections made. His views are also starting to find support in modern neuroscience.

David Bohm

As a scientist Bohm was well acquainted with the scientific method, and to him artistic creativity and scientific progress were tightly connected. Just as science typically proceeds through the two stages of 1) hypothesis generation, and 2) deductive reasoning and experimental testing, Bohm argued that a creative process also contains two main stages, which he referred to as 1) insight and 2) fancy. Bohm divided each of these further into two subtypes, the imaginative and the rational insight or fancy.

 

«The works must be conceived with fire in the soul but executed with clinical coolness.» 

Joan Miró

 

This quote by the Spanish painter Miró expresses exactly these two parts of the creative process. My interpretation of Bohm is that the first stage, insight, describes how novel ideas or insights develop through an open process of divergent thinking, either consciously (rational insight) or unconsciously (imaginative insight), where the latter is what we may call a eureka moment. The second stage, fancy, describes a more deductive process, where insights are explored and hammered out into matured ideas, innovative products or theories, either by semi-conscious unfolding (imaginative fancy) or through fully conscious and focused reasoning, prediction and evaluation (rational fancy). Once Einstein, in a moment of imaginative insight, had seen the higher order that could join the pieces of theory, a new phase of creativity came into play, namely deducing the impact of the new insight on physics in general. The process of hammering out the theories and predictions following his new insight took focused attention and collaboration.

Bohm's stages of insight and fancy describe different stages in a creative process, but do not define creativity per se. What separates a creative thought from any other thought? In psychology a common understanding of creativity is simply this: creativity is novelty that works. That is, a creative act or idea must, of course, represent a certain level of novelty, but not everything new is necessarily creative! There must be something more. Somehow it has to work! In science it must withstand empirical tests and expand knowledge within a field of research, in art it must produce some engaging response in the spectator, and in music it gives the listener the chills, a good feeling of harmony, or perhaps dancing feet! From this we can say that creativity is exploration of the unknown in order to learn something new and useful. A «novelty that works» should contribute to expanding our horizons.

Human creativity may thus be understood as a two-stage process. First, an open, divergent mind-wandering into the unknown, which often is semi- or unconscious, and which generates hypotheses about new structure or order. The steps into the unknown may be careful or bold, short or long. There are no limiting rules at this stage. Your imagination sets the limits. However, the usefulness of the new idea or order is evaluated at the second stage, during which one may ask: What does it imply? Does it work? Does it predict the future? Is it rewarding? If so, the new idea may be accepted, just as a scientific hypothesis is retained as long as its deduced predictions hold up against empirical evidence. As articulated by Joan Miró, this stage may require clinical coolness, a clear and focused mind to refine and evaluate the new idea or hypothesis. Typically, short steps into novelty are more likely to be accepted than long ones, just as minor tinkering with an existing scientific theory is more likely to pass a review process than a call for a new paradigm. However, it is the latter which in retrospect stands out as truly ingenious and creative. Hence, there appear to be different levels or types of creativity. This is a topic I will come back to in a later blog post.

David Bohm also describes creativity as a process that brings order and structure into chaos, like Einstein, who introduced his new order in physics with his theories of relativity. Throughout life we learn how to handle the randomness and variability that exist in our environment, but the creative way of discovering order or structure is not the way we mostly experience learning in school, namely by supervision, where a teacher brings order into chaos by putting things into a structure for us. On the contrary, creativity is rather an example of unsupervised learning. In statistics, unsupervised learning means that a ‘truth’, often called the gold standard, is unknown. In the absence of a gold standard, chaos must be boldly explored and new order must be discovered by ourselves.

However, without any guidance with regard to the nature of the unknown order, the search is likely futile. Some assumptions must be made about the properties of the unknown, just as unsupervised learning in statistics relies on assumptions about the probability distribution of the data, the level of entropy (order and disorder) or the number of unknown categories to be found. John McCarthy pointed at intuition as one guide for creativity, although this leaves us with the open question of ‘what is intuition?’. David Bohm, for his part, states that scientists and artists alike seek a sense of wholeness, harmony or beauty in their work.

[…about creativity]…, I suggest that there is a perception of a new basic order that is potentially significant in a broad and rich field. The new order leads eventually to the creation of new structures having the qualities of harmony and totality, and therefore the feeling of beauty.

David Bohm, in On creativity

I think Bohm and McCarthy both point to similar control mechanisms that are involved in limiting the stochasticity of the divergent thought during the stages of the creative process, namely our personal or common notions of what feels right, what is normal, what is within ethical standards, what is relevant and interesting, or, say, what is according to common sense or commonly acceptable. Our control functions are our biases, inherited or learned, narrow or wide, conscious or unconscious. These are our learned abstractions and our internal models of how the world is and how it will evolve. It is the hierarchical structure of biases we use to deem some new idea useful or not. They represent our frames of reference, helping us to bring order into chaos, to survive…, and to be creative.

To sum up, we might say that creativity is a process that fluctuates between order and chaos, between searching for novelty and evaluating usefulness, between randomness and control. Bohm points out that the creative process is a continuous iterative process moving back and forth between stages of insight and fancy, and to tie it all together, each stage of insight and fancy involves different levels of randomness and control. Imaginative or rational insight is characterized by high stochasticity and low control, whereas the opposite is the case for imaginative or rational fancy. In the next blog posts of this series I will use the framework of statistics to investigate the controlled randomness of creativity, as John McCarthy put it back in 1955.

Will Apple Siri ever shout Eureka? https://blogg.nmbu.no/solvesabo/2021/05/will-apple-siri-ever-shout-eureka/ https://blogg.nmbu.no/solvesabo/2021/05/will-apple-siri-ever-shout-eureka/#respond Mon, 17 May 2021 20:08:34 +0000 http://blogg.nmbu.no/solvesabo/?p=211 Today’s smartphones surely can accomplish a lot, but is creativity within the reach of Artificial Intelligence (AI)? Or, are the acts of creativity, like making a painting that profoundly speaks […]

Edmond de Belamy

Today’s smartphones surely can accomplish a lot, but is creativity within the reach of Artificial Intelligence (AI)? Or, are the acts of creativity, like making a painting that profoundly speaks to an audience, only a feature of human cognition and intelligence? Take a look at this painting. You may be surprised to learn that it was created by AI. The painting is called Edmond de Belamy and was in 2018 the first painting created by AI to be featured in a Christie’s auction. Is this painting, along with similar AI generated images, like the ones from Google’s Deep Dream Generator or the Crypto AI art by Montreal.AI, tokens of creativity? And what about the ingenious moves made by Google Deep Mind’s chess machine AlphaZero? Should they also be considered as examples of artificial creativity? The debated question of the existence of artificial creativity often leads to a discussion about the properties of human creativity. Could human creativity be replicated in AI, or at least imitated? Or do we need another definition of creativity to describe seemingly innovative actions made by AI? These are topics I will cover in a short blog series this summer.

During the last ten years AI has come to everyone's attention. As a research field, however, AI is much older, and the term Artificial Intelligence was in fact coined as early as 1955 by the mathematician John McCarthy and co-workers as they planned a research project at Dartmouth College for the summer of 1956. In the proposal for funding of the summer school it is written:

The study is to proceed on the basis of the conjecture that every aspect of learning or any other feature of intelligence can in principle be so precisely described that a machine can be made to simulate it.

Among the aspects of learning and intelligence discussed in the Dartmouth proposal were the topics of language processing, artificial neural nets and abstraction. All of these are central aspects of AI research and development today. However, McCarthy & co also mentioned another important aspect of human intelligence, which during the 65 years since the summer seminar at Dartmouth has received less attention in the AI community, namely creativity. Creative problem solving is often regarded as an important characteristic of human intelligence. In the proposal it is also stated:

A fairly attractive and yet clearly incomplete conjecture is that the difference between creative thinking and unimaginative competent thinking lies in the injection of some randomness. The randomness must be guided by intuition to be efficient. In other words, the educated guess or the hunch include controlled randomness in otherwise orderly thinking.

Obviously McCarthy and his colleagues had a hunch themselves as to how to address Artificial Creativity (AC) and planned to solve these matters in no time, more or less having a computer program ready by the end of the two-month seminar in 1956. It turned out to be a bit more difficult. Today, almost seven decades later, we may ask how far we have come toward McCarthy's goal of creating AC. Are we anywhere close, or are we still far off?

Before we can explore the present status and the future potential of AC, we need a better understanding of human creativity. Only then can we approach an answer to the question of whether the human creative process can be replicated, or at least simulated, in computers at all. Alternatively, we may end up concluding that computers can indeed show creative behaviour, but that it is necessary to define ‘creativity’ differently for computers than for humans, similarly to how ‘intelligence’ is understood quite differently when it comes to human and artificial intelligence.

In a short series of blog posts appearing this summer I will dig into these matters using the framework of statistics as a tool for understanding the controlled randomness of creativity. This framework will also provide a natural basis for discussing the prospects of artificial creativity. In the next post, Insight and Fancy – the legacy of David Bohm, to appear soon, I will start exploring human creativity using the insights of the physicist and philosopher David Bohm. The series will continue with the post The stochastics of divergent thinking, followed by Bending, blending and breaking biases. In the final blog post, On artificial creativity, I will return to the question with which I started this introductory blog post: Will Apple Siri ever shout Eureka? There I will explore the prospects of AC in light of the statistical view of human creativity established in the previous blog posts.

I hope you will enjoy the series!

 

How to make AI more sustainable https://blogg.nmbu.no/solvesabo/2021/03/how-to-make-ai-sustainable/ https://blogg.nmbu.no/solvesabo/2021/03/how-to-make-ai-sustainable/#respond Wed, 10 Mar 2021 21:39:11 +0000 http://blogg.nmbu.no/solvesabo/?p=186 (Long read ~3750 words, best enjoyed with a cup of fairtrade coffee or tea. This text was originally written in Norwegian, translated to English by means of unsustainable Artificial Intelligence, […]

(Long read ~3750 words, best enjoyed with a cup of fairtrade coffee or tea. This text was originally written in Norwegian, translated to English by means of unsustainable Artificial Intelligence, and modified by sustainable Human Intelligence)

The progress we have seen in artificial intelligence (AI) over the last decade has been impressive, but it is not sustainable in an environmental, social or economic sense. It is still possible to adjust the path, because recent research towards more human-inspired AI indicates that AI can become both more sustainable and more efficient. However, more human-like AI also brings ethical challenges that society must address today.

Artificial intelligence (AI) has come to everyone's attention in recent years. Although the concept and the methodological development have been with us for almost 70 years, it was not until about ten years ago that AI made its biggest leap forward in terms of utility and usability. This was primarily a result of improved algorithms, growing computing power and large amounts of readily available data. Not least, the big tech companies like Google, Facebook and Amazon have been able to take full advantage of this. Since then, the development has mainly been about increasing computing capacity, increasing the amount of data used in the training of the models and expanding the use to new areas.

This has given us impressive examples of what AI can do in very specific situations. It can beat world champions in board games such as chess and Go, it can to some extent take control of our cars, and recently Google Deep Mind was able to present a model, AlphaFold, which with relatively high accuracy can predict the 3-dimensional structure of proteins based only on the amino acid sequence of the protein. The potential of the latter example may be difficult for most of us to grasp, but it can be a fantastic tool in biotechnological and medical research.

These are examples of so-called narrow AI. The trained models are apparently superior to humans, but only in very specific areas. If Google Deep Mind's “machine” AlphaZero, which is trained to play conventional chess, were challenged in a game on an alternative board with fewer or more squares, it would fail completely. To master the new game, AlphaZero would have to be retrained from scratch. It can be said that AI can be super-intelligent in specific tasks, but useless in all other situations. Nevertheless, there is no doubt that AI will change our everyday lives to a greater extent in the time ahead, precisely by streamlining automated and limited tasks.

AI and environmental and economic sustainability

Training the so-called deep learning algorithms that underlie AlphaZero and similar models is, however, very energy-intensive, as well as costly and environmentally damaging. Two years ago, researcher Emma Strubell and colleagues at the University of Massachusetts calculated the carbon footprint of one training session of such a model to be 284 tonnes of CO2, that is, five times the lifetime emission of a car. The fact that the trained models are static and therefore have to be retrained every time they are to be adjusted or updated, and that models are constantly being developed for new tasks, of course means that the total cost and environmental impact is multiplied.

Moreover, since Strubell & co made their calculations in 2019 (at that time on models with about 200 million parameters, requiring 32,000 hours of training), AI models have grown in complexity and depth, and the number of parameters is now over one billion for Google's latest language model! We are now talking about weeks and months to train a model, and the carbon footprint increases correspondingly. In addition, there is a significant environmental burden related to the storage of big data across the world. A calculation from 2016 showed that the world's data centers at that time consumed as much energy as the entire aviation industry.

In 2020, a group of researchers at MIT published an analysis based on 1000 scientific papers which showed that the progress (measured as prediction accuracy) of AI models in recent years is mainly explained by an increase in computing power. They conclude that a similar development in the coming years will be neither economically nor environmentally sustainable, nor technologically possible with current AI algorithms. Their conclusion is that new advances in AI will depend on either more efficient methods or a switch to other methods in statistics or machine learning. More efficient computers, like quantum machines or massively parallel machines with an architecture inspired by the human brain, may also reduce training time, but the calculation costs are currently unknown for such unrealized future machines.

AI and social sustainability

Even from a social perspective, the development in artificial intelligence is not sustainable. Perhaps the biggest challenge in achieving the sustainability goals by 2030 lies in goal no. 10 (reducing inequality) and its interaction with the other goals. The UN has launched the concept of “leave no one behind”, which specifically addresses how to work with the sustainability goals in light of the fact that everyone should have equal opportunities.

Technology development is an area where this is particularly relevant. There is a great danger that the technological gap, both within and between countries, will increase if one is not aware of this. AI is unfortunately an example where technological development so far seems to contribute to increased inequality. As we have seen above, R&D activities in artificial intelligence require such enormous resources and amounts of data that only resourceful countries and the big tech companies can afford them.

Discrimination and prejudice have also proved to be a problem. Many AI models are trained on enormous amounts of observational data, often collected from the internet. In addition to useful and relevant information, the data can hide unintended biases between groups (gender, ethnicity, etc.) or even unfortunate or immoral tendencies expressed by groupings in society. Such biases and prejudices can then be adopted by the models and thus be reinforced through their use.

This problem is due to the fact that the methods are mainly based on “bottom-up” learning, that is, learning only from observational data and not taking instructions “from above”, so to speak. In addition, the trained models are not transparent (black box models), which makes it difficult to detect “bad habits” picked up from big data. The latest Facebook algorithm, Seer (SElf-supERvised), is an impressive example of how AI can teach itself to categorize objects just from watching pictures on Instagram with minimal human intervention. This billion-parameter model, which requires “gargantuan amounts of data”, is literally left to teach itself. Although the potential for improved performance in object classification is great, the undesirable bias problem may also become a serious issue unless better ways to control biases are developed.

The bias problem is addressed in international strategies (e.g. EU, OECD) for artificial intelligence. Actions to reduce biases and discrimination are called for, such as data cleaning, where attempts are made to clean or balance the data in advance to overcome the problem. A catch here is that for the high-dimensional problems that AI models often address, for example in language processing or in Facebook's Seer algorithm, it is not possible to clean the data in all conceivable ways. The pursuit of unbiased AI is therefore basically futile, although one should of course seek to minimize the problem. This means that we must be able to cope with the fact that AI will be biased to a greater or lesser extent also in the future.

How can AI become more sustainable?

If AI is to follow a more sustainable path towards greater performance and application, we must solve the challenges outlined above. The question is how new advances in AI methodology and technology can be compatible with changing course towards a more sustainable AI.

What we need is AI that is energy-efficient and learns quickly, preferably from little data. It should also be a dynamic learner. We should to a large extent be able to manage biases and ethical/moral aspects, and we should also know the uncertainty associated with the decisions that AI makes for us.

Where do we start? 

Well, a fruitful strategy is probably to seek inspiration in the human brain, because between our ears we have a deep learning “machine” with all these properties. Ever since the beginning of the development of AI, the human brain has been an inspiration for developers, albeit to varying degrees. We now see researchers in AI and neuroscience coming together to learn from each other (e.g. Zador Lab). Neuroscience can learn about the brain from the properties of AI models, and AI developers draw inspiration from neuroscience in the pursuit of what is called artificial general intelligence. This is AI which in principle can solve a wide range of intellectual problems, as humans can, that is, not just play chess, translate texts or fold proteins. But in this context, an equally important point is that this development can also be the path to a more sustainable AI.

Four principles of human learning to make AI more sustainable

There are several principles of human learning that have, to only a limited extent, influenced development within AI. We'll look at four of them here and relate them to recent research within artificial intelligence.

1) Top-down control. As humans we learn through an interplay between bottom-up sensing and top-down control. The focus within AI research has mainly been on bottom-up learning for the last 40 years. As humans we have an enormous ability to utilize previously acquired knowledge in new situations. Learned patterns, categories and concepts are adapted to new situations so that we can learn from very few instances, whereas AI as of today mostly has to be trained from scratch for each new problem. For example, when we first got purple carrots in the grocery stores a few years ago, most of us were a bit perplexed for a moment, but due to our knowledge of orange carrots, just one single observation of purple carrots was enough to expand our mental category of “carrots”. In this way learned and re-used concepts make humans incredibly fast learners. In fact, some concepts (reflexes, instincts) appear even to be “programmed” from birth. The sucking reflex in babies is one example, while in the animal world we find many examples of instincts that contribute to animal survival.

Relevant research. AI, as it is typically constructed today, is predominantly a bottom-up learner and would need a large number of examples to learn a new concept. However, a recently published article by Rule and Riesenhuber shows the potential for top-down learning in AI as well. By reusing previously learned categories in image recognition while learning new, similar categories, e.g. recognizing car wheels by first learning to recognize bicycle wheels, they demonstrated efficient learning with a significant reduction in the number of training observations needed.
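
As a loose illustration of the principle (a toy stand-in with synthetic data, not the actual method of Rule and Riesenhuber): a representation fitted on a data-rich old task is reused so that a brand-new category can be summarized by a prototype built from only three examples.

```python
# A toy stand-in for top-down reuse of learned representations (synthetic data):
# a feature map fitted on an old, data-rich task is reused so that a new
# category can be represented by a prototype built from just three examples.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

old_task = rng.normal(size=(1000, 50))                 # plenty of familiar data
feature_map = PCA(n_components=5).fit(old_task)        # representation learned once

new_examples = rng.normal(loc=4.0, size=(3, 50))       # three 'purple carrot' examples
prototype = feature_map.transform(new_examples).mean(axis=0)

def distance_to_new_category(x):
    """Distance to the new-category prototype in the reused feature space."""
    z = feature_map.transform(x.reshape(1, -1))[0]
    return np.linalg.norm(z - prototype)

# A fresh example of the new category should typically land much closer to the
# prototype than an example of the old, familiar kind.
print("new-category example :", distance_to_new_category(rng.normal(loc=4.0, size=50)).round(2))
print("familiar example     :", distance_to_new_category(rng.normal(loc=0.0, size=50)).round(2))
```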

Sustainability aspects: Herein lies an enormous potential for fast and efficient learning, and thereby a reduction in the need for costly and energy-demanding computations. Furthermore, the availability of AI would depend less on access to big data.

2) Social transfer of concepts. Perhaps the most important key to human success is that we are social and able to learn concepts and ideas from each other. In fact, we do not even have to experience things ourselves! Many skills, such as learning to ride a bike, are acquired through a lot of trial and error, and learning then takes place more or less the way a robot would learn today. However, when it comes to questions of right and wrong, morality, ethics and values, we mostly learn in a more top-down way, and from others. When a mother or father tells their child that it is wrong to steal, the child usually accepts it, perhaps after a few “why?” questions, but without testing it out repeatedly at the local mall. This means that there should be opportunities for controlling the ethical and moral aspects of AI by controlling the biases in a top-down manner rather than just leaving it to bottom-up learning from cleaned or balanced data.

Relevant research. Although it seems like much research still focuses on a bottom-up approach to AI, there is now a growing focus on how to get machines to obtain deep understanding and possess a kind of common sense. More research is needed to find out how such a transfer of higher-order concepts can be done effectively in deep learning AI, but a key here probably lies in human language, which can be used to represent and convey general thoughts, ideas and opinions. Research in what is called Natural Language Processing (NLP), linked to transfer learning and representation learning, is becoming a central field in the search for artificial general intelligence.

Sustainability aspects: Transfer learning increases the sustainability benefits of top-down learning by adopting concepts and categories from other “learners”. 

3) Dynamic learners. The brain is plastic, that is, changeable through learning. Thus, human learning is a dynamic, continuous process in which our learned concepts or biases (opinions, perceptions, interests, prejudices, knowledge) are continuously adjusted based on what we sense and experience. In statistics this process would fit the description of a Bayesian filter, where prior assumptions turn into posterior beliefs in light of experience. In this way we hopefully become increasingly wiser, at the same time as our individuality and personality are strengthened throughout life.
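
In its simplest form, such a Bayesian filter can be written in a few lines; the prior and the stream of observations below are of course only illustrative:

```python
# A minimal sketch of Bayesian updating as a model of dynamic learning:
# a Beta prior over a belief is turned into a posterior each time a new
# observation arrives, and the posterior becomes the prior for the next one.
from scipy import stats

alpha, beta = 2.0, 2.0          # prior belief: roughly fifty-fifty, weakly held
observations = [1, 1, 0, 1, 1]  # stream of experiences (1 = belief confirmed)

for i, x in enumerate(observations, start=1):
    alpha += x                  # confirmations strengthen the belief
    beta += 1 - x               # disconfirmations weaken it
    print(f"after experience {i}: mean belief = {alpha / (alpha + beta):.2f}")

# The current Beta(alpha, beta) distribution is the standing 'bias' that will
# meet the next observation.
posterior = stats.beta(alpha, beta)
print("95% credible interval:", posterior.interval(0.95))
```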

Relevant research. Principles of dynamic learning, which have proven effective for learning in humans, have not been explored to any great extent within deep learning contexts in AI. A recent exception is a research group that constructed a deep learning model with a defined structure inspired by the neural system of the nematode C. elegans, with its 302 neurons and 7000 connections. They describe the method as a liquid neural network because it can learn continuously by strengthening or weakening connections as a result of experience, and can adapt to a changing environment.

Sustainability aspects: With dynamic, continuous learning, it is reasonable to expect that the need to retrain models for updates or adjustments could be reduced, and again, here lies a huge potential for a reduced carbon footprint of AI. In addition, the availability of AI would increase by providing access to previously trained models as templates for further development through dynamic learning.

4) Confidence assessment. Predictive coding is a central theory in cognitive brain research that states that through life and based on experience, we build mental models that are used to predict what will happen next, or what we will sense or experience in the immediate future. The discrepancy between what is experienced and what was predicted is used to adjust our internal models, if the experience is found credible and we trust our senses. The theory can also explain how strong internal models can twist sensory impressions and lead to illusions or misconceptions, if we have little confidence in or doubt what we observe. As human beings, we thus have a system that links uncertainty assessments to our perceptions and judgements. 
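
A minimal sketch of such a precision-weighted adjustment, in the spirit of predictive coding (a Kalman-style update with purely illustrative numbers), could look like this:

```python
# A precision-weighted update in the spirit of predictive coding: the internal
# model is corrected by the prediction error, scaled by how much the observation
# is trusted relative to the prior belief.
def update_belief(prior_mean, prior_var, observation, obs_var):
    """Kalman-style update: returns the new belief (mean, variance)."""
    gain = prior_var / (prior_var + obs_var)      # trust in the observation
    error = observation - prior_mean              # prediction error
    new_mean = prior_mean + gain * error
    new_var = (1 - gain) * prior_var
    return new_mean, new_var

# Trusted senses (low obs_var): the model moves a long way toward the data.
print(update_belief(prior_mean=0.0, prior_var=1.0, observation=5.0, obs_var=0.1))

# Doubted senses (high obs_var): the strong internal model barely budges,
# which is one way illusions and misconceptions can persist.
print(update_belief(prior_mean=0.0, prior_var=1.0, observation=5.0, obs_var=10.0))
```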

Relevant research. This is more or less absent in deep learning algorithms, but can give important information about the uncertainty in the decisions that AI makes for us. If there is great uncertainty, more information should be gathered, if possible, for example to make the most responsible choices. Here we see the need for statistical considerations to be included in the methods (again). Uncertainty analysis for artificial neural networks is not new, but it has proven to require heavy and time-consuming calculations, and new, fast methods are needed for increased sustainability. More research is needed on how the brain links so-called confidence to its assessments, but a research group at MIT/Stanford has recently published a promising and rapid method called deep evidential regression that can lead to safer and more robust AI systems in the future, for example for self-driving cars or in medical diagnostics.

Sustainability aspects: Confidence information may increase the trust in AI models and their predictions and potentially prevent unfortunate consequences, like instances of discrimination, prejudice or even fatal accidents. Confidence assessments can potentially also limit the need for gathering more data if the data are noisy anyway and do not increase the confidence in the model predictions.

 

Challenges and opportunities with human inspired AI

Ethical principles, reliability, transparency, explainability and accountability are emphasized in international strategies for AI as important aspects that must be safeguarded. It is very important that guidelines and legislation are at the forefront of technological development. A development towards a more human inspired AI is apparently inevitable, and as discussed above, also desirable from a sustainability perspective, but at the same time it increases the importance of regulation.

As mentioned earlier, it is more or less impossible to create AI that is completely unbiased, and just as it is natural for people actually to be biased, it will also be so for AI with built-in top-down control, because it is precisely the biases that define the overall control function. But here also lies an opportunity for more transparent, overarching and integrated handling of ethical and moral principles. Eventually, it may be possible, as part of a dynamically learning model, to instruct AI at a higher level through the use of appropriate a priori representations like “it is wrong to steal”, or “women and men should be evaluated equally in recruitment processes”. Or to quote John Connor in the movie Terminator 2 in which he morally instructs the terminator played by Arnold Schwarzenegger: “You can not just go around killing people!”.

The challenge is, of course, the same for top-down moralization as the one we face if morality is to be conveyed through cleaned data: whose moral standards and ethics should apply? What is perceived as right and wrong is both time and place dependent. Here, the world community must agree upon what is ethically and morally acceptable. Furthermore, it points to the importance of AI being developed in interdisciplinary environments composed of expertise not only within AI, statistics and technology, but also from social science areas, such as law and philosophy.

On the horizon of fast and dynamically learning AI, there is another issue that needs attention, namely that the learning of a machine or robot will depend on its experiences, and individual machines could develop in different ways and in different directions as they «grow up». It is not inconceivable that in a few years, robots will be referred to and treated more as individuals for this reason. This opens up new and demanding questions about responsibility for the actions of machines. Already today, questions are being asked about who is responsible for traffic accidents involving self-driving cars: the driver or the car manufacturer? This problem will not become easier if the car's built-in AI system has picked up new habits, perhaps even from the driver, through many hours in traffic since it rolled out of the factory. Such individualization can easily trigger our imagination and be an inspiration to science fiction literature, but future scenarios where we talk about “schooling” rather than “training” of artificially intelligent robots are not unlikely. In fact, Facebook's Seer algorithm is already well seated on the school bench. Soon we may even talk about robots with different types of personality. For example, a robot with clearer top-down control than bottom-up influence from data is likely to score high on the personality variable Openness to experience in the widely used Big Five inventory in personality psychology. The psychoanalyst Carl Jung would probably type this robot's personality as intuitive.

 

Such scenarios, AI with personality, can seem both distant and perhaps frightening. Many fear what the future will bring in this field. Will robots eventually develop consciousness and reach the point where they surpass human general intelligence and become a threat to humanity? Experts in consciousness research largely agree that this is not a likely scenario. The human component in this equation is probably much more important to control, because it is AI in the hands of humans that can go wrong if we do not take precautions and agree on how future AI should be regulated. It will be necessary to develop ethical, responsible, robust, transparent AND sustainable AI.

 

A broader strategy for sustainable AI

As we have seen examples of, various research groups around the world work on some of the issues highlighted here. Also here in Norway, more energy-efficient algorithms, explainable AI, human-inspired AI and moral/ethical principles for AI are areas of research, for instance within the NORA network. The focus on sustainability is also increasing, but mainly within applications of AI to strengthen progress towards the SDGs. It is positive that Norway, in its National Strategy for Artificial Intelligence, aims to take the lead in work on the ethical aspects of AI. The strategy states (my translation): "Norwegian society is characterized by trust and respect for fundamental values such as human rights and privacy. The government wants Norway to take the lead in the development and use of artificial intelligence with respect for the individual's rights and freedoms." Here, Norway could have taken the lead with an even clearer responsibility for global, social sustainability, and fronted the "leave no-one behind" vision on the technological side. Also when it comes to the economic and environmental aspects of AI, Norway could aim to lead in a more sustainable direction, but then we must have a clear idea of what the right direction is. We have seen that the environmental consequences associated with energy use and data storage are growing, and further investment in computational power and data storage does not seem sustainable, even though this is precisely where the last decade's success in narrow AI has come from. In this article, we have pointed out that investing in a more human-inspired AI is not only a sensible path towards artificial general intelligence, but also towards sustainable AI addressing the entire Agenda 2030.

 

 

Next generation AI (Thu, 28 Sep 2017) https://blogg.nmbu.no/solvesabo/2017/09/next-generation-ai/

Summary: AI networks may become faster learners and potentially acquire moral standards through the introduction of internal reward systems. Recent scientific results indicate a shift in AI. Do we see a new generation of AI?

Artificial intelligence, or AI for short, is all around us. Google, Microsoft, IBM, Tesla and Facebook have all been doing it for a long time, and this is just the start. The Russian president Vladimir Putin recently said that those at the forefront of AI will rule the world, whereas others, like Elon Musk and Bill Gates, raise concerns about the dangers of AI and the creation of superhuman intelligence.

Where are we heading? Are we in charge, or is the process already beyond control?

One thing is for sure: the speed of AI development has skyrocketed since the first attempts were made to mimic human brain learning with simple artificial neural networks several decades ago. Much of the theory behind neural networks has not changed since then, although some algorithmic improvements have come about, like reinforcement learning and convolutional networks. The real breakthrough of AI in recent years has instead been made possible by brute force, through big data and increased computational power.

Still, the artificial learning algorithms are not yet as efficient as the natural learning processes of the human brain. Humans are in some respects much more efficient learners than computers, although computers may digest much larger quantities of data per unit of time. We can extract the essence of information (that is, learn) from only a few repeated examples, whereas a computer may need thousands of input examples in comparison. In some circumstances we may in fact need only a single experience to learn about, for instance, a life-threatening danger.

There is no question that the learning algorithms used in AI are computationally heavy and quite inefficient. The AI pioneer Geoffrey Hinton recently said that he is "deeply suspicious" of the back-propagation step involved in the training of neural networks, and he calls for a new path to AI. Hence, new inspiration is needed to make the algorithms more efficient, and what is then more natural than to turn to the natural neural networks of our own brains for this inspiration?

But faster and more efficient learning does not calm the nerves of those who fear superhuman intelligence, quite the contrary! How can we be more confident that artificial intelligence will behave according to the moral standards that modern, developed societies live by? Here, too, we should turn to our own brains for inspiration, because after all, humans are capable of thinking and behaving morally, even if the news is filled with counterexamples every day. We may still hope that it will be possible to create AI with superhuman moral standards as well as intelligence!

Geoffrey Hinton is probably right, we need a new path to AI. We need a next generation AI!

Derivative of image by Filosofias filosoficas, Licensed by Creative Commons

The next generation AI must learn more efficiently and be more human-like in the way it acts according to values and ethical standards set by us.

Three fairly recent scientific findings in AI research and neuroscience may together reveal how next generation AI must be developed.

  • The first important result is found within the theory of “information bottlenecks“ for deep learning networks by Naftali Tishby and co-workers at the Hebrew University of Jerusalem.
  • The second result is the new curiosity driven learning algorithm developed by Pulkit Agrawal and co-workers at the Berkeley Artificial Intelligence Research Lab.
  • And finally, a brand-new paper by John Henderson and colleagues at the Center for Mind and Brain at the University of California, Davis, shows how visual attention is guided by an internal and subjective evaluation of meaning.

These three results all point, directly or indirectly, to a missing dimension in today's AI, namely a top-down control system where higher levels of abstraction actively influence how input signals are filtered and perceived. Today's AI algorithms are predominantly bottom-up, in the sense that input signals are processed from the bottom up to learn given output categories. The back-propagation step of deep learning networks is in that sense no top-down control system, since the adjustment of weights in the network has only one main purpose: to maximize an extrinsic reward function. By extrinsic we mean the outer, cumulative reward that the system has been set to maximize during learning.

The fundamental change must come with the shift to applying intrinsic as well as extrinsic reward functions in AI.

Let's begin with the information bottleneck theory to shed light on this postulate. In a YouTube video, Naftali Tishby explains how the information bottleneck theory reveals previously hidden properties of deep learning networks. Deep neural networks have been considered "black boxes" in the way they are self-learning and difficult to understand from the outside, but the new theory and experiments reveal that learning in a deep network typically has two phases.

  • First there is a learning phase where the first layers of the network try to encode virtually everything about the input data, including irrelevant noise and spurious correlations.
  • Then there is a compression phase, as deep learning kicks in, where the deeper layers start to compress information into (approximately minimal) sufficient statistics that are as optimal as possible with regard to prediction accuracy for the output categories.

The latter phase may also be considered as a forgetting phase where irrelevant variation is forgotten, retaining only representative and relevant “archetypes” (as Carl Jung would have referred to them).

We may learn a lot about how the human brain works from this, but still, as mentioned above with regard to learning efficiency, the compression phase appears to kick in much earlier in natural neural networks than in artificial ones. Humans seem to be better at extracting archetypes. How can this be?

I believe that the information bottleneck properties observed in artificial deep learning networks describe quite closely the learning phases of newborn babies. Newborn babies are, like untrained AIs, more or less tabula rasa in the sense that there are few or no intrinsic higher levels of abstraction prior to the learning phase. A baby also needs a lot of observations of its mother's face, her smell and her sound before the higher abstraction level of "mother" is learned, just like an AI would.

But here the natural and the artificial networks deviate from one another. The baby may carry the newly learned concept of a mother as an intrinsic prior for categorization as it goes on to learn who the father is, that food satisfies hunger, and so on. As the child develops, it builds upon an increasing repertoire of prior assumptions, interests, values and motivations. These priors serve as top-down control mechanisms that help the child cope with random or irrelevant variation and speed up data compression into higher abstraction levels.

My prediction is therefore that the compression into archetypal categories observed in deep learning networks kicks in much earlier in networks where learning combines bottom-up extrinsic learning with top-down intrinsic control. Hence, by including priors in AI, learning may become much faster.

The next question is how priors may be implemented as intrinsic control systems in AI. This is where the second result, by Pulkit Agrawal et al., comes in as a very important and fundamental shift in AI research. They aimed at constructing a curiosity-driven deep learning algorithm. The important shift here is to train networks to maximize internal or intrinsic rewards rather than extrinsic rewards, which has been the norm so far.

Their approach to building self-learning and curious algorithms is to use the familiar statistical concept of prediction error, a measure of surprise or entropy, as an intrinsic motivation system. In short, the AI agent is rewarded if it manages to seek novelty, that is, unpredictable situations. The idea is that this reward system will motivate curiosity in AI, and their implementation of an agent playing the classic game of Super Mario serves as a proof of concept. Read more about this here.
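To make the idea concrete, here is a minimal Python sketch of prediction error acting as an intrinsic reward. It is not Agrawal et al.'s actual implementation: the toy environment, the linear forward model and all parameter values are made up for illustration. The agent keeps a simple forward model of its environment, and the worse the model predicts the next state, the larger the curiosity signal.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy environment: the next state is a bounded, noisy function of state and action.
A = 0.8 * rng.normal(size=(4, 4)) / np.sqrt(4)
B = rng.normal(size=(4, 2)) / np.sqrt(2)

def env_step(state, action):
    return np.tanh(A @ state + B @ action) + 0.01 * rng.normal(size=4)

# The agent's forward model tries to predict the next state; the prediction
# error (surprise) is used as an intrinsic, curiosity-like reward.
W = np.zeros((4, 6))      # linear forward model: [state, action] -> next state
lr = 0.05

state = rng.normal(size=4)
for t in range(201):
    action = rng.normal(size=2)                 # placeholder random policy
    x = np.concatenate([state, action])
    predicted = W @ x
    next_state = env_step(state, action)

    prediction_error = next_state - predicted
    intrinsic_reward = float(np.sum(prediction_error ** 2))

    # As the forward model improves, familiar transitions stop being rewarding,
    # which is what pushes a curiosity-driven agent towards the unpredictable.
    W += lr * np.outer(prediction_error, x)
    state = next_state

    if t % 50 == 0:
        print(f"t={t:3d}  intrinsic reward (surprise) = {intrinsic_reward:.3f}")
```

The printed surprise shrinks as the forward model improves, which is exactly why a curiosity-driven agent must keep seeking situations it cannot yet predict.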

I believe the researchers at Berkeley are onto something very important for understanding learning in real brains. As I wrote in an earlier blog post, learning is very much about attention, and according to the salience hypothesis attention is assumed to be drawn towards surprise. This fits well with the work of Agrawal et al. However, in another blog post I also discussed how attention depends on a mix of extrinsic sensation and intrinsic bias. In a statistical framework we would rephrase this as a mix of input data likelihood and prior beliefs combined into a posterior probability distribution across possible attention points, and our point of attention is then sampled randomly from this posterior distribution.
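A toy numerical sketch of this statistical framing may help: the bottom-up likelihood over candidate attention points is weighted by a top-down prior, and the attention point is then sampled from the resulting posterior. The scene, the candidate points and all probabilities below are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Five hypothetical candidate attention points in a visual scene.
points = ["face", "red car", "moving dog", "blank wall", "flashing sign"]

# Bottom-up evidence: how salient/surprising each point is given the sensory input.
likelihood = np.array([0.20, 0.10, 0.30, 0.05, 0.35])

# Top-down bias: prior interest of this particular observer (a dog lover, say).
prior = np.array([0.25, 0.05, 0.50, 0.05, 0.15])

# Posterior over attention points: likelihood weighted by the intrinsic prior.
posterior = likelihood * prior
posterior /= posterior.sum()

# Attention is sampled at random from this posterior, so two observers with the
# same input but different priors will tend to attend to different things.
attended = rng.choice(points, p=posterior)
print("posterior:", dict(zip(points, posterior.round(3))))
print("attended point:", attended)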

The point here is that prediction error, as a drive for learning, also depends on the internal biases.

These biases are the interests, values and emotions we all possess that guide our attention, not only towards novelty, but towards novelty within a context that we find interesting, relevant and meaningful.

You and I will most likely have different attention points given the same sensory input due to our different interests and values. These biases actually influence how we perceive the world!

My good friend and colleague, the psychologist Dr. Helge Brovold at the National Centre for Science Recruitment in Trondheim, Norway, states this nicely:

“We don’t observe the world as IT IS. We observe the world as WE ARE”

This has now been confirmed in a recent study by Henderson et al. at the Center for Mind and Brain at the University of California, Davis. They show in experiments that visual attention is indeed drawn towards meaning rather than surprise or novelty alone. This is contrary to the salience hypothesis, which, according to Henderson, has been the dominant view in recent years. Human attention is thus guided by top-down intrinsic bias, an inner motivation guided by meaning, interest, values or feelings.

As Agrawal and his colleagues implemented their intrinsic prediction-error (or entropy) driven learning algorithm for the Super Mario playing agent, they encountered exactly this problem: some sort of top-down bias was needed to prevent the agent from getting stuck in a situation facing purely random (and hence unpredictable) noise. Noise is a kind of irrelevant novelty and should not attract curiosity and attention. To guide the algorithm away from noise they had to define what was relevant for the learning agent, and they defined as relevant the part of the environment that has the potential to affect the agent directly. In our context we can translate this as the relevant or meaningful side of the environment. However, this is only one way to define relevance! It could just as well be moral standards acting as intrinsic motivation for the learning agent.

At this point we may argue that including top-down intrinsic bias in addition to extrinsic reward systems in deep learning may both speed up the learning process and open up for AI guided by morals and ethics. Strong ethical prior beliefs may be imposed on the learning network, making the learning algorithm compress data around given moral standards.
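A hedged sketch of what such a composite reward could look like: the ordinary extrinsic task reward and an intrinsic curiosity bonus are combined, and a strong prior penalty is subtracted whenever a hard-coded moral constraint is violated. The function name, the weights and the example values are all hypothetical, not taken from any existing system.

```python
def total_reward(extrinsic, prediction_error, violates_norm,
                 w_ext=1.0, w_int=0.5, w_moral=10.0):
    """Hypothetical composite reward: task reward plus curiosity bonus,
    minus a large penalty whenever a hard-coded moral prior is violated."""
    intrinsic = prediction_error            # surprise acts as the curiosity bonus
    penalty = w_moral if violates_norm else 0.0
    return w_ext * extrinsic + w_int * intrinsic - penalty

# An action that scores well on the task but breaks the rule is still discouraged.
safe = total_reward(extrinsic=2.0, prediction_error=0.8, violates_norm=False)
unsafe = total_reward(extrinsic=3.0, prediction_error=0.8, violates_norm=True)
print(f"norm-respecting action: {safe:.1f}   norm-violating action: {unsafe:.1f}")
```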

In my opinion, this is the direction AI must move.

But… there is no such thing as a free lunch…

The introduction of intrinsic motivation and bias comes with a cost. A cost we all know from our own lives. Biases make us subjective.

The more top-down priors are used, and the stronger they are, the more biased learning will be. In the extreme case of maximal bias, sensory input will provide no additional learning effect. The agent will be totally stuck in its intrinsic prejudices. I guess we all know examples of people who stubbornly stick to their beliefs despite hard and contradictory empirical evidence.

The fact that human perception tends to be biased by prior beliefs in this way is, on the other hand, an indication that this is indeed how natural learning networks learn…, or don't learn…

 

 

 

The random walk of thought (Sun, 04 Dec 2016) https://blogg.nmbu.no/solvesabo/2016/12/the-random-walk-of-thought/

I’m sure you have experienced the frustration of not recalling a person you meet unexpectedly and who obviously knows you quite well. The name lingers at the tip of your tongue, but is impossible to retrieve from memory, and you struggle not to give away that you are lost. Then the person may say something that gives you a clue, and suddenly associations and connected memories rush through your brain. Small pieces of the puzzle fall into place, and finally you recall the name, just seconds before it starts getting embarrassing. You elegantly and subtly verify that you know who you’re talking to and the crisis has been avoided.

Sounds familiar? I've been there! But why was it so hard to look up information I obviously had saved on my hard disk? The answer is that our minds have no page index or table of contents like a regular book. Thinking is based on the principle of association. The next thought follows the previous, and to recall a memory from the library we need a cue from which we can walk into the memory by following a path of associations. In statistical terms, thinking is a stochastic process, a 'random walk' which literally reflects a random walk of signals between neurons in the brain.

Random walk? Are my thoughts just nonsense balderdash? Of course not. The word 'random' has a certain everyday interpretation that differs from the statistical one. Random just means that it is not completely deterministic; some random outputs from the stochastic process may be much more likely than others. I think a certain cognitive randomness is a necessary condition for the existence of free will, but that is another subject. (For my statistical view on whether we have free will, see my previous post The statistics of free will.)

Let’s return to the associations. If I ask you: What is your first association to the word ‘yellow’?

….

Maybe you were thinking of the sun, a banana or perhaps a submarine, whereas ‘car’ would probably be less likely (unless you happen to own a yellow car).

The probabilities of moving to other thoughts from the current are called transition probabilities in statistics. My personal transition probability from yellow to submarine is quite high since I’m old enough to remember the Beatles. After thinking ‘submarine’ my continuing random walk of thought could be: Beatles-Lennon-Shot-Dead. Those were in fact my immediate associations. Your thoughts would probably take another random walk.
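The association game above can be written down directly as a tiny Markov chain. The following sketch uses a made-up state space of six thoughts and invented transition probabilities; each row of the matrix is the distribution over possible next thoughts given the current one.

```python
import numpy as np

rng = np.random.default_rng(2)

# A tiny, made-up state space of thoughts and personal transition probabilities.
states = ["yellow", "sun", "banana", "submarine", "Beatles", "Lennon"]
P = np.array([
    # yellow  sun   banana  sub   Beatles Lennon
    [0.05,   0.35, 0.30,  0.25, 0.05,   0.00],  # from "yellow"
    [0.30,   0.05, 0.20,  0.05, 0.30,   0.10],  # from "sun"
    [0.40,   0.20, 0.05,  0.10, 0.20,   0.05],  # from "banana"
    [0.10,   0.05, 0.05,  0.05, 0.65,   0.10],  # from "submarine"
    [0.05,   0.05, 0.05,  0.15, 0.10,   0.60],  # from "Beatles"
    [0.10,   0.10, 0.10,  0.20, 0.40,   0.10],  # from "Lennon"
])

def random_walk_of_thought(start, n_steps):
    i = states.index(start)
    walk = [start]
    for _ in range(n_steps):
        i = rng.choice(len(states), p=P[i])   # next thought sampled from the current row
        walk.append(states[i])
    return walk

print(" -> ".join(random_walk_of_thought("yellow", 6)))
```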

Stochastic processes are well studied in statistics, and it may be worthwhile to draw some connections between what we have learned from statistical research and cognitive processes like thinking and conversation. Such a comparison may give us new meta-cognitive perspectives on thinking, conversation, personality and psychopathological conditions like obsessive-compulsive disorder (OCD), attention deficit hyperactivity disorder (ADHD) and Alzheimer's disease (AD). In this blog post I will look at the properties of some specific stochastic processes known as Markov processes and hidden Markov processes in this cognitive context.


Let's start out by assuming that at any given (and conscious) moment our thoughts are sampled from a fixed repertoire of potential thoughts and memories, and that we are not influenced by external factors. We may call the thought repertoire the 'state space' of thought. The likelihood of the various cognitive outcomes from this state space depends on our history of experiences, the situation we're in at the moment (the context), on interests and values, on the focus level (high or low focus) and on our personality traits. But it also depends on the current thought as a primer for the next. Together these factors define a distribution of transition probabilities over the state space of thought. And from this distribution we sample what to think next.

 

Markov processes

The random sampling of thoughts is very much like the random sampling of parameters from a candidate distribution as used in Markov Chain Monte Carlo (MCMC) estimation in Bayesian statistical inference. In MCMC a random walk process is initiated and sampling is run for a long time in order to estimate an unknown probability distribution from the sampled parameter values. The cognitive translation of this is as follows: by monitoring a person's random walk of thought for a long time and recording the thoughts, we could get an estimate of the likelihood of all thoughts in the state space. If the random process behaves properly, the estimate would be independent of the current thought and context. We might as well get estimates of other general (yet personal) quantities, like the most typical thought and the variability of thoughts (some are more narrow-minded than others). Of course, such thought-monitoring is not possible unless you are monitoring yourself. The thinking process of any other person is in that respect an example of what in statistics is known as a 'hidden stochastic process'. An output from this hidden process is only observed now and then, as the person speaks. I will come back to this later.
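As a small illustration of this MCMC analogy, the sketch below monitors a toy three-state chain of thoughts for a long time and compares the empirical long-run frequencies with the stationary distribution computed analytically (the left eigenvector of the transition matrix with eigenvalue 1). The states and probabilities are again invented.

```python
import numpy as np

rng = np.random.default_rng(3)

# A compressed three-state version of the chain above.
states = ["yellow", "submarine", "Beatles"]
P = np.array([[0.2, 0.6, 0.2],
              [0.3, 0.1, 0.6],
              [0.4, 0.4, 0.2]])

# "Monitor" the walk for a long time and record how often each thought occurs.
n, counts, i = 50_000, np.zeros(3), 0
for _ in range(n):
    i = rng.choice(3, p=P[i])
    counts[i] += 1
print("long-run frequencies:   ", dict(zip(states, (counts / n).round(3))))

# The stationary distribution obtained analytically: the left eigenvector of P
# with eigenvalue 1, i.e. the fixed point pi = pi P.
eigvals, eigvecs = np.linalg.eig(P.T)
pi = np.real(eigvecs[:, np.argmax(np.real(eigvals))])
pi /= pi.sum()
print("stationary distribution:", dict(zip(states, pi.round(3))))
```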

A Markov process (after the Russian mathematician Andrey Markov) is a stochastic process where the probability of entering the next state given the entire history of previous states is the same as the probability of the next state given only the current state. This is the Markov property. If we assume a cognitive Markov process, this means that the probabilities of my next thoughts depend only on the current thought, not on how I got there from previous thoughts. That is, if I for some reason got to think of the Beatles by another route than via yellow submarines, I would still be likely to think Lennon-Shot-Dead as the next sequence.

Whether cognitive processes satisfy the Markov property is perhaps questionable, but let’s stick to this assumption for simplicity since Markov processes and MCMC methods in statistical inference have many interesting properties which I think are relevant also for thinking, learning and neurological disorders.

So let us have a cognitive look at some of these properties.

Priming – Initial value dependence

In order to set up a Markov process, an initial value must be given. This initial value is the 'primer' or anchor for the next thought. The effect of priming is well known and has been studied in many psychological experiments. Priming describes how thoughts and decisions may be biased towards input cues. The cues may be more or less random, or given deliberately to manipulate the cognitive response of the receiver. Priming is a widely used technique in commercial marketing, where subtle messages are given to bias our opinions about products and increase the likelihood of us buying them, and social media marketing now delivers highly personal primers based on the information we provide online. In teaching, priming of thoughts through so-called flagging of headlines is a recommended trick to prepare the minds of the listeners before serving the details. A Markov chain will forget its initial value after some time, and the effect of priming in psychology is similarly of limited duration.

Focus level – Random walk step length and mixing level

For random walk processes the center of the proposal distribution is typically the current value, but another important factor is the step length, or variance, of the distribution. If step lengths are short, the process moves very slowly across the state space, only entering closely connected states. Furthermore, the series of visited states will show a high level of autocorrelation, which in the cognitive setting means that thoughts tend to be similar and related. One might characterize a person with highly autocorrelated thinking as narrow-minded, but we all tend to be narrow-minded whenever we focus strongly on solving a difficult task or concentrate on learning a new skill. Neurologically, strong focus is induced by activation of inhibitory neurons through increased release of the neurotransmitter GABA (gamma-aminobutyric acid), which reduces the transition probabilities for long jumps to irrelevant thoughts.

The problem with a slowly moving cognitive chain like this is the high likelihood of missing out on creative solutions to problems. If step lengths are allowed to increase (by reduction of inhibitory neuron activity), a more diffuse state of mind is induced, which facilitates creative thinking. However, too long step lengths increase the risk that very remote ideas pop up, only to be rejected as irrelevant in the current context. With long step lengths, autocorrelation may be very low and thoughts appear disconnected. Some persons suffering from attention deficit hyperactivity disorder (ADHD) may lack the ability to retain focus over time because their random walks of thought take too long steps. In statistical inference and MCMC estimation of an unknown probability distribution, a so-called well-mixing process is desirable, where the chain moves across the state space in intermediate step lengths, avoiding being both too narrow-minded and too diffuse. Such well-mixing processes have the largest probability of covering the state space in sufficient proportions within a limited time span. For cognitive processes, the definition of good mixing will of course depend on the context, and on whether focusing or defocusing is most beneficial.
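The effect of step length on mixing is easy to demonstrate with a standard random-walk Metropolis sampler. In the sketch below (a generic textbook sampler, not a model of any particular brain process), a very short and a very long proposal step both give highly autocorrelated chains, while an intermediate step mixes best.

```python
import numpy as np

rng = np.random.default_rng(4)

def random_walk_metropolis(step_size, n=20_000):
    """Random-walk Metropolis sampler targeting a standard normal distribution."""
    x, chain = 0.0, np.empty(n)
    for t in range(n):
        proposal = x + step_size * rng.normal()
        # Accept with probability min(1, target(proposal) / target(x)).
        if np.log(rng.uniform()) < 0.5 * (x**2 - proposal**2):
            x = proposal
        chain[t] = x
    return chain

def lag1_autocorr(chain):
    c = chain - chain.mean()
    return float(np.dot(c[:-1], c[1:]) / np.dot(c, c))

# Too-short steps (narrow focus) and too-long steps (scattered attention) both
# mix poorly; an intermediate step length gives the best coverage per unit time.
for step in (0.05, 2.5, 50.0):
    chain = random_walk_metropolis(step)
    print(f"step {step:5.2f}: lag-1 autocorrelation = {lag1_autocorr(chain):.3f}")
```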

Thought suppression and Alzheimer's – State space reducibility

A state space is irreducible if all states are accessible from any other state within a limited number of steps. If we for simplicity assume a static state space of thoughts and memories, it will be irreducible if any thought or memory can be reached by association from any other thought or memory. Of course our cognitive state space is not static, but reducibility of mind may occur when memories are unconsciously suppressed and never reached consciously (functional reducibility), or when connections to memories are lost due to damaged synapses (structural reducibility), as may happen in Alzheimer's disease.

Obsessive disorders – Periodicity of random walk

A Markov chain is periodic with period k if any return to a given state must occur in a multiple of k steps. A chain is said to be aperiodic if k=1. Aperiodic chains are usually desirable in MCMC, but periodic processes may occur in nature. It is reasonable to assume that cognitive processes are aperiodic, although some cognitive impairments, like obsessive-compulsive disorder (OCD), may show temporary periodicity, where the patient does not seem able to snap out of a circular chain of thoughts.

Context dependent associations – Time-inhomogeneity

There might be times at which "submarine" would be an even more probable association from "yellow" than at other times, for instance immediately after hearing the Beatles tune on the radio. This means that the transition probabilities are time-inhomogeneous. Time-inhomogeneous Markov chains are often used in MCMC estimation when step lengths (focus) are allowed to vary over time to optimize chain mixing. The inhomogeneity in cognitive processes is not only a means to adjust the focus level (mixing) over time; the transition probabilities will also depend on context, place and mood.

Plasticity and learning – Non-stationarity

So far we have for simplicity assumed that there is a fixed probability distribution across an individual's state space of thought, and that the state space itself is static. This is characteristic of a stationary distribution in stochastic process theory. There is, however, no reason to believe that the cognitive state space is static, nor that the thought distribution is stationary. We are all expanding and altering the state space through learning, and the brain is continuously changing, both functionally and structurally. The Markov chains of thinking actually change the probability distribution over the state space as they move. This is because repeatedly running chains of association is a key part of learning, by which the transition probabilities are altered through changes in the synaptic strengths of the association networks. Furthermore, new and previously unvisited thoughts occur during the random walk as a result of creative thinking or learning from external input. Finally, non-visited parts of the state space may be eliminated (forgotten) through pruning of synaptic connections. The brain is very plastic, and hence so is the state space of thought. The very fact that the process visits certain thoughts increases the likelihood of a later revisit. Hence, the random walk of learning and creative thinking may be considered a non-stationary stochastic process. If you think about it, this should be obvious: during our lifetime our interests, values and the contexts we are part of change, and this is certainly reflected in our thought processes.

Thinking and conversation – Hidden Markov Chains

As mentioned previously, to an outsider my thought process is hidden. In statistical inference, hidden Markov chains are used to model data where it is reasonable to assume an underlying stochastic process that generates occasional observable output. My thoughts are occasionally observable whenever I speak. Hidden Markov chains are defined not only by transition probabilities for the hidden state space, but also by state-dependent probabilities of generating an observable output. Again, these output probabilities may depend on, for instance, the context and the personality. If I am on unfamiliar ground, either literally or cognitively, I am less likely to express my thoughts. Furthermore, I am an introvert, and introverts are less expressive than extroverts. The cognitive process of an introvert is generally more hidden, with smaller probabilities of generating output than is the case for an extrovert. This also implies that the outputs of an introvert may seem more disconnected, with low autocorrelation. Extroverts' statements may, on the other hand, appear to have higher autocorrelation than those of introverts, and the latter group may easily get annoyed by extroverts saying the obvious and sticking too long with a topic.
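A minimal sketch of this hidden-chain view of conversation: the thought process is an ordinary Markov chain, but each thought is only "spoken" with a personality-dependent output probability. The states, transition matrix and the two speaking probabilities are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(5)

# Hidden thought process (a Markov chain) with occasional observable output.
states = ["work", "lunch", "holiday"]
P = np.array([[0.6, 0.3, 0.1],
              [0.2, 0.5, 0.3],
              [0.3, 0.2, 0.5]])

def converse(p_speak, n_steps=15):
    """Simulate the hidden walk; emit the current thought with probability p_speak."""
    i, utterances = 0, []
    for _ in range(n_steps):
        i = rng.choice(3, p=P[i])
        if rng.uniform() < p_speak:
            utterances.append(states[i])
    return utterances

print("introvert (p=0.2):", converse(0.2))   # sparse, seemingly disconnected output
print("extrovert (p=0.8):", converse(0.8))   # dense, more autocorrelated output
```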

Collaboration – Parallel chains

A common trick for monitoring whether Markov chains have converged to their stationary distributions, are mixing well and have forgotten their initial values, is to initiate several parallel chains with different initial values spread out across the state space. Comparing within-chain and between-chain variability indicates whether the mixing works properly and convergence has been reached. Furthermore, parallel chains may cover the state space faster, and integrating the information from all chains yields quicker estimates of the properties of the state space distribution.
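The sketch below illustrates this with four chains started far apart on a standard normal target and a simple Gelman-Rubin-style statistic comparing between- and within-chain variability; values close to 1 indicate that the chains have forgotten their starting points. This is a generic convergence check, not specific to any cognitive model.

```python
import numpy as np

rng = np.random.default_rng(6)

def sample_chain(start, n=2_000, step=1.0):
    """Random-walk Metropolis chain on a standard normal target, from a given start."""
    x, chain = start, np.empty(n)
    for t in range(n):
        prop = x + step * rng.normal()
        if np.log(rng.uniform()) < 0.5 * (x**2 - prop**2):
            x = prop
        chain[t] = x
    return chain

# Four chains started far apart; use the second half of each chain.
chains = np.array([sample_chain(s) for s in (-10.0, -3.0, 3.0, 10.0)])
half = chains[:, chains.shape[1] // 2:]
n = half.shape[1]

within = half.var(axis=1, ddof=1).mean()          # W: mean within-chain variance
between = half.mean(axis=1).var(ddof=1) * n       # B: n times variance of chain means
var_plus = (1 - 1 / n) * within + between / n
r_hat = np.sqrt(var_plus / within)
print(f"within-chain variance {within:.3f}, R-hat {r_hat:.3f} (near 1 indicates convergence)")
```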

Parallel and hidden Markov chains interact in the context of a conversation at the lunch table, or during group-based learning. The flipped classroom is an example of group learning: students watch lecture videos at home to prepare for group-based learning and problem solving at school, while the teacher operates more like a guide and discussant than a lecturer as he or she visits the groups. The homework prepares the students for the group learning process, and each group member joins the collaboration with their own hidden thought process and individual initial values. In addition, varying experience and knowledge levels, interests, values and personalities yield individual cognitive state space distributions. During the group process the parallel hidden random walks of thought evolve jointly towards a better understanding of the subject to be learned. Through conversation, associations are exchanged, which may lead to jumps in the hidden processes. These jumps can result in better coverage of the state space and faster learning for each individual group member. The integration of information from multiple hidden parallel chains becomes effective through conversation and collaboration. At this point the students' personalities may influence the effectiveness of the learning process. Introverts have, as discussed above, smaller probabilities of generating outputs from their hidden chains than extroverts. Extroverts may therefore more quickly get a correction of direction through interaction with other extroverts, and this may in turn lead to faster convergence of thoughts than for introverts, who may get stuck sub-sampling limited thought regions for a long time.

Creativity – The parallel hidden chains of unconscious associations

Earlier I wrote that long association step lengths increase creativity, but the truly creative driver is probably the hidden and parallel processes of the unconscious mind. There is neuronal activity even in a resting brain and in brain regions that are not monitored by our consciousness: not only the sub-cortical regions and the cerebellum, but also cortical regions outside the focus area of our consciousness (see my previous post …). Even if the signaling processes in these regions are hidden to us, they are likely to walk along the paths of highest transition probability. Furthermore, the unconscious random walks of associations are not restricted by having to be followed by our conscious attention (which is univariate). Hence, there may be multiple parallel hidden chains running in the unconscious. This may explain why the unconscious is such an effective problem solver and generator of creative thought. Sometimes the hidden processes produce coherent sampling which induces conscious attention by generating an (to our consciousness) observable output, an a-ha moment. (This very idea was in fact served to me by my unconscious right before going to sleep, after writing the introduction to this post.) How the unconscious processes contribute to conscious experience and attention was among the topics of my previous post The statistics of effective learning.

In this post I have presented some similarities between statistics and cognition, and once more it seems like nature thought of it first. However, statistical knowledge may give some new insight and understanding of cognitive processes, as discussed here.

 

The statistics of effective learning (Thu, 25 Aug 2016) https://blogg.nmbu.no/solvesabo/2016/08/the-statistics-of-effective-learning/

Throughout your days in school, and maybe also in college and at the university, you have had a lot of different teachers and lecturers. Most likely you also have a favorite among them, a really good teacher who managed to completely capture your attention and focus. I once had a math teacher to whom the entire class paid close attention, in every lesson, but how did he do it? I guess he just had the talent for it, but what is the secret behind a really good lecture or an excellent presentation?

A place to start is to learn from experienced lecturers, and if you want to create and give great presentations, there is a lot of pedagogical advice out there, from textbooks in teacher education to dedicated presentation services on the internet. Here is a little list of some of the advice I found on websites like www.ethos3.com and blog.ted.com, organized into four main points.

  • Make structured presentations, minimize text on slides
  • Use storytelling, relevant visuals and other multi-media sources, move your hands
  • Speak in short bursts and vary your vocal level, include elements of surprise
  • Dress properly, smile and make eye contact

As a quite experienced lecturer in statistics at the Norwegian University of Life Sciences I am familiar with this advice, and although I still have a lot to learn, I have experienced that it does help to keep the students' attention. But why does it work? In this post I will address aspects of statistics, information theory, consciousness, attention and neuroscience which may help answer this question.

Measuring information

First of all, it is of course a matter of signal and noise. That is what statistics is all about, and learning too. From a varying, and at first apparently chaotic, input, learning takes place when we discover some new order, a higher level of categorization, which can explain parts of the observed variation. It is my challenge as an educator to help my students bring order into chaos.

As I teach, I provide input signals to my students. The perceived information content of an input signal may be measured or represented in different ways. In statistics we often use some signal-to-noise ratio, like Fisher's F statistic in analysis of variance (ANOVA). At any given time we are in possession of beliefs and theories about the outer world. If our belief about the world that generates the incoming signals is plausible, the world appears to some extent predictable. Our inner model is good, and the unexplained part of what we observe (the noise) is small, which results in a high signal-to-noise ratio. On the other hand, if the input appears surprising and unpredictable, we still have something to learn about the processes that produce the input. Our inner belief model is poor, and a large part of what we observe appears as noise. According to Hohwy (2013), life (and learning) is a continuous process of prediction error minimization by model (belief) adjustments.
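For concreteness, here is a small worked example of the F statistic as a signal-to-noise ratio, computed by hand for three simulated groups of observations (the group means and sample sizes are arbitrary).

```python
import numpy as np

rng = np.random.default_rng(7)

# Three groups with a real group effect plus noise; the F statistic compares
# between-group variation (signal) to within-group variation (noise).
groups = [rng.normal(loc=mu, scale=1.0, size=20) for mu in (0.0, 0.5, 1.5)]

grand_mean = np.mean(np.concatenate(groups))
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

df_between = len(groups) - 1
df_within = sum(len(g) for g in groups) - len(groups)
F = (ss_between / df_between) / (ss_within / df_within)
print(f"F = {F:.2f}  (a large F means the signal stands out clearly from the noise)")
```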

Another measure of information content is mutual information, which quantifies how much the entropy (degree of disorder) of one variable is reduced by knowing another, for example the input from the world that we receive through our senses on one hand, and our predictions of the world generated by our current beliefs on the other. If these variables are similar, the mutual information is high. Mutual information is linked to prediction error by the fact that the mutual information increases as the prediction error decreases.
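The following sketch estimates the mutual information I(X;Y) = H(X) + H(Y) - H(X,Y) between the "world" and two sets of predictions, one with a small and one with a large prediction error, using a simple histogram-based estimator. The data are simulated and the bin count is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(8)

def mutual_information(x, y, bins=8):
    """Estimate I(X;Y) from binned samples, in bits."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    p_xy = joint / joint.sum()
    p_x, p_y = p_xy.sum(axis=1), p_xy.sum(axis=0)
    nz = p_xy > 0
    return float(np.sum(p_xy[nz] * np.log2(p_xy[nz] / (p_x[:, None] * p_y[None, :])[nz])))

world = rng.normal(size=5_000)
good_model = world + 0.3 * rng.normal(size=5_000)   # small prediction error
poor_model = world + 3.0 * rng.normal(size=5_000)   # large prediction error

print(f"good model: I = {mutual_information(world, good_model):.2f} bits")
print(f"poor model: I = {mutual_information(world, poor_model):.2f} bits")
```

The better predictor shares clearly more information with the world, matching the claim that mutual information rises as prediction error falls.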

Chaos and order

The more we learn, the better our inner models of the world become, and the better we can predict. We are simply less surprised by what we observe! The mutual information between the content of my lecture and the students' understanding of it increases as order is brought into chaos. As a lecturer I must help my students adjust their inner prediction models in such a way that they steadily become less surprised by what I say. When most of what I say appears predictable to them, the students are ready for the exam!

The goal is order and stability (predictability), but as Van Eenwyk (1997) states:

“.., whenever stability is a goal of adaption, chaos’s contribution rivals that of order. From that perspective, order and chaos reflect one another. Just where the line between the two exists depends largely on our ability to recognize similarities in the chaos.”

Hence, a good learning process shifts constantly between chaos and order, then new chaos followed by new order, and so on. If I as a lecturer do not manage to create new chaos, my presentation becomes boring, fully predictable and unnecessary. There is no new order to be established. A good presentation should therefore strive to be somewhat unpredictable, because it is the discrepancy between the students' beliefs and what I actually say, the surprise, that carries the potential for new learning.

Conditions for effective learning

I just mentioned attention, which of course is a necessary condition for learning, and another necessary condition is consciousness. A main message in this blog post is the following:

  • A good learning process should raise the level of consciousness and trigger focused attention.

Sounds simple enough, and the list of good advice I presented earlier may help accomplish this. Let me first elaborate a bit on consciousness and attention before we discuss again the presentation advice I found.

Consciousness

Even though psychologists have shown that we may pay attention even if consciousness is low, and the opposite, that we can be quite conscious but not pay attention (Hohwy, 2013), it is the combination of high consciousness AND high attention that is required for optimal learning conditions.

Consciousness, what is it really? We all have first-hand experience of it, yet it is so hard to explain. New theories have been suggested, among which I find the integrated information theory (IIT) of Tononi (2008) and co-workers the most promising. Put short, the theory is based on the level of integrated mutual information inherent in a neural network. The model explains how a central part of a network, for instance the frontal cortical region, may reach above some imagined critical level of integrated information, which is believed to give rise to consciousness as an emergent property of the network.

[An emergent property is according to the Stanford Encyclopedia of Philosophy a state that “‘arise’ out of more fundamental entities and yet are ‘novel’ or ‘irreducible’ with respect to them.” In our case we might say that consciousness is a state which cannot be predicted from observing the reduced properties of the neurons and the neural network.]

What is really interesting about the IIT of consciousness is that the level of consciousness in a central cortical network may depend on, and even be strengthened by, connected sub-networks, such as sub-cortical circuits or cortico-cerebellar pathways, which remain unconscious as long as these sub-networks are sufficiently segregated (loosely connected). This can, for instance, explain why we are not conscious of processes controlled by the brainstem (e.g. heartbeat, breathing, body temperature control) and the cerebellum (like automated body movements). Neither are we directly conscious (luckily) of the immense data processing and data filtering going on in sensory cortical regions like the visual cortex. However, the filtered information from the unconscious regions is gathered, integrated and potentially bound together in the central, "conscious part" of the neural network.

We may envision the conscious part of the network as the part of an iceberg which is visible above sea level, and the unconscious network regions as the large lump of ice below the surface supporting the conscious experience. The sea level is the critical level of mutual information required for consciousness to occur.

The iceberg above sea level represents the potential focal points that can be sampled by our conscious attention, as I also wrote about in the post "Your random attention". Further, and as explained by Hohwy (2013), we tend to choose an attention point which is surprising to us, that is, one with a high prediction error.

Just think about it, imagine that you sit at a lecture listening to a not so engaging presentation about a well known subject. If something unexpected suddenly happens, like a cell phone ringing, your attention will immediately be drawn towards the surprising element.

So, here comes the element of surprise again. If a lecture is somewhat unpredictable, it will help draw attention to it. But there is even more to this story. According to Friston (2009), attention is drawn most strongly to a focal point with a combination of high prediction error (surprise) AND expected high information quality (precision).

Thus, our attention is drawn towards objects or actions that represent a high level of surprise, and which we in addition find to be reliably observed or to come from a trusted source. For example, we are more likely to pay attention when a highly respected person, rather than just anyone, takes the stand to give a speech.
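A toy illustration of this precision-weighting: each candidate focal point gets an attention weight proportional to its surprise multiplied by its precision. The focal points and the numbers are invented; this is only meant to show the qualitative effect, not Friston's actual formulation.

```python
# Hypothetical focal points, each described by how surprising it is and how
# precise (reliable) its signal is; attention grows with the product of the two.
focal_points = {
    "phone ringing":       {"surprise": 3.0, "precision": 0.9},
    "monotone slide text": {"surprise": 0.2, "precision": 0.8},
    "mumbled side remark": {"surprise": 2.0, "precision": 0.2},
}

weights = {name: f["surprise"] * f["precision"] for name, f in focal_points.items()}
total = sum(weights.values())
for name, w in sorted(weights.items(), key=lambda kv: -kv[1]):
    print(f"{name:21s} attention share {w / total:.2f}")
```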

Varying levels of consciousness

I guess you have seen, either with your own eyes or in pictures, icebergs of different shapes. Some are peaked, reaching relatively high above the surface, whereas others are just flat, barely visible above the water. The first kind of iceberg may illustrate a highly conscious state where the unconscious inputs are congruent and may be integrated into a strong, unified conscious state of mind.

Maybe you have some vivid memories from your youth or childhood? I have some too, and when I think back now, I can describe the given moment vividly: what I saw, heard and felt, and maybe also what I smelled or tasted. Chances are that such vivid memory experiences also triggered some emotions, either good or bad. I think the reason I remember these moments so strongly is that all the unconscious inputs to the information integration network were congruent and could strengthen the conscious experience by adding different dimensions to it. My brain found the strongest focal point to attend to, and the moment was glued to my memory!

The other type, the flat icebergs, represents the opposite. We are awake and conscious, but the input signals are either weak, noisy and unreliable, or non-congruent and distracting. Think about the non-engaging lecture again… Attention points are hard to find, or are very unstable, on this flat surface. The conscious experience is weak and you are on the border of unconsciousness, maybe half asleep or daydreaming. This state of mind is good for thought wandering, creativity and free association, but not optimal for learning "facts" from external stimuli.

Let us sum up what we have so far. In order to create a good learning experience through instruction a lecturer should:

  1. Increase attention: Keep an element of surprise, avoid being predictable and monotone, and be precise and trustworthy.
  2. Increase consciousness: Be congruent in the way that all inputs to the students add to a unitary, integrable experience.

Building a unitary, focused learning experience

This latter point is very important, but under-communicated. Sadly, I have seen so many presentations, especially at scientific conferences, where slides crowded with text are flashed before the eyes of the audience while the speaker tries, with good intentions, to express the same message in other(!) words, maybe with a non-engaging body language. What really happens is that the speaker conveys simultaneous but non-congruent messages, one visual-lingual and another auditory-lingual. The brains of the audience then fail to integrate the information, the signal-to-noise ratio becomes low, and the body language of the speaker reduces the signal content even further. The top of the iceberg is flat, and the audience enters a state close to unconsciousness. Some in the back row may even reach it entirely…

A small side comment fits well in here, because simultaneous non-congruent input is, in fact, also a type of communication, but one which has been used for inducing hypnosis (!) by professional psychiatrists, like the famous Milton H. Erickson (Grinder and Bandler, 1981). A hypnotic state may be induced by simultaneously overloading multiple senses for some time, until the hypnotist offers the patient an escape into unconsciousness from the distracting and exhausting condition.

This is NOT what I would like my students to experience! Still, I fear that many lectures given at universities have similar hypnotic effects in the way students are over-loaded with non-congruent and non-integrable information!

Body language and voice

I mentioned non-engaging body language as a distraction in the above example, but “Moving your hands” was on the list of good advice, and moving your hands and a rich body language may increase the attention to the presentation if used properly.

I recently came across a post on the TED blog by Alison Prato. The subheading of the post was "All TED Talks are good. Why do only some go viral?" There, Vanessa Van Edwards, the founder of the consultancy Science of People, gives some clues as to why some TED talks become more popular than others. In their study, hundreds of TED talks were rated by 760 volunteers, and it is interesting to see how well the results fit the points I have made here.

Firstly, the most popular lecturers were found by a test panel to be more credible and to have higher vocal variety. Edwards says:

“We found that the more vocal variety a speaker had, the higher their charisma and credibility ratings were. Something about vocal variety links to charisma and competence.”

Secondly, the lecturers that go viral had a richer body language than others. Edwards says that this was a bit surprising: “We don’t know why, but we have a hypothesis: If you’re watching a talk and someone’s moving their hands, it gives your mind something else to do in addition to listening. So you’re doubly engaged. For the talks where someone is not moving their hands a lot, it’s almost like there’s less brain engagement, and the brain is like, “This is not exciting” — even if the content’s really good.”

I think they are onto something here, but according to my hypothesis, the body language must not only be rich, but it must also be congruent with the spoken word. Random hand-waving is distracting and lowers consciousness. Also, high vocal variety may be distracting and lower the impression of competence if used wrongly.

You can read the entire TED-blog by Alison Prato here.

The salience network for attention and prediction error minimization

Thus far my arguments for effective learning conditions have been quite theoretical and based on information theory, but there is also neuroscientific support for them. In recent years neuroscience has shifted from a quite modular view of the brain, where functions (and malfunctions) are tightly connected to specific brain regions, to a view where brain functions are the result of network processes and communication between regions. A series of functions and pathologies have lately been connected to networks and their properties (see e.g. Sporns, 2010), among which attention has been connected to the so-called salience network (Menon, 2015). This is a frontal-lobe-centered network which includes brain regions like the dorsal anterior cingulate cortex (dACC), the anterior insula (AI), the amygdala (A) and the ventral striatum (VS). Menon (2015) describes this network as

“…a high-order system for competitive, context specific, stimulus selection and for focusing ‘spotlight of attention’ and enhancing access to resources needed for goal-directed behavior”.

The network is connected to the sensory cortices for external input, to the AI for self-awareness, to the emotion center of the amygdala and to context evaluation in the ventral striatum. The dACC is believed to be the center for evaluating surprise, or prediction error. This means that this network carries all the necessary constituents to serve as a conscious attention network, and combined with the theory of Tononi, a highly conscious state allowing strong attention may be induced if all input information is congruent; if the input in addition fits emotions and interests and feels relevant, even better.

Advice explained

It is time to return to the list of good advice for presenters. Based on the theories of consciousness, attention and prediction error minimization that we have just worked through, we should be able to explain why this is good advice.

Here is the list again, with explanations of why each point improves learning:

  • Make structured presentations, minimize text on slides
    • Brings order into chaos, helps categorize
    • Increases signal-to-noise ratio and mutual information
    • Increases consciousness and attention
  • Use storytelling, relevant visuals and other multi-media sources, move your hands
    • If done properly, adds extra dimensions to the learning experience
    • Congruent sensory input
    • Unitary and integrable experience
    • Increased consciousness and attention
  • Speak in short bursts and vary your vocal level, include elements of surprise
    • Increases prediction error among the listeners
    • Increased consciousness and attention
  • Dress properly, smile and make eye contact
    • Increases trust
    • Increases expected information quality (precision)
    • Increased consciousness and attention

Personalized learning

However, it may not be as simple as this. If you ask people in the audience after a lecture whether they liked a presentation or not, you may receive quite different answers. This may be because we are all born with quite personal brain networks. The connections and their strengths in the network differ from one person to another, and we all integrate information in our own personal manner. Some may, for instance, place more emphasis on the emotional part via strong connections to the amygdala as they integrate the incoming information in the salience network, than others do.

I will not spend time on personality theory here, but the links to Walter Lowen (1982) and his entropy-based model of personality, and even consciousness, are apparent, and I may draw that link in a later blog post. Here I will only remark that effective learning probably depends on the actual topology of the attention network of each and every student.

Acknowledgement

I’m very grateful to my good friend and colleague Dr. Helge Brovold at the National Centre for Science Recruitment in Norway for our many good chats about these topics and for his willingness to share his knowledge and interesting books with me.

References

Friston, K. (2009). The free-energy principle: a rough guide to the brain?. Trends in cognitive sciences, 13(7), 293-301.

Grinder, J. and Bandler, R. (1981). Trance-formations: Neuro-linguistic programming and the structure of hypnosis. Real People Pr.

Hohwy, J. (2013). The predictive mind. OUP Oxford.

Lowen, W. (1982). Dichotomies of the Mind. Wiley & Sons, NYC

Menon, V. (2015). Salience Network. Brain Mapping: An Encyclopedic Reference. Academic Press: Elsevier.

Sporns, O. (2010). Networks of the Brain. MIT Press.

Tononi, G. (2004) An information integration theory of consciousness. BMC Neuroscience, 5:42.

Van Eenwyk, J.R (1997). Archetypes and Strange Attractors. Inner City Books, Toronto, Canada.

 

 

 

A mathematical view on personality (Thu, 10 Mar 2016) https://blogg.nmbu.no/solvesabo/2016/03/a-mathematical-view-on-personality/

Both personality and consciousness were properties of the human psyche that were central to the thinking of Carl Jung. He observed the stability of certain personality types among people, but also their complexity and unpredictability. He defined the archetypes as these stabilities, and the complexities and unpredictability he attributed to the constant interplay between the conscious self and what he termed "the shadow side" of the personality.


(Carl Jung, by PsychArt/ CC BY / Desaturated from original)

The ideas of Jung have been debated, and personality traits other than his archetypes find broader support today, but new support for Jung may now come from a mathematical perspective.

I will give a short glimpse of a mathematical hypothesis for personality here that I intend to elaborate on elsewhere later.

Within computational neuroscience, mathematical theories have contributed a lot to increasing our understanding of how brains work, not only at the neuronal level but also at the network level, for instance how memories are stored and recalled, and how we associate and make decisions (see e.g. Rolls and Deco, 2010). Interestingly, more diffuse properties of the human psyche, like personality (Van Eenwyk, 1997) and consciousness (Tononi, 2004), may in fact be connected to mathematical properties of networks, and in this post I will focus on what mathematics can teach us about these matters.

In mathematics there are complex models for information transfer across networks called attractor networks, and the neural network of our brain appears to be well approximated by these models. I have already touched upon these attractors in a previous post on creativity, because creativity is linked to the ability to easily move from one attractor state to another along new or alternative paths.

Attractor networks are built from nodes (for example neurons) which are typically linked recurrently, in loops, by edges (like synaptic connections), and the dynamics of the network tend to stabilize, at least locally, into certain patterns. These stable patterns are the attractors. For example, a memory stored in long-term memory may be considered a so-called point attractor, a subnetwork of strongly connected neurons.

The point attractors are low-energy states in an energy landscape with surrounding basins of attraction, much like hillsides surrounding the bottom of a valley, as shown in the figure below. Incoming perceptual signals are like rainfall finding its way down to the nearest attractor, leading to a thought, a memory recall, an association or a decision to react.

(Point attractors in a network, by Eliasmith / CC BY)
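To make the picture of point attractors and basins of attraction a bit more concrete, here is a minimal sketch of a Hopfield-style attractor network in Python. It is only a toy illustration of the idea, not a model of any real brain circuit; the patterns, sizes and update rule are all chosen for the example.

```python
import numpy as np

# Toy Hopfield-style attractor network: stored patterns act as point attractors.
rng = np.random.default_rng(0)

n_units = 50
patterns = rng.choice([-1, 1], size=(2, n_units))      # two stored "memories"

# Hebbian weight matrix, no self-connections.
W = patterns.T @ patterns / n_units
np.fill_diagonal(W, 0)

def recall(state, steps=20):
    """Iterate the network; the state rolls down into the nearest attractor."""
    s = state.copy()
    for _ in range(steps):
        s = np.sign(W @ s)
        s[s == 0] = 1                                   # break ties arbitrarily
    return s

# Cue the network with a corrupted version of the first memory (10 units flipped).
noisy = patterns[0].copy()
noisy[rng.choice(n_units, size=10, replace=False)] *= -1

recovered = recall(noisy)
print("overlap with stored memory:", int(recovered @ patterns[0]))   # close to 50
```

The synchronous update is used only for brevity; an asynchronous rule would give the same qualitative behavior, with the noisy cue settling into the stored pattern like rainfall reaching the valley floor.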

Other types of mathematical attractors also exist, such as line, plane and cyclic attractors, and these have been used to explain neural functions like eye-position control and cyclic motor control, such as walking and chewing (Eliasmith, 2005).

Common to these attractors are their stability and predictability, and this is good with regard to having stable memory and stable bodily control, but what about personality? Is personality also an attractor? Do we all have our basins of attraction, which pull our personality towards stable behavior?

Probably yes, but if you think about it, personality is a more unpredictable property than memory and body control. We think we know someone, and then suddenly they behave in an unexpected manner. Still, the overall personality seems to be more or less stable. How can something be both stable and unpredictable at the same time?

Well, there is another class of attractors that may occur in attractor networks. These are the strange (or chaotic) attractors, and they are exactly that: partly stable and partly unpredictable. We say they are bounded, but non-repeating.

A famous example is the Lorenz attractor, discovered by Edward Lorenz while he was running his simplified "weather machine", in which typical weather patterns appeared but never repeated themselves. In the figure below the blue curve is pulled towards the red strange attractor state, and once it enters the attractor it is bound to follow a certain pattern, though it never repeats itself.

(Lorenz attractor in a network, by Eliasmith / CC BY)
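For readers who want to see the "bounded but non-repeating" behavior for themselves, here is a short, illustrative sketch that integrates the Lorenz equations with a crude Euler scheme, assuming Lorenz's classical parameter values. It has nothing to do with brain networks as such; it only shows the attractor itself.

```python
import numpy as np

# Crude Euler integration of the Lorenz system with the classical parameters.
def lorenz_x(x, y, z, dt=0.01, steps=10_000, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    xs = np.empty(steps)
    for i in range(steps):
        dx = sigma * (y - x)
        dy = x * (rho - z) - y
        dz = x * y - beta * z
        x, y, z = x + dt * dx, y + dt * dy, z + dt * dz
        xs[i] = x
    return xs

xs = lorenz_x(1.0, 1.0, 1.0)
print("x stays bounded:", xs.min(), xs.max())   # bounded, yet the series never repeats
```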

The discovery of strange attractors led to the development of chaos theory and fractal geometry in mathematics. Many phenomena around us may develop smoothly in a linear, predictable fashion until a certain threshold is reached, at which point a chaotic state appears before a new order settles. Just think of water being heated towards its boiling point: a chaotic state occurs just as it passes from liquid to vapor.

Some scientists now believe that the transition from unconsciousness to consciousness may be a similar transition between states. The mathematical model of consciousness proposed by Tononi (2004) is based on the capacity of a network to integrate information. If the level of information integration crosses a certain limit, a new and emergent state is entered: consciousness. This corresponds to a fundamental change in the property of the network as a whole, and I think we can all agree that there is a fundamental change between being asleep and being awake. There is no linear transition between the two.
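Tononi's actual measure of integrated information (phi) is considerably more involved, but a rough feeling for what "capacity to integrate information" means can be had from a much simpler quantity, the total correlation of a small system: the sum of the entropies of the parts minus the entropy of the whole. The sketch below computes it for a hypothetical three-unit binary network in which the third unit is the XOR of the other two; this is a toy proxy of my own choosing, not Tononi's model.

```python
import numpy as np
from itertools import product

def entropy(probs):
    """Shannon entropy in bits of a discrete distribution."""
    p = np.array([q for q in probs if q > 0])
    return -np.sum(p * np.log2(p))

# Toy system: three binary units, the third being the XOR of the first two,
# with the first two uniformly and independently distributed.
joint = {(a, b, a ^ b): 0.25 for a, b in product([0, 1], repeat=2)}

# Entropy of each unit considered on its own.
unit_entropies = []
for i in range(3):
    p1 = sum(p for state, p in joint.items() if state[i] == 1)
    unit_entropies.append(entropy([p1, 1 - p1]))

# Total correlation: how much the parts, taken together, share information.
total_correlation = sum(unit_entropies) - entropy(joint.values())
print("total correlation (bits):", total_correlation)   # 1.0 for this system
```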

What about personality?

Well, a person’s personality is of course most apparent in our most conscious state, where we act in a dependable and responsible manner. According to Jung, however, there is also an unconscious side to it, often referred to as the "shadow" of our personality. The shadow side, residing in our unconscious, balances the conscious side; it holds the key to relaxation, serendipity and creativity, but also to irrational behavior in stressful situations. This was also recognized by Walter Lowen (1982), a systems science researcher on artificial intelligence who developed a model of personality that reached far beyond the simple dichotomies of Carl Jung.

To many, this may just sound like psychological speculation, but if we combine the theory of strange attractors from chaos theory with Tononi's model of consciousness, I think we may find theoretical support for the thoughts of Jung.

According to Tononi, consciousness depends on both information integration and information segregation. Loosely speaking, consciousness is generated by a "central" network complex with a high capacity for information integration, whereas other, connected sub-networks containing specific and segregated information may contribute without actually being part of the central complex.

A good example is the cerebellum, which contains more neurons than the entire remainder of the brain, the cerebrum included. Still, the activity of the cerebellum is totally unconscious to us. The cerebellum contains segregated information and procedures, like books and instruction manuals in a library. This information may, however, be integrated by other parts of the brain and combined with input signals from our senses into a conscious experience.

The main point here is that certain parts of the brain, and certain circuits, are more involved than others in what we can call the conscious complex of the brain. The other parts are connected and contribute, but work quietly in the “shadow”.

The way information is integrated is still unknown, but it may very well be in the form of strange attractors, cycling through regions of the brain in a somewhat unpredictable, yet bounded manner. Some attractors work within the conscious complex; others work in connected, but unconscious, parts.

I believe that personality depends on the properties of these strange attractors, and how these attractors are distributed, either within the central conscious complex, or in the more peripheral unconscious network space!

For instance, for the personalities Jung characterized as "Feeling" (correlated with being "Agreeable" in Big Five theory), the ventral circuits involving sub-cortical brain parts like the amygdala may be part of a strange attractor within the conscious complex, whereas "Thinking" (or less "Agreeable") personalities may have more dorsal conscious attractor states.

This can serve as a theoretical explanation of why some people tend to base their decisions more on values and emotions, while others tend to make more impersonal decisions based on logic.

Other personality traits may also be explained by this theory.

Personalities are strange attractors: unpredictable, never repeating, but bounded. People have similar types, yet we are all different. In the mathematics of chaos and strange attractors this is known as sensitive dependence on initial conditions. It was this that led Lorenz to discover the chaotic property of his weather machine. Others have referred to it as the butterfly effect: how the minimal disturbance of a butterfly flapping its wings can, through a cascade of causalities, result in a storm on the other side of the world.
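The phenomenon is easy to demonstrate numerically. The short sketch below uses the logistic map in its chaotic regime (r = 4), a simpler system than the Lorenz equations but with the same effect: two starting values that differ by one part in a billion disagree completely after a few dozen iterations. The starting values are of course arbitrary, chosen only to show the divergence.

```python
# Sensitive dependence on initial conditions in the logistic map x -> r*x*(1-x).
r = 4.0                       # chaotic regime
x, y = 0.2, 0.2 + 1e-9        # two almost identical starting points

for n in range(1, 61):
    x = r * x * (1 - x)
    y = r * y * (1 - y)
    if n % 10 == 0:
        print(f"step {n:2d}: difference = {abs(x - y):.2e}")
# The tiny initial difference grows roughly exponentially until it is of order one.
```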

We all start from different initial conditions: different families, different environments, different experiences. We may be born with the same set of strange attractors, but we will never be identical, only similar, since we are bounded within the same basins of attraction.

It is this individual unpredictability that makes it so hard to understand the human mind. The only thing we can do is to use statistical models like the Big Five (factor analysis) or Jung/Lowen (dichotomous classification) to try to separate the stable properties of the attractors from the unpredictable chaotic variation among individuals. However, new brain scanning technologies now provide unprecedented possibilities to go beyond Carl Jung and to study the strange attractors of the brain by multivariate statistical meta-analysis in order to get a better grip on what makes us similar, yet different.
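As a loose illustration of the statistical side, here is a sketch of the kind of factor analysis that underlies Big Five models, run on synthetic questionnaire data. The number of people, items and latent traits, and the noise level, are all invented for the example; scikit-learn's FactorAnalysis is used for convenience.

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(1)
n_people, n_items, n_traits = 500, 10, 2

# Synthetic data: two latent "traits" generate ten observed item scores plus noise.
traits = rng.normal(size=(n_people, n_traits))
loadings = rng.normal(size=(n_traits, n_items))
items = traits @ loadings + 0.5 * rng.normal(size=(n_people, n_items))

# Factor analysis tries to recover the stable latent structure behind the noisy items.
fa = FactorAnalysis(n_components=2)
scores = fa.fit_transform(items)       # estimated trait scores per person
print(fa.components_.shape)            # (2, 10): estimated loadings of items on traits
```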

I think Carl Jung was not far off!

References

Rolls, E. T. and Deco, G. (2010). The Noisy Brain. Oxford University Press, New York.

Van Eenwyk, J. R. (1997). Archetypes & Strange Attractors: The Chaotic World of Symbols. Inner City Books, Toronto, Canada.

Tononi, G. (2004). An information integration theory of consciousness. BMC Neuroscience, 5:42.

Eliasmith, C. (2005). A unified approach to building and controlling spiking attractor networks. Neural Computation, 17(6), 1276-1314.

Lowen, W. (1982). Dichotomies of the Mind. Wiley & Sons, New York.