In a general sense, a Markov Network Brain (MNB) implements a probabilistic finite state machine, and as such is a Hidden Markov Model (HMM). MNBs act as controllers and decision makers for agents that interact with an environment and with other agents within that environment. Thus, an MNB can be thought of as an artificial brain for the agent it controls. Similar to the input and output layers of an artificial neural network (ANN), an MNB is composed of a set of input nodes (i.e., sensors) and output nodes (i.e., actuators), except that all nodes of an MNB are binary, i.e., each node can only be in one of two states: 0 or 1. In addition, an MNB contains a set of hidden nodes, which act as memory in which the MNB can store information.
We have a stable implementation of MNBs on GitHub: https://github.com/dknoester/ealib
How Markov Network Brains Function
When we embed agents with MNBs into an environment and provide them with sensory inputs, these inputs are written into the MNB input nodes. Once the inputs are provided, we activate the MNB and all nodes pass information through the MNB by updating their states. While input nodes are usually overwritten by sensory information from the environment at the beginning of the next brain activation, hidden nodes and output nodes are of particular interest, because their states depend on the particular configuration of the MNB. Hidden nodes can be thought of as the memory of the agent or a mechanism to represent an internal state, whereas output nodes determine the action of the agent at that particular point in time. In most cases, the output nodes encode a finite set of actions. For example, two output nodes can be used to steer a tank, according to the output combinations in Table 1.
| Output A | Output B | Encoded Action |
Table 1: Example agent action encoding with two output nodes. Each output combination encodes a discrete action taken by the agent.
Arbitrary encodings can be used, but simpler encodings are more conducive to the evolution of effective behavior. In order for the agent to be able to react to the environment, the output nodes must somehow connect to the input nodes, and if memory is required for more complex tasks, the output nodes must also depend on the states of hidden nodes. Consequently, hidden nodes might also depend on the input nodes. In an MNB, node states are updated by probabilistic logic gates (PLGs), also known as Hidden Markov Gates, which function similarly to classic logic gates (e.g., AND, NAND, OR, or XOR). A classic logic gate, e.g., XOR, reads binary states from two input nodes and updates a single output node according to the XOR logic. Alternatively, a classic logic gate can be described with a probability table that maps each possible input to a probability distribution over its outputs. In the case of an XOR gate, there are four possible input sets (00, 01, 10, and 11) and two possible outputs (0 or 1). Table 2 shows the equivalent probability table of an XOR gate.
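| Input A | Input B | p(0) | p(1) |
|---|---|---|---|
| 0 | 0 | 1.0 | 0.0 |
| 0 | 1 | 0.0 | 1.0 |
| 1 | 0 | 0.0 | 1.0 |
| 1 | 1 | 1.0 | 0.0 |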
Table 2: Probability table for an XOR gate. The Input A and Input B columns contain the possible states of the input nodes. p(0) and p(1) are the probabilities that the output node is 0 or 1, respectively, given the corresponding input.
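The equivalence between a truth table and a probability table can be made concrete with a short sketch. The Python below is illustrative only; the function name `truth_table_to_probabilities` is hypothetical and not taken from any MNB implementation. It builds the probability table of a deterministic two-input gate by placing all of the probability mass on the output that the gate would produce.

```python
from itertools import product


def truth_table_to_probabilities(gate):
    """Return {(a, b): (p0, p1)} for a deterministic two-input gate.

    `gate` is any function mapping two bits to one bit (hypothetical helper).
    """
    table = {}
    for a, b in product((0, 1), repeat=2):
        out = gate(a, b)
        # A deterministic gate puts all probability mass on a single output.
        table[(a, b)] = (1.0, 0.0) if out == 0 else (0.0, 1.0)
    return table


xor_table = truth_table_to_probabilities(lambda a, b: a ^ b)
for inputs, (p0, p1) in xor_table.items():
    print(inputs, p0, p1)
```

Every row contains only 0.0 and 1.0, which is what makes the gate deterministic.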
While classic logic gates are deterministic, probabilistic logic gates are composed of arbitrary probabilities in their probability table. Therefore, while the output states still depend on the input states, they can also have a degree of stochasticity. Figure 1 illustrates an example PLG with three binary inputs: inputs 1 and 2 come from sensory input nodes, while input 3 comes from a hidden node. The PLG is composed of a $2^3 \times 2^2$ state transition table (one row for each of the 8 input combinations and one column for each of the 4 output combinations), which encodes the logic for the PLG. Once provided with inputs, the PLG activates and updates the states of hidden node 3 and output node 4. Since the PLG outputs to the same hidden node that it receives input from, it forms a recurrent connection, i.e., memory. The state of output node 4 can encode two possible actions, such as turning left (0) or right (1).
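As a rough illustration of how such a gate could be activated, the sketch below samples one output combination from the table row selected by the current inputs. It is not the authors' implementation: the function name `activate_gate`, the convention that the first bit is the most significant one, and the probability values in the example table are all assumptions made for this example.

```python
import random


def activate_gate(table, input_bits, rng=random):
    """Sample the output bits of a probabilistic logic gate (sketch).

    table: list of rows, one per input combination; each row holds the
           probabilities of every output combination and sums to 1.
    input_bits: tuple of 0/1 values read from the gate's input nodes.
    """
    # Assumption: the first input bit is the most significant bit of the row index.
    row_index = int("".join(str(b) for b in input_bits), 2)
    row = table[row_index]
    # Draw one output combination according to the probabilities in the row.
    col_index = rng.choices(range(len(row)), weights=row, k=1)[0]
    # Unpack the column index back into individual output bits.
    n_out = (len(row) - 1).bit_length()
    return tuple((col_index >> shift) & 1 for shift in reversed(range(n_out)))


# Hypothetical gate with three inputs and two outputs (8 rows x 4 columns).
table = [[0.25, 0.25, 0.25, 0.25] for _ in range(8)]
table[0b101] = [0.0, 0.0, 1.0, 0.0]  # inputs (1, 0, 1) always yield outputs (1, 0)
print(activate_gate(table, (1, 0, 1)))
```

All rows except one are uniform, so the gate behaves randomly for most inputs and deterministically for the input combination (1, 0, 1).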
The PLGs we implemented in this model can receive input from a maximum of four nodes and write into a maximum of four nodes, with a minimum of one input and one output node for each PLG. Any node (input, output, or hidden) in the MNB can be used as an input or output for a PLG. MNBs are composed of an arbitrary number of PLGs, and the PLGs define the internal logic of the MNB. Thus, to evolve an MNB, mutations change the connections between nodes and PLGs, and modify the probabilistic logic tables that describe each PLG. Figure 2 demonstrates an MNB with 12 nodes connecting to 2 PLGs, and how these two PLGs affect the states of the nodes they write into after one brain activation.
It is possible for two or more PLGs to write into a single node, and each PLG likely has a different value it wants to place in the common node. This conflict is resolved by using an OR function on the values entering the common node. Thus, whenever one PLG writes a 1 into a node with multiple inputs, that node becomes 1 regardless of the inputs from other PLGs.
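The OR rule can be sketched as a single brain update. The code below is a minimal illustration under stated assumptions, not the implementation in ealib: the name `update_brain` is hypothetical, every gate is assumed to read from the node states of the previous time step, and nodes that no gate writes to are assumed to reset to 0.

```python
def update_brain(states, gates):
    """Compute the next node states for one brain activation (sketch).

    states: list of 0/1 node states at time t.
    gates: list of (input_ids, output_ids, fn), where fn maps a tuple of
           input bits to a tuple of output bits.
    """
    # Assumption: nodes that receive no write are 0 after the update.
    next_states = [0] * len(states)
    for input_ids, output_ids, fn in gates:
        outputs = fn(tuple(states[i] for i in input_ids))
        for node_id, bit in zip(output_ids, outputs):
            # OR-combine: a single 1 from any gate sets the node to 1.
            next_states[node_id] |= bit
    return next_states


# Two hypothetical gates copy node 0 and node 1 into the shared node 2.
copy_gate = lambda bits: (bits[0],)
gates = [((0,), (2,), copy_gate), ((1,), (2,), copy_gate)]
print(update_brain([1, 0, 0], gates))
```

Here one gate writes a 1 and the other writes a 0 into node 2, and the node ends up as 1.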
Genetic Encoding of Markov Network Brains
We use a circular string of bytes as a genome, which contains all the information needed to describe an MNB. The genome is composed of genes, and each gene encodes a single PLG. Therefore, a gene contains the information about which nodes the PLG reads input from, which nodes the PLG writes to, and the probability table defining the logic of the PLG. The start of a gene is indicated by a start codon, which is represented by the sequence (42, 213) in the genome. The specific sequence we chose to represent the start codon is arbitrary; we chose 42 as a tribute to Douglas Adams, and 213 is 255 (the maximum value of a byte) minus 42.
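Locating genes then amounts to scanning the genome for the byte pair (42, 213). The following sketch shows one way to do this; the function name `find_gene_starts` is hypothetical, and allowing a start codon to straddle the end and the beginning of the circular genome is an assumption rather than documented behavior.

```python
START_CODON = (42, 213)


def find_gene_starts(genome):
    """Return the positions of every start codon in a circular byte genome."""
    n = len(genome)
    return [
        i
        for i in range(n)
        # Assumption: the modulo lets a codon wrap around the end of the genome.
        if genome[i] == START_CODON[0] and genome[(i + 1) % n] == START_CODON[1]
    ]


genome = bytes([213, 7, 42, 213, 1, 42])
print(find_gene_starts(genome))  # prints [2, 5]
```

In the example, the second codon starts at the last byte and is completed by the first byte of the genome.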
Figure 3 provides an example genome. After the start codon, the next two numbers describe the number of inputs ($N_{in}$) and outputs ($N_{out}$) used in this gate, where each $N$ is defined by the equation:
$$N = 1 + (\text{number} \bmod N_{max})$$
where number is the byte number in the genome string. In this case, $N_{max} = 4$.
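A mapping consistent with the stated range of one to four inputs or outputs, and with the use of the modulo operator described in this section, is N = 1 + (number mod Nmax). The sketch below assumes that form; the function name `gate_arity` is hypothetical.

```python
N_MAX = 4


def gate_arity(byte_value, n_max=N_MAX):
    """Map a genome byte (0..255) to a count between 1 and n_max (assumed form)."""
    return 1 + (byte_value % n_max)


print([gate_arity(b) for b in (0, 1, 2, 3, 4, 255)])  # prints [1, 2, 3, 4, 1, 4]
```

Because the result never drops below 1, every gate has at least one input and one output regardless of the byte value.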
The following $N_{max}$ numbers of the gene specify which nodes the PLG reads from by mapping to a node ID number with the equation:
where number is the byte number in the genome string, # nodes is the number of nodes in the MNB, and $\lfloor \cdot \rceil$ denotes the nearest integer.
Similarly, the next $N_{max}$ numbers encode which nodes the PLG writes to, using the same node ID equation as for the input nodes. If a gate uses fewer than $N_{max}$ inputs or outputs, the remaining sites in that section of the gene are ignored; these sites are designated by the # signs.
The following $2^{N_{in} + N_{out}}$ numbers of the gene express the probabilities composing the $2^{N_{in}} \times 2^{N_{out}}$ logic table. We sequentially fill the logic table row by row with numbers from the genome. Once the logic table is filled, we convert the gene numbers into the corresponding probabilities ($p_{ij}$) with the equation:
$$p_{ij} = \frac{\text{number}_{ij}}{\sum_{k} \text{number}_{ik}}$$
where $\text{number}_{ij}$ is the byte number in the probability table at index [i, j], and the sum runs over all entries $k$ of row $i$.
Since we use bytes to specify the values in the table, each row of the probability table is normalized so that it sums to 1.0. We apply the modulo operator to the number of inputs and outputs, as well as to the IDs of the nodes used as inputs and outputs, in order to keep them within the allowed ranges.
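The row-wise normalization can be sketched as follows. This is an illustrative example, not the authors' code: the function name `bytes_to_probability_table` is hypothetical, and treating an all-zero row as a uniform distribution is an assumption, since the text does not say how that case is handled.

```python
def bytes_to_probability_table(values, n_in, n_out):
    """Fill a 2**n_in x 2**n_out table row by row and normalize each row."""
    rows, cols = 2 ** n_in, 2 ** n_out
    if len(values) != rows * cols:
        raise ValueError("expected %d values, got %d" % (rows * cols, len(values)))
    table = []
    for r in range(rows):
        raw = values[r * cols:(r + 1) * cols]
        total = sum(raw)
        if total == 0:
            # Assumption: an all-zero row becomes a uniform distribution.
            table.append([1.0 / cols] * cols)
        else:
            # Divide every entry by its row sum so the row sums to 1.0.
            table.append([v / total for v in raw])
    return table


# Hypothetical gene fragment for a gate with two inputs and one output.
prob_table = bytes_to_probability_table([255, 0, 0, 255, 10, 30, 0, 0], 2, 1)
for row in prob_table:
    print(row)
```

The first two rows are deterministic, the third row yields 1 with probability 0.75, and the fourth row falls back to the assumed uniform distribution.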
The number of nodes allowed and which nodes are used as inputs and outputs are specified as constants by the user. Combined with these constants, the genome described above unambiguously defines an MNB. All evolutionary changes, such as point mutations, duplications, deletions, or crossover, are performed on the genome, and only take effect after the genome is translated into an MNB. Thus, the MNB can be thought of as the phenotype expressed by the genome.
MNBs can be visualized in several ways. Since the visualization of the MNB in Figure 2 is somewhat inconvenient, and the gates are less important than how nodes depend causally on each other, we usually only display a graph similar to Figure 4 showing the causal relations between the nodes.
Markov Network Brain FAQ
Can genes overlap?
Yes, a 42 followed by a 213 defines a start codon. Whenever a start codon is found, the subsequent bytes are used to define a new PLG as well as the remainder of the current PLG. As a consequence of this overlap, a single point mutation can affect multiple genes. This kind of gene overlap is commonly observed in nature.
Why not use 0 and 255 as start codons?
In prior experiments, we observed that the probabilities in a gene tend to converge on 0 or 255, making the PLGs more deterministic. Therefore, we did not use 0 or 255 in the start codon, because start codons would then occur too frequently and an excessive number of genes would end up being encoded.
Is there directionality in the genes?
We read from the beginning of the sequence to the end in one direction. If a gene extends past the end of the genome byte string, we continue reading at the beginning of the byte string again, i.e., the genome is a circular byte string.
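Reading a gene that runs past the end of the genome can be sketched with modular indexing. The helper name `read_circular` is hypothetical and only illustrates the wrap-around behavior described above.

```python
def read_circular(genome, start, length):
    """Read `length` values from a circular genome, beginning at `start`."""
    n = len(genome)
    # The modulo wraps the index back to the beginning of the genome.
    return [genome[(start + offset) % n] for offset in range(length)]


print(read_circular([10, 20, 30, 40, 50], 3, 4))  # prints [40, 50, 10, 20]
```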
Why do you OR the output from PLGs that write into the same node instead of preventing gates from writing into the same node?
We do not want to give one gene priority over another gene. If we prevented genes from writing into the same node, we would be required to prioritize which PLG is allowed to write and which one is not. An intuitive method would be to use the order on the genome, such that a PLG which comes earlier in the genome would have priority over other PLGs. We decided against that method, and instead chose to OR the combined outputs. Other output combination methods are possible as well, such as XOR, AND, or even thresholds. We experimented with all of these methods and found OR to work best, although we do not have explicit data to support this.
Since writing into sensors is pointless, why do you allow it?
On a genetic level, we treat input nodes and output nodes the same way, so gates can write into the input nodes of the agent. However, the user decides which nodes are used as inputs or outputs. When we translate a genome into an MNB, we don’t explicitly know which nodes are inputs and outputs, so we allow gates to write into nodes that are designated as inputs. After the brain is activated, sensory inputs from the environment overwrite anything the gates might have written into those nodes anyway.
Why do you allow gates to read from actuators?
The human brain can sense what its muscles are doing and what angles its joints are at, so we see no reason to disallow an MNB from doing this as well.