Life can not count

If you have two populations how can you tell which is bigger?  If you can count them all then it is trivial.

Programmers and mathematicians are used to seeing:

```blue_balls = 4
red_balls = 2
if(blue_balls > red_balls): print "We have more blue than red balls"```

But the cells and molecules that make up living systems are not afforded the advantage of performing a global count.  So far it seems all computations they can perform at the molecular and cellular level are from random interactions with each other, i.e. when they randomly bump into each other.  And many biological systems rely on this process to figure out which molecule is in excess of another and therefore what to do.

An example comes from a region of DNA in a yeast species called Schizosaccharomyces pombe, where the excess of two proteins versus another two proteins causes certain genes to be switched on or off.

The figure above is a bit complicated to explain in detail but the core of what is happening is simple:

1. If there are more of the black than white proteins present in the cell, the black proteins will convert the white circles on the DNA (acetylated DNA) into black diamonds (methylated DNA).
2. More black diamonds means the genes for producing the white proteins are silenced (switched off) and the genes for producing the black proteins are activated (switched on).

This is a simple positive feedback loop.  Likewise if there are initially more white than black proteins, the biological algorithm will result in the consensus being reached that white proteins are in the majority and will ensure it remains that way.  Note that both these processes progress via an intermediate step where the DNA is neither methylated (black diamonds) or acetylated (white circles) and this occurs in cases of ambiguity where both black and white entities (proteins and modified DNA) are present.

We can simulate this biological algorithm in code, specifically I have chosen the Approximate Majority (AM) consensus algorithm.  In this system there are 3 actors (instances of `Entity`), A, B and C.  Two of them, A and B are the starting entities and the biological algorithm’s aim is to determine which one was in the majority at the start.  To accomplish this the biological algorithm uses 3 reactions:

1. A  +  B  →  2C
2. A  +  C  →  2A
3. B  +  C  →  2B

Reaction 1 says that if A and B randomly bump into each other they will react to form two of C. Reaction 2 says if A and C meet they will form 2 of A, and similar for reaction 3 with entities B and C.  In this case A represents both the black diamonds and the two black proteins, with B representing all the white parts.  C represents the unmodified DNA.  The initial question of “which population has more” is now easily answered and the answer is correct most of the time (and to perceive the answer biology only has to connect the presence of protein A or B, and perhaps the absence of C, to some other effect and wait).

Follow this link and press the spacebar key to try out the algorithm. The graphs shows how the average number of the entities A, B and C change at each of the iterations of the Approximate Majority algorithm.  To see how a single reaction progresses graphically change the “Number of runs” to 1 and refresh (press ‘r’ followed by the spacebar) or view it visually progressing.

Note that this is still only an approximate majority.  It will be more likely to make mistakes if there is not a majority of one entity versus the other at the start.  For example if entity A begins with 30, and B with 15 the algorithm will get the correct answer with 3 nines correctness (99.9%).  Increase B to 25 and the correct answer is reached in only 87% of the algorithm runs:

Under the hood

If you are curious the reaction dynamics are being run by a “bounded” Gillespie algorithm, where the time until next reaction is bounded to ensure an interesting (fast enough) and comprehensible (not too fast) animation.

Caveats

The AM algorithm is not exactly the same as biological example above because if all the A and B are consumed leaving only C, the AM algorithm will not resolve to an answer.  In the actual biological system above, this case of stalemate could theoretically arise if all the DNA is left unmodified with neither black proteins or white proteins being produced however biology is messy: it will likely produce a low level of black and white proteins regardless of the state of DNA modification.

Additionally proteins have a life time and are eventually deconstructed and recycled by the cells that contain them, a fact which is also not yet modelled in this system.

Finally the Gillespie algorithm underlying it is using a very simple model of reaction propensities which may only approximate biochemical behaviour.

I was inspired to build this implementation of a biological consensus algorithm by a talk on biological systems, their simulation and application by Andrew Phillips at Microsoft.  Additionally I wanted a project to cut my teeth on some d3 animations and Typescript.  I also saw the 2015 Oxford iGEM team’s tweet on using Gillespie’s algorithm for stochastic modelling and it all came together nicely.

There are plenty more synergistic relationships between biology and computer science that might inspire some more blog posts.