Tuesday, 2 January 2018

Why biomolecular computing could pose a cybersecurity threat, why it does not, and why it is still an interesting question.

By Ismael Mullor Ruiz

Not much is needed to construct a computer. We are not referring to the hardware here, but to the basic operations that the computer must be able to implement. All possible computer configurations are, to some extent, physical approximations of the configuration envisioned by Alan Turing in his seminal 1936 paper [1]: the Turing machine. Turing imagined an infinitely long paper tape containing a series of sites in which two different states could be encoded (the minimum unit of information, what would later be called a bit). In addition, he considered a “head” that could move along the tape performing different operations, such as reading the encoded information, copying it and writing over the tape (the basic algorithmic operations), following defined rules. Such a device could, in principle, perform all possible computational tasks (it is “Turing complete”).

As noted above, this setup is only a theoretical model; to obtain a physical approximation that gives us a working computer, we only need a physical system that can emulate this behaviour. The requirements are actually quite loose, allowing Turing-complete machines to be implemented in a wide array of physical systems: the familiar electronic devices being used to read this post, but also unconventional substrates such as biomolecular computing, membrane computing or optical computing.

For any reader with a deep knowledge of molecular biology, the resemblance between the theoretical Turing machine and life’s basic molecular machinery is striking. For example, consider the presence of an information-coding substrate – nucleic acids in biology – on which different machines – such as enzymes, ribozymes or some oligonucleotides – can operate. These natural devices can read the stored information, copy it into different formats, and overwrite or erase information according to defined rules.


Similarities between theoretical and biomolecular Turing machines. Adapted from Shapiro and Benenson (2006) [2].

Despite these similarities, it took almost 30 years from the publication of the central dogma of molecular biology, which outlines how the information in DNA is used by the cell, and almost 20 years from the birth of genetic engineering, before biomolecules were actually exploited for computation. Biomolecular computing was born in 1994 with a foundational paper by Leonard Adleman, in which he used DNA to solve a classic computational problem notorious for its difficulty: the Hamiltonian path problem [3]. In this problem, a network of n nodes is defined in which only certain transitions between nodes are allowed. A path that visits every node of the network exactly once is said to be Hamiltonian. In the years since, the field has grown beyond the architecture designed and developed by Adleman, giving birth to a range of underlying biomolecular methods for computation [4].



(Left): Directed graph of the Hamiltonian path problem from Adleman's seminal paper [3]. (Right): DNA implementation of the nodes and transitions of the graph. Adapted from Parker (2003) [5].
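To make the brute-force character of this search concrete, here is a purely illustrative Python sketch (the edge list below is an arbitrary example, not Adleman's actual seven-node instance) that enumerates Hamiltonian paths in a small directed graph by testing candidate orderings one at a time. In the DNA version, each node and each allowed transition is instead encoded as a short strand, so that all candidate paths are assembled simultaneously in the test tube.

# Purely illustrative brute-force search for Hamiltonian paths in a small
# directed graph. The edge list is an arbitrary example, not Adleman's exact
# instance; a DNA computer explores all candidate paths in parallel, whereas
# this loop checks one permutation at a time.
from itertools import permutations

edges = {(0, 1), (0, 3), (1, 2), (1, 5), (2, 3), (2, 4), (3, 4), (4, 5)}
n = 6  # nodes are labelled 0..5; we look for paths from node 0 to node 5

def hamiltonian_paths(start=0, end=n - 1):
    """Yield every ordering of the nodes that starts at `start`, ends at
    `end`, and only uses allowed transitions."""
    for middle in permutations(set(range(n)) - {start, end}):
        path = (start, *middle, end)
        if all(step in edges for step in zip(path, path[1:])):
            yield path

for p in hamiltonian_paths():
    print(" -> ".join(map(str, p)))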

These methods, often based on DNA, have an extraordinarily high density of information, dwarfing all current optical and magnetic storage devices, as well as a very low energy consumption compared to conventional computers and a capacity to perform astronomical numbers of operations in parallel. This means that while in conventional computing architectures all operations are executed sequentially, one at a time, in a biomolecular computer every single molecule in solution is executing operations and attempting to find a solution to the given problem at the same time.

These architectural features make DNA computers potentially efficient at crunching through a set of computational challenges known as “NP problems”. For NP problems, the time needed to find a solution is believed to grow exponentially with the size of the problem, which can make them impractical to solve with a conventional computer. The difficulty of solving NP problems is at the core of the encryption protocols used, for example, in digital transactions and cryptocurrency mining. It is not hard to see that a tool that allowed these problems to be solved easily would pose a serious security threat. But despite the fact that DNA computers have been designed to break cryptographic protocols [6,7], they are not currently a serious threat, due to a range of practical physical issues:

- The first problem is that the frequently cited information-density calculations assume an essentially solid density of DNA – but such conditions are impractical for computation with any biomolecular architecture.

- Another issue with massively parallel molecular computation is that only a small number of “correct” solution molecules may be produced in a test tube alongside astronomical quantities of unwanted ones. Recovering and reading this output is an enormous challenge, as it requires picking these needles out of the haystack using molecular interactions alone.

- Related to the previous issue is the question of how to make the output readable and useful. The molecular interactions used to separate and read the output are never completely specific. This results in imperfect separation of the molecular pools, as well as a significant chance of reading an incorrect output as correct. This misreading problem becomes more severe as the differences between correct and incorrect outputs become smaller.

- Another substantial issue is that most DNA-computing architectures are not reprogrammable: once the program has been run, it cannot be run again. One must design new DNA strands and order them from a supplier for each new instance of a problem, making the process of iterating an algorithm over a new set of inputs time- and resource-consuming. Some recent work on out-of-equilibrium chemical reaction networks aims to introduce reprogrammability [8], but it is still very far from the versatility of any personal computer.

- The last issue that makes biomolecular computers impractical for breaking cryptographic protocols is that, although algorithmic operations are easy to implement in biomolecular systems, translating those algorithmic operations into arithmetic operations is hard in itself (as demonstrated by Qian and Winfree, whose circuit for calculating the square root of an integer required a reaction network of 130 strands [9]).


Despite all these issues, valuable lessons can be drawn. The analogy with the theoretical Turing machine underlies every conceivable computer to some extent, but the physical characteristics of the substrate in which the machine is implemented deeply affect the efficiency of any program it is given. As we have seen, meaningful analogies can be drawn between life’s molecular machinery and a Turing machine. But the similarities between life’s molecular machinery and neural-network-like computation are even more relevant [10,11]. Life does indeed perform certain computational tasks – including inference, learning and signal processing – and it is incredibly efficient at doing so when compared to man-made computers; these remarkable capabilities are observed at all evolutionary levels. It is not hard to conclude that a deeper understanding of the underlying biophysics of genuine biomolecular computation would yield insights that improve the design and capabilities of programmable biomolecular systems, which by their very nature can interact directly with living systems and find applications in fields like synthetic biology and biomedicine.

References
1. Turing, A. M. (1937). On computable numbers, with an application to the Entscheidungsproblem. Proceedings of the London Mathematical Society, 2(1), 230-265.

2. Shapiro, E., & Benenson, Y. (2006). Bringing DNA computers to life. Scientific American, 294(5), 44-51.

3. Adleman, L. M. (1994). Molecular computation of solutions to combinatorial problems. Science, 266(5187), 1021-1024.

4. Benenson, Y. (2012). Biomolecular computing systems: principles, progress and potential. Nature Reviews Genetics, 13(7), 455-468.

5. Parker, J. (2003). Computing with DNA. EMBO reports, 4(1), 7-10.

6. Boneh, D., Dunworth, C., & Lipton, R. J. (1995). Breaking DES using a molecular computer. DNA Based Computers, 27, 37-66.

7. Chang, W. L., Guo, M., & Ho, M. H. (2005). Fast parallel molecular algorithms for DNA-based computation: factoring integers. IEEE Transactions on Nanobioscience, 4(2), 149-163.

8. Del Rosso, N. V., Hews, S., Spector, L., & Derr, N. D. (2017). A molecular circuit regenerator to implement iterative strand displacement operations. Angewandte Chemie International Edition, 129(16), 4514-4517.

9. Qian, L., & Winfree, E. (2011). Scaling up digital circuit computation with DNA strand displacement cascades. Science, 332(6034), 1196-1201.

10. Bray, D. (1995). Protein molecules as computational elements in living cells. Nature, 376(6538), 307-312.

11. Kim, J., Hopfield, J., & Winfree, E. (2005). Neural network computation by in vitro transcriptional circuits. In Advances in neural information processing systems (pp. 681-688).


Friday, 15 September 2017

We're famous! (ish) Tom's been answering questions on the "Naked Scientists" podcast

The Naked Scientists are based at Cambridge University's Institute of Continuing Education (ICE). In their own words, they are a team of scientists, doctors and communicators whose passion is to help the general public to understand and engage with the worlds of science, technology and medicine.

They recently contacted me to provide an answer to the following question:

"Water is solid below 0 degrees C, a liquid from 0 to 100 degrees C, and a gas above 100 degrees C. Then why does washing and the roads dry when the temperature isn't 100 degrees C?"

I think the answer's actually quite subtle, for such a deceptively simple-sounding question. My recorded answer can be downloaded from here. Full transcript below!

Interviewer: I asked Thomas Ouldridge from Imperial College London to hang Norm’s question out to dry
Tom: It is true that pure water will be a gas (called water vapour) only above 100 degrees Celsius, but temperature isn’t the only factor at play here. 
The surrounding pressure also impacts when a substance like water can be a gas – the higher the pressure, the higher the temperature required to be a gas. You've probably all heard of LPG - this stands for liquefied petroleum gas. The chemicals in LPG would normally be a gas, but by keeping LPG in high-pressure canisters it can be stored as a liquid.
So gases, like [insert humorous reference here], often can't handle pressure. But why is this? 
To exist as a gas, water molecules have to be widely spaced out. High enough pressure will simply squeeze them back into their more compact liquid form. 
The more you heat the water, the more energy you give to the molecules and the harder they push back against their surroundings. Above 100 degrees Celsius, but not below, water molecules can push back hard enough against the pressure of the atmosphere for pure water to stay as a gas.
Interviewer: Well, personally I tend to hang my washing inside the earth’s atmosphere, and below boiling point, and it still dries. So what’s going on?
Tom: Well, let’s consider a puddle, for example. You might think it should stay as a liquid because the temperature is below 100 degrees, and the atmosphere is pushing down on it. However, 
we don't only have water molecules involved - the system isn't pure. The air above the surface of the puddle contains many other molecules, such as nitrogen and oxygen. 
These extra molecules can actually help to push back against the surrounding atmosphere, effectively lowering the pressure that must be supported by the water molecules if they form a gas. It’s like many people helping to lift a weight rather than just one. 
In fact, there’s so much more nitrogen and oxygen that they take almost all of the burden of the atmospheric pressure. 
This is important: any water molecules that have enough energy to escape from the puddle don’t face the full might of the atmospheric pressure – so they don’t immediately liquidise. 
This is why some water vapour can survive in the atmosphere, thanks to the hard work of the other gases, and we can explain why evaporation happens and puddles (or clothes) dry under normal conditions.
Only a certain amount of water vapour can be supported by the other gases though, which is why things don’t evaporate immediately, and why movement of air is important if you want things to dry faster. 
If you want to see this in action for yourself, lick your wrist and blow on it. It dries almost immediately compared to if you don’t blow.  So that’s evaporation covered, but how is that different from boiling? 
When water reaches 100 degrees at atmospheric pressure, it has so much energy that the vapour no longer needs the help of other gases to be stable. In our analogy, the water vapour is now like a weightlifter that is strong enough to support the “weight” of the atmosphere on its own.
At this point, bubbles of vapour can form within the liquid itself, converting liquid to gas much faster than slow evaporation from the liquid surface.
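(A small back-of-the-envelope aside, not part of the podcast answer: the amount of water vapour the other gases can "support" at a given temperature is set by the equilibrium, or saturation, vapour pressure. The sketch below estimates it with the Antoine equation, using commonly tabulated constants for water that are valid roughly between 1 and 100 degrees Celsius.)

# Saturation vapour pressure of water from the Antoine equation, using
# commonly tabulated constants (P in mmHg, T in Celsius, valid ~1-100 C).
# At 20 C water vapour only needs to supply ~2% of atmospheric pressure;
# the nitrogen and oxygen carry the rest, which is why puddles still dry.
A, B, C = 8.07131, 1730.63, 233.426

def p_sat_mmHg(T_celsius):
    return 10 ** (A - B / (C + T_celsius))

for T in [0, 20, 50, 100]:
    fraction = p_sat_mmHg(T) / 760.0   # fraction of standard atmospheric pressure
    print(f"{T:3d} C: {p_sat_mmHg(T):7.1f} mmHg ({100 * fraction:5.1f}% of 1 atm)")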

Thursday, 6 July 2017

What we learn from the learning rate


Cells need to sense their environment in order to survive. For example, some cells measure the concentration of food or the presence of signalling molecules. We are interested in studying the physical limits to sensing with limited resources, to understand the challenges faced by cells and to design synthetic sensors.

We have recently published a paper (arxiv version) in which we explore the interpretation of a metric called the learning rate, which has been used to measure the quality of a sensor (New J. Phys. 16, 103024; Phys. Rev. E 93, 022116). Our motivation is that in this field a number of metrics (a metric is a number you can calculate from the properties of the sensor that, ideally, tells you how good the sensor is) have been applied to make statements about the quality of sensing, or about limits to sensory performance. For example, a limit of particular interest is the energy required for sensing. However, it is not always clear how to interpret these metrics. We want to find out what the learning rate means: if one sensor has a higher learning rate than another, what does that tell you?

The learning rate is defined as the rate at which changes in the sensor increase the information the sensor has about the signal. The information the sensor has about the signal is how much your uncertainty about the state of the signal is reduced by knowing the state of the sensor (this is known as the mutual information). From this definition, it seems plausible that the learning rate could be a measure of sensing quality, but it is not obvious. Our approach is a test to destruction – challenge the learning rate in a variety of circumstances, and try to understand how it behaves and why.

To do this we need a framework to model a general signal and sensor system. The signal hops between discrete states and the sensor also hops between discrete states in a way that follows the signal. A simple example is a cell using a surface receptor to detect the concentration of a molecule in its environment.


The figure shows such a system. The circles represent the states and the arrows represent transitions between the states. The signal is the concentration of a molecule in the cell’s environment. It can be in two states: high or low, where high is double the concentration of low. The sensor is a single cell surface receptor, which can be either unbound or bound to a molecule. Therefore, the joint system can be in four different states. The concentration jumps between its states with rates that don’t depend on the state of the sensor. The receptor becomes unbound at a constant rate and is bound at a rate proportional to the molecule concentration. 
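To make this concrete, here is a minimal simulation sketch of the system in the figure: the joint signal-receptor dynamics are generated with a simple Gillespie algorithm, and the mutual information is estimated from the time-averaged joint distribution. All rate constants are arbitrary illustrative choices, not the values explored in the paper.

# Gillespie-style simulation of a two-state signal (ligand concentration
# low/high) read by a two-state receptor (unbound/bound). Rates are
# illustrative assumptions only.
import numpy as np

rng = np.random.default_rng(0)

c = {0: 1.0, 1: 2.0}    # ligand concentration: high = 2 x low (arbitrary units)
k_switch = 0.5          # signal flips state at this rate (assumed symmetric)
k_on, k_off = 1.0, 1.0  # receptor binding (per unit concentration) and unbinding

def simulate(t_max=1e5):
    s, x = 0, 0                       # start: low signal, unbound receptor
    t = 0.0
    dwell = np.zeros((2, 2))          # time spent in each joint state (s, x)
    while t < t_max:
        rates = np.array([
            k_switch,                         # signal flips
            k_on * c[s] if x == 0 else k_off  # receptor binds / unbinds
        ])
        total = rates.sum()
        dt = rng.exponential(1.0 / total)
        dwell[s, x] += dt
        t += dt
        if rng.random() < rates[0] / total:
            s = 1 - s                 # signal transition
        else:
            x = 1 - x                 # receptor transition
    return dwell / dwell.sum()        # empirical joint distribution p(s, x)

p = simulate()
ps, px = p.sum(axis=1), p.sum(axis=0)
mi = sum(p[i, j] * np.log2(p[i, j] / (ps[i] * px[j]))
         for i in range(2) for j in range(2) if p[i, j] > 0)
print(f"Mutual information between signal and sensor: {mi:.3f} bits")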

We calculated the learning rate for several systems, including the one above, and compared it to the mutual information between the signal and the sensor. We found that in the simplest case, shown in the figure, the learning rate essentially reports the correlation between the sensor and the signal, and so it shows you the same thing as the mutual information. In more complicated systems, the learning rate and mutual information show qualitatively different behaviour. This is because the learning rate actually reflects the rate at which the sensor must change in response to the signal, which is not, in general, equivalent to the strength of correlations between the signal and sensor. Therefore, we do not think that the learning rate is useful as a general metric for the quality of a sensor.

Tuesday, 4 July 2017

Becoming more certain about uncertainty in molecular systems

By Jenny Poulton

Due to the unpredictability of motion at the microscopic scale, molecular processes have randomness associated with them, exhibiting what we call thermodynamic fluctuations. A group in Germany led by Barato and Seifert have written a series of papers, beginning with "Thermodynamic uncertainty relation for biomolecular processes" (preprint here), exploring how uncertainty in the number of reaction steps taken by a molecular process is related to the degree to which the system is constantly consuming energy.

To be more precise, Barato and Seifert consider the number of times a system completes a cycle in a given time window. A good example of this kind of setup is the rotary motor F0F1-ATPsynthase (below, image taken from Wikipedia).
This motor is used to create the chemical fuel source of the cell (ATP) from its components (ADP and inorganic phosphate P). In order to drive this process, a current of hydrogen ions flows through the top half of the motor, causing it to systematically rotate in one direction with respect to the bottom half. This rotation is physically linked to the reaction ADP + P -> ATP, and so ATP is created. This one-directional rotational motion only arises because the current of hydrogen ions continuously supplies more energy (more technically, free energy) to the system than is needed to create the ATP. We say that the current of ions drives the system.

In general, small driven systems have a bias towards stepping forward, but there is still a non-zero probability of stepping backwards due to thermodynamic fluctuations. We also cannot predict exactly how long the system will take to complete each step of the cycle, and so the time taken per step is variable. Thus the number of cycles completed in a given time is uncertain. It is, however, possible to define an average µ of the net number of cycles completed in a given time window, and a variance σ², which is a mathematical measure of the typical deviation from the average due to fluctuations. The Fano factor F = σ²/µ gives a measure of the relative importance of the random fluctuations about the average.

In the paper "Thermodynamic uncertainty relation for biomolecular processes", Barato and Seifert relate the energy consumption and the Fano factor via F ≥ 2kT/E. Here E is the energy consumed per cycle, T is the temperature and k is Boltzmann’s constant. This expression means that the Fano factor is at least as big as the quantity 2kT/E. Thus a cycle which uses a certain amount of fuel E has an upper limit to its precision, and there is an evident trade-off between the amount of energy dissipated per cycle and the Fano factor.
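For a feel of what the bound says in practice, here is a small numerical check for the simplest conceivable driven cycle: a single transition whose forward and backward rates are related by detailed balance to the energy E consumed per completed cycle. This toy model is an assumption made for illustration, not the general setting of the paper.

# Check F >= 2kT/E for a single biased step: forward rate k_plus, backward
# rate k_minus = k_plus * exp(-E/kT). Over a long time t the net cycle count
# has mean (k_plus - k_minus)*t and variance (k_plus + k_minus)*t (the
# difference of two independent Poisson counts), so t cancels in F.
import numpy as np

kT = 1.0       # work in units of kT
k_plus = 1.0   # forward rate, arbitrary time units

for E in [0.5, 1.0, 2.0, 5.0, 10.0]:             # energy consumed per cycle, in kT
    k_minus = k_plus * np.exp(-E / kT)           # backward rate from detailed balance
    F = (k_plus + k_minus) / (k_plus - k_minus)  # Fano factor of the net cycle count
    print(f"E = {E:4.1f} kT   F = {F:6.3f}   bound 2kT/E = {2 * kT / E:6.3f}")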

In the original paper, the authors only prove their relation for very simple processes. However, it has since been generalised in this paper (preprint here). The result is actually based on very deep statements about the types of fluctuating processes that are possible in physical systems. One of the challenges now is to take this fundamental insight and apply it to gain a better understanding of practical systems. Fortunately, the F0F1-ATPsynthase rotary motor is not the only example of an interesting biological system that  undergoes driven cycles; the cell contains a huge variety of molecular motors that can also be understood in this way (preprint here). Molecular timekeepers that are vital to the cellular life cycle also depend on driven cycles. Understanding the trade-offs between unwanted variability and energy consumption will be vital in engineering such systems.

Tuesday, 11 April 2017

Two papers on the fundamental principles of biomolecular copying

Single cells, which are essentially bags of chemicals, can achieve remarkable feats of information processing. Humans have designed computers to perform similar tasks in our everyday world. The question of whether it is possible to emulate cells and use molecular systems to perform complex computational tasks in parallel, at an extremely small scale and consuming a low amount of power, is one that has intrigued many scientists.

In collaboration with the ten Wolde group from AMOLF Amsterdam, we have just published two articles in Physical Review X and Physical Review Letters that get to the heart of this question. 

The readout molecules (orange) act as copies of the binding state of the receptors (purple), through catalytic phosphorylation/dephosphorylation reactions.

In the first, “The Thermodynamics of computational copying in biochemical systems”, we show that a simple molecular process occurring inside living cells - a phosphorylation/dephosphorylation cycle - is able to copy the state of one protein (for example, whether a food molecule is bound to it or not) into the chemical modification state of another protein (phosphorylated or not). This copy process can be rigorously related to those performed by conventional computers.
We thus demonstrated that living cells can perform the basic computational operation of copying a single bit of information. Moreover, our analysis revealed that these biochemical computations can occur rapidly and at low power consumption. The article shows precisely how natural systems relate to and differ from traditional computing architectures, and provides a blueprint for building naturally inspired synthetic copying systems that approach the lower limits of power consumption.
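To give a feel for the numbers, here is a generic, textbook-style calculation rather than the specific bounds derived in the paper: a copy that matches its template bit with probability 1 - eps carries 1 - H2(eps) bits of mutual information about the template, and creating that correlation isothermally costs at least kT ln 2 per bit of information gained.

# Illustrative only (not the papers' specific results): mutual information
# carried by a noisy copy of a single bit, and the corresponding
# generalized-second-law lower bound on the work needed to create it.
import math

def h2(p):
    """Binary entropy in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

kT = 4.11e-21  # Joules at roughly 298 K

for eps in [0.5, 0.25, 0.1, 0.01, 0.0]:      # copy error probability (arbitrary examples)
    info_bits = 1.0 - h2(eps)                # mutual information per copied bit
    min_work = kT * math.log(2) * info_bits  # lower bound on the work to create the copy
    print(f"error = {eps:4.2f}   info = {info_bits:5.3f} bits   "
          f"min work >= {min_work:.2e} J")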
The production of a persistent copy from a template. The separation in the final state is essential.
A more complex natural copy operation is the production of polymer copies from polymer templates, as discussed in this previous post. Such processes are necessary for DNA replication, and also for the production of proteins from DNA templates via intermediate RNA molecules. For cells to function, the data in the original DNA sequence of bases must be faithfully reproduced - each copy therefore involves copying many bits of data. 

In the second article, "Fundamental costs in the production and destruction of persistent polymer copies", we consider such processes. We point out that these polymer copies must be persistent to be functional. In other words, the end result is two physically separate polymers: it would be useless to produce proteins that couldn't detach from their nucleic acid templates. As a result, the underlying principles are very different from the superficially similar process of self-assembly, in which molecules aggregate together according to specific interactions to form a well-defined structure. 

In particular, we show that the need to produce persistent copies implies that more accurate copies necessarily have a higher minimal production cost (in terms of resources consumed) than sloppier copies. This result, which is not true if the copies do not need to physically separate from their templates, sets a bound on the function of minimal self-replicating systems.

Additionally, the results suggest that polymer copying processes that occur without external intervention (autonomously) must occur far from equilibrium. Being far from equilibrium means that processes are highly irreversible - taking a forwards step is much more likely than taking a backwards step. This finding draws a sharp distinction with self-assembling systems, which typically assemble most accurately when close to equilibrium. This difference may explain why recent years have shown an enormous growth in the successful design of self-assembling molecular systems, but autonomous synthetic systems that produce persistent copies through chemical means have yet to be constructed.
Taken together, these papers set a theoretical background on which to base the design of synthetic molecular systems that achieve computational processes such as copying and information transmission. The next challenge is now to develop experimental systems that exploit these ideas.

Monday, 3 April 2017

Working with the City of London School on an exciting iGEM project

Today I met with a group of school students (aged 16-18) from the City of London School, who will be working on a project for iGEM this year. iGEM is an international competition for school, undergrad and postgrad teams to design, model and build complex systems by engineering cells. Last year, Imperial won the overall prize, as discussed in this post by Ismael. 

Without giving too much away, the students will be working on a system based on a newly-developed molecular device, the toehold switch. Toehold switches are RNA molecules that contain the information required to produce proteins. This information is hidden via interactions within the RNA, which cause it to fold up into a shape that prevents the sequence from being accessed. If, however, a second strand of RNA with the right sequence is present, the structure can be opened up and protein production is possible.

This idea has been around for a while, but toehold switches are particularly useful because they provide a better decoupling of the input, output and internal operation of the switch than previous designs. This is the principle of modularity that underlies the work of many of my colleagues here at Imperial, and allows for systematic engineering of molecular systems. This modularity is key to the proposed project.

I've been giving the students advice on how to model the operation of a toehold switch, in order that they can explore the design space before getting into the lab.
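As a flavour of the sort of starting point we discussed, here is a deliberately simple mass-action ODE sketch of a toehold switch: the trigger RNA opens the switch, the open complex is translated into protein, and a small leak from the closed switch is included. All species, rate constants and concentrations are illustrative assumptions rather than measured values.

# Toy mass-action model of a toehold switch. S: closed switch RNA, T: trigger
# RNA, C: opened switch-trigger complex, P: protein. Parameters are
# illustrative assumptions only.
from scipy.integrate import solve_ivp

k_open = 1e4    # /M/s  toehold-mediated opening of the switch by the trigger
k_leak = 1e-4   # /s    leaky translation from the closed switch
k_act = 1e-3    # /s    translation from the opened switch
k_deg = 1e-3    # /s    protein degradation/dilution

def rhs(t, y):
    S, T, C, P = y
    open_rate = k_open * S * T
    return [-open_rate,                          # closed switch consumed
            -open_rate,                          # trigger consumed
            open_rate,                           # open complex formed
            k_leak * S + k_act * C - k_deg * P]  # protein production and loss

y0 = [100e-9, 100e-9, 0.0, 0.0]   # 100 nM switch and trigger, no complex or protein
sol = solve_ivp(rhs, (0, 3600), y0, max_step=10.0)
print(f"Open fraction after 1 h: {sol.y[2, -1] / y0[0]:.2f}, "
      f"protein: {sol.y[3, -1] * 1e9:.0f} nM")

Scanning how the output changes with the trigger concentration, the leak rate or the opening rate is exactly the kind of design-space exploration that is cheap to do on paper before committing to the lab.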


Wednesday, 11 January 2017

A simple biomolecular machine for exploiting information



Biological systems at many scales exploit information to extract energy from their environment. In chemotaxis, single-celled organisms use the location of food molecules to navigate their way to more food; humans use the fact that food is typically found in the cafeteria. Although the general idea is clear, the fundamental physical connection between information and energy is not yet well-understood. In particular, whilst energy is inherently physical, information appears to be an abstract concept, and relating the two consistently is challenging. To overcome this problem, we have designed two microscopic machines that can be assembled out of naturally-occurring biological molecules and exploit information in the environment to charge a chemical battery. The work has just been published as an Editor's selection in Physical Review Letters: http://journals.aps.org/prl/abstract/10.1103/PhysRevLett.118.028101

The basic idea behind the machines is simple, and makes use of pre-existing biology. We employ an enzyme that can take a small phosphate group (one phosphorus and several oxygen atoms bound together) from one molecule and attach it to another – a process known as phosphorylation. Phosphorylation is the principal signalling mechanism within a cell, as enzymes called kinases use phosphorylation to activate other proteins. In addition to signalling, phosphates are one of the cell’s main stores of energy; chains of phosphate bonds in ATP (the cell’s fuel molecule) act as batteries. By ‘recharging’ ATP through phosphorylation, we store energy in a useful format; this is effectively what mitochondria do via a long series of biochemical reactions.





Fig. 1: The ATP molecule (top) and ADP molecule (bottom). Adenosine (the "A") is the group of atoms on the right of the pictures; the phosphates (the "P") are the basic units that form the chains on the left. In ADP (adenosine diphosphate) there are two phosphates in the chain; in ATP (adenosine triphosphate) there are three. 


The machines we consider have three main components: the enzyme, the ‘food’ molecule that acts as a source of phosphates to charge ATP, and an activator for the enzyme, all of which are sitting in a solution of ATP and its dephosphorylated form ADP. Food molecules can either be charged (i.e. have a phosphate attached) or uncharged (without phosphate). When the enzyme is bound to an activator, it allows transfer of a phosphate from a charged food molecule to an ADP, resulting in an uncharged food molecule and ATP. The reverse reaction is also possible.

In order to systematically store energy in ATP, we want to activate the enzyme when a charged food molecule is nearby. This is possible if we have an excess of charged food molecules, or if charged food molecules are usually located near activators. In the second case, we're making use of information: the presence of an activator is informative about the possible presence of a charged food molecule. This is a very simple analogue of the way that cells and humans use information as outlined above. Indeed, mathematically, the 'mutual information' between the food and activator molecules is simply how well the presence of an activator indicates the presence of a charged food molecule. This mutual information  acts as an additional power supply that we can use to charge our ATP-batteries. We analyse the behaviour of our machines in environments containing information, and find that they can indeed exploit this information, or expend chemical energy in order to generate more information. By using well-known and simple components in our device, we are able to demystify much of the confusion over the connection between abstract information and physical energy.
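As a purely numerical illustration of this idea (the numbers below are invented, not taken from the paper): if activator presence and food charging are correlated as in the joint distribution below, the mutual information I between them bounds the extra free energy, kT times I (in nats), that the machine can bank into ATP per encounter.

# Mutual information of an assumed activator/food joint distribution, and the
# corresponding bound kT*I on the free energy available per encounter.
import math

# p[a][f]: rows = activator absent/present, columns = food uncharged/charged.
# Chosen (arbitrarily) so that activators usually sit near charged food.
p = [[0.4, 0.1],
     [0.1, 0.4]]

pa = [sum(row) for row in p]        # marginal distribution of the activator
pf = [sum(col) for col in zip(*p)]  # marginal distribution of the food state

I = sum(p[i][j] * math.log(p[i][j] / (pa[i] * pf[j]))
        for i in range(2) for j in range(2) if p[i][j] > 0)  # in nats

kT = 4.11e-21  # Joules at roughly 298 K
print(f"I = {I:.3f} nats, up to {kT * I:.2e} J available per encounter")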

A nice feature of our designs is that they are completely free-running, or autonomous. Like living systems, they can operate without any external manipulation, happily converting between chemical energy and information on their own. There’s still a lot to do on this subject; we have only analysed the simplest kind of information structure possible and have yet to look at more complex spatial or temporal correlations. In addition, our system doesn’t learn, but relies on ‘hard-coded’ knowledge about the relation between food and activators. It would be very interesting to see how machines that can learn and harness more complex correlation structures would behave.

Authored by Tom McGrath