ANC User Manual 1V00

From Allosteric Network Compiler
Jump to: navigation, search

Contents

ANC User's Guide

Front Matter

Author: Julien Ollivier

Copyright (c) 2007-2011

This ANC User's Guide, Reference Manual, and Programmer's Reference manual apply to the most recent ANC release in the version 1.x series.

Overview

Most biochemical networks contain proteins whose function is subject to regulation by various other molecules, such as drugs, 2nd messengers, covalent modifications and other proteins. One common form of regulation is when one molecule, by interacting with a specific binding site of a protein, regulates the strength of the protein’s interaction with another molecule at a distinct, or "allosteric", site. For example, binding of an extracellular agonist by a G protein-coupled receptor can increase its affinity for intracellular ligands such as G proteins, even though the two binding sites are physically distinct. Allostery is ubiquitous in biochemical networks and many membrane receptors, ion channels, enzymes and transcription factors are allosteric proteins.

The Allosteric Network Compiler (ANC) is a rule-based modelling tool for biochemical networks that comprise allosteric proteins. Such networks include, for example, signal transduction pathways and gene regulatory pathways. Using ANC, one can model the structure of allosteric proteins at a level of detail that accounts for their functional properties. Simple components are used as lego-like building blocks for more complex allosteric structures. You can delineate subunits and domains, binding and phosphorylation sites, and can model regulatory linkages between subunits and components. Modelling is both modular and hierarchical and so simple structures can used to construct larger, more complex structures. The figure below gives an overview of how models are created and simulated with ANC.

The Allosteric Network Compiler (ANC). ANC is a rule-based modelling tool for allosteric proteins and biochemical networks. A modeller describes a biochemical network using structures and rules. Structures (panel A) describe each biomolecule in terms of its components, which can be interaction sites (circles), or domains and subunits (triangles). Structures also describe a protein's structural and containment hierarchy (arrows), conformational dynamics, and internal allosteric couplings (dashed lines). Rules (panel B) specify the possible interactions of biomolecular components and provide relevant biochemical parameters. An ANC model is a text file that comprises structures and rules but also defines initial conditions, network stimulus protocols, and readouts. The model is compiled by ANC to produce a biochemical reaction network (panel C). Using an iterative algorithm, ANC generates biochemical reactions through the repeated application of rules to the biomolecules in the system. Once the complete reaction network is compiled, ANC writes the complete biochemical reaction network to a file in a human-readable, textual format. ANC can also create graphical representations of the reaction network and of the biomolecular species in the network to facilitate inspection of the results (not shown). Finally, the network file is read and exported by the Facile biochemical network compiler to a form suitable for deterministic or stochastic simulation using Matlab and other tools.

Underpinning ANC is a thermodynamically-grounded modelling framework that describes allostery and cooperativity through transitions between discrete conformational states of proteins. This modelling framework enables ANC to generate allosteric transitions between the conformational states of proteins during compilation, and to compute the allosteric transition rates for any combination of regulatory inputs -- called modifiers -- applied to the protein. Borrowing from classic allosteric theory, the framework assumes that ligands and other molecules interact non-cooperatively (i.e. independently) with each conformational state of a protein. Nevertheless, regulation arises indirectly because each regulatory input biases the equilibrium between the conformational states in favour of the state to which it has higher affinity, which in turn affects the interactions of the protein with other molecules if these discriminate between conformations.

The assumption of independence built into ANC’s thermodynamic framework can help biologists tackle the problem of regulatory complexity. Regulatory complexity arises in biochemical networks when an allosteric protein has multiple possible ligands or other regulatory interactions. In principle, each combination of regulatory inputs can uniquely determine the strength of a protein’s interactions with downstream pathways. Thus, the number of parameters required to model allosteric and cooperative effects in networks grows combinatorially with the number of interactions and can be very large. However, if modifiers such as ligands and covalent modifications are assumed to interact independently with a small number of discrete conformational states, this combinatorial growth in the dimensionality of parameter space becomes a linear growth. By reducing the number of parameters, this framework also facilitates the creation of biochemical models that have more predictive power.

Through its rule-based modelling approach, ANC also addresses the problem of combinatorial complexity. In biochemical networks, proteins dynamically bind and form complexes with ligands or other proteins, and are often subject to covalent modifications that regulate these interactions. As the number of such interactions increases, there can arise a combinatorial "explosion" in the number of possible ligation or modification states of the proteins and macromolecular complexes in the system, which challenges our ability to write down a realistic biochemical model. Fortunately, such systems can often be described with a small number of rules, from which ANC can automatically infer the existence of a much larger set of biochemical species and reactions.

ANC is more than a modelling tool -- indeed, it embodies a methodogical framework, which is based on well-established allosteric theory and biophysical principles, and a modular language for describing allosteric proteins and networks. In the process of creating an ANC model, your understanding of the relationship between the structure and function of a protein can improve significantly. ANC models are also straightforward to construct, re-use, extend, and update, making it easy to explore different hypotheses.

References:

  1. Olliver JF, Shahrezaei V, Swain PS. "Scalable rule-based modelling of allosteric proteins and biochemical networks." PLoS Comput Biol. 2010 Nov 4;6(11):e1000975.
  2. Siso-Nadal F, Ollivier JF, Swain PS. "Facile: a command-line network compiler for systems biology.". BMC Syst Biol. 2007 Aug 3;1:36.

Feature Summary

ANC has the following general features:

Modelling

  • embodies a rule-based modelling methodology
  • describes the hierarchical structure of proteins and macromolecules
  • comprises a thermodynamic framework for describing allosteric proteins
  • allows full parametrization of models to avoid unnecessary recompilations
  • supports modelling of co-localized protein interactions (e.g. those occurring after recruitment by adapters or scaffolds)

Input

  • object-oriented input language for component and rules

Output

  • generates textual, human-readable output of compiled biochemical reactions
  • creates graphs of species' structure and reaction network
  • exports to SBML, Matlab, XPP, Mathematica, Maple and EasyStoch via Facile

Implementation

  • is written in Perl, can run on most platforms
  • has an object-oriented architecture

Getting Started

General Support

If you have any questions or need help using ANC and creating models, only simply have feedback, please contact julien.ollivier AT gmail.com.

For help on installation, please ask your system's administrator or someone knowledgeable. If this does not resolve the problem, feel free to contact us.

Installation

Systems administrator privileges may be required to install some of the components below, in particular the CPAN Perl modules.

Ask your systems administrator for help if you experience difficulties with the installation.

Facile

ANC relies on the Facile network compiler to export compiled networks to a form suitable for simulation in Matlab or other tools.

The download site, user manual and installation instructions for Facile is found at http://facile.sourceforge.net

UNIX/OSX

ANC is straightforward to install and use on any UNIX-like platform that provides the Perl interpreter (version 5.8 or higher), a suitable shell (e.g. bash), and standard UNIX tools and commands, such as tar, make and gcc. The instructions below assume such a setup.

On OS-X, you will need to install the Xcode tools which provide these utilities. Xcode is supplied with the OSX installation disks or available on the following website: http://developer.apple.com/tools/xcode.

Windows

On Windows XP, the easiest solution to provide UNIX compatibility is to install Cygwin, which provides a bash shell and many standard UNIX commands and applications. When installing, include the make, gcc, tar and perl packages. Once Cygwin is running, ANC can be installed by following the instructions below for UNIX environments.

On Windows 7 (64-bit), we have found that installation of the 'gcc4' Cygwin package was also required to correctly install CPAN modules (see below). If the gcc compiler reports errors about unknown compiler switches, installing gcc4 is likely to resolve the problem.

Windows Vista may also require gcc4 but we have not had the opportunity to test this.

Download/Unpack ANC

The ANC installation tarball can be downloaded from http://anc.sourceforge.net.

Having downloaded ANC, the next step is to select an installation directory. Let's assume this is the directory ~/toolbox. Then execute the following commands:

mkdir -p ~/toolbox
cd ~/toolbox
tar -pxvzf <download_path>/anc_RELEASE_1Vxx.tar.gz

Environment Setup

The ANC_HOME environment variables should point to the installation directory, and an alias for ANC should be created. It is recommended to add the following lines to your ~/.bashrc file:

export ANC_HOME=~toolbox/anc_RELEASE_1Vxx
alias  anc='$ANC_HOME/anc.pl'

You will have to start a new shell for these changes to take effect.

The rest of this manual assumes that anc.pl is aliased as anc.

CPAN modules

CPAN is an internet database of Perl modules. ANC and Facile use several of them and they must be installed prior to use. You will need system administrator priviledges to install these modules. You should run the following commands on each system used:

sudo cpan -i Class::Std
sudo cpan -i Class::Std::Storable
sudo cpan -i WeakRef
sudo cpan -i String::CRC32

Again, this step may require installation of the Cygwin gcc4 package on some versions of Windows.

GraphViz

If you would like ANC to generate diagrams of the reaction network and species, you must install the GraphViz package on your system (http://www.graphviz.org). Executing the command

dot -V

will tell you whether GraphViz is already installed. If not, refer to the GraphViz website for installation instructions. Installation is usually straightforward; for example, on Ubuntu or Debian you can use the apt-get command:

apt-get install graphviz

Regardless of which operating system you use, the following CPAN module is also required to use the GraphViz-related features:

sudo cpan -i GraphViz

Test installation

Test your installation by running Facile and ANC without any arguments:

$FACILE_HOME/facile.pl
$ANC_HOME/anc.pl

An error will be reported if any of the required modules are still missing. Simply run CPAN again to install the missing module.

You can also test the aliases defined above by typing anc and facile.

Creating a Model

Any text file editor is adequate to create ANC models. To get started, first create a working directory somewhere then copy the example generic adaptor model file (corresponding to the adaptor of Figure 1 of the ANC paper) to the working directory. All examples are distributed as read-only files, and so need to be made writeable. You can then open your preferred text editor (e.g. xemacs) to edit the model as desired:

mkdir ~/adaptor
cd ~/adaptor
cp $ANC_HOME/examples/adaptor/adaptor_generic.mod .
chmod +w adaptor_generic.mod
xemacs adaptor_generic.mod

Compiling and Simulating

Compilation is straightforward (omit the --graphviz option if you didn't install :

anc --graphviz=canonical adaptor_generic.mod

This generates the file adaptor_generic.eqn. To generate scripts for simulation in Matlab, we use the Facile program:

facile -m -P adaptor_generic.eqn

This generates several Matlab .m files, including the adaptor_genericDriver.m and adaptor_generic.m files which contain a script to run the simulation and the system ODEs, respectively. Assuming Matlab is installed, we can simulate the system as follows:

matlab -r "run adaptor_genericDriver";

Finally, the following Matlab command plots the concentration of ligated adaptor proteins (i.e. AY) against time (n.b. this will only work if you used Facile’s -P option as indicated above):

plot (t, RESPONSE);

System Modelling

Key Concepts

In ANC's rule-based modelling methodology, a model comprises both structures and rules.

The modeller creates structures to describe each biomolecule in the system (e.g. receptors, enzymes, ligands, 2nd messengers, etc.). Each ANC-structure comprises a number of discrete components, which may themselves be structures. Components can represent interaction sites such as the binding, catalytic or phosphorylation sites of a protein. If the component is a nested structure, then it represents a unit of structural hierarchy such as a domain or subunit and may in turn contains other components. Structures model both individual biochemical entities such proteins, ligands or 2nd messengers, or macro-molecular assemblies such as receptor-agonist complexes, protein dimers, trimers, etc.

When defining the structures which describe a system, the modeller specifies the possible states of each structure. Once this is done, the structures must be instantiated. A structure instance is a copy of a structure but where state information has been attached to the structure and its components. Structure instances are created when the model is initialized prior to compilation. During compilation, new structures and structure instances are created as needed to represent products of the biochemical reactions implied by the model's rules. Generally speaking, structures embody the static, non-changing attributes of the various biomolecules in the model which are common to all instances, while instances capture the state information which may change with time as a result of a reaction.

A structure is represented with a graph, i.e. a set of nodes connected by a set of edges. Each node corresponds to a component of the structure. The edges of the structure represent relationships or interactions between the components such as containment, allosteric coupling, and biochemical bonds. Nodes may also have associated state information (e.g. phosphorylation or allosteric state). A graph is a useful representation for biomolecules because it can capture the connectivity and interactions between molecules in a macro-molecular complex. This is important because during the compilation of a reaction network, each newly created species must be compared to a list of existing species to avoid duplications. Indeed, a graph representation of the new species allows the compiler to determine if it is identical to one of the existing species: if the two graphs are isomorphic, then the new product and the existing species are identical.

Rules describe the interactions between the components of proteins. For example, given a protein A with two binding sites, Ax and Ay, and two ligands X and Y, a binding rule may specify that the component Ay binds Y but only if the protein A is in a specific state of binding, of covalent modification, or in a specific conformational state. We refer to such conditions as "regulatory logic" because they typically describe how the binding of one molecule directly or indirectly affects the ability of a protein to bind another molecule. Similarly, a modification rule may specify that a catalytic component of one protein may covalently modify the state of a protein component elsewhere in the network, again with associated regulatory logic. Binding and modification rules cause reversible binding and Michaelis-Menten enzymatic reactions to be generated during compilation of the model, following a pre-defined reaction template.

In contrast to binding and catalytic reactions, allosteric transitions are not associated with user-defined rules. Instead, the rules for conformational changes are pre-defined by the thermodynamic framework implemented by ANC. To model an allosteric protein, the user designates one or more components of a protein as allosteric, and gives the conformational states and transition rates of each component in its unmodified state. The thermodynamic framework implemented by ANC then allows calculation of the allosteric transition rates when the component is in a modified (e.g. bound or phosphorylated) state, and the corresponding allosteric reactions are generated during the compilation process.

ANC models are input in textual form using an object-oriented modelling language. Here, all elements of a model (e.g. structures, rules and initial conditions), are represented by an object, and each object is an instance of its class. An object's class defines a set of named attributes and methods (i.e. functions) for all objects of the class. However, each object's attributes can have a different value. Class methods implement the object's behaviours and allow the modeller to control and communicate with the object. In object-oriented languages, a class can inherit the attributes and methods of a parent (or base) class. Such sub-classes are said to bederived from the parent class. Derived classes often define additional attributes and methods which build on the functionality of the base class but provide more specialized capabilities. Conceptually, there is usually an "is a" relationship between a derived class and the parent class, e.g. a Mammal sub-class could inherit from a Verterbrate class because a Mammal "is a" Verterbrate.

In this section, we will introduce the language constructs used to create a complete ANC model. For each class, we illustrate how to create an object, discuss its attributes, and also introduce a graphical representation for nodes, structures, and rules.

Nodes

All ANC-structures comprise one or more components which are derived a generic Node class. A generic Node object, such as the one defined below, can be used as a placeholder for a component which will be added to the model at a later time, but is not otherwise useful.

Example definition:

Node: {
	name => "LBS",    # future ligand-binding site
}

To allow structures to be nested, a special type of node exists called a hierarchical (or group) nodes. Such nodes are associated with a particular level of biological structure (e.g. tertiary structure). For example, the structure for a 2-domain protein would contain one hierarchical node associated with each domain, and one all-encompassing hierarchical node representing the entire protein. These hierarchical nodes are distinguished from other nodes through the value of the group_node_flag attribute and are represented graphically by a triangle instead of a circle.

Graphical representation:

ANCManual Node.png

Reaction Sites

Reaction sites, which represent the interaction sites of biomolecules, are derived from the Node class and therefore inherit its attributes. Reaction sites interact through reversible binding or Michaelis-Menten style enzymatic reactions. ANC requires the user to specify one of 3 sub-types of reaction sites:

  • a binding site, bsite which reversibly bind with another site
  • a catalytic site, csite, which can change the state of modification sites
  • a modification site, msite, which carries state information (e.g. phosphorylation)

Example definition:

ReactionSite: {
	name => "Tyr",
	type => "msite",
}

Graphical representation:

ANCManual ReactionSite.png

Any instance of an msite has associated state information. Allowed states are either '0' or '1', with '1' usually representing a state of covalent modification. In the graphical representation above, the solid filled circle represent '1', while an open circle (not shown) represents '0'.

Allosteric Sites

An allosteric site is the component used to model allosteric transitions in proteins. This class of nodes carries conformational state information and is allosterically coupled to other nodes in the protein. It also functions as the hierarchical node of a structure. In the definition below, the reference conformational state, here I, is listed first and the non-reference state, A, follows. Similarly, the rate constant for the allosteric transition from the reference state to the non-reference state is given first, followed by the rate for the transition from the non-reference to the reference state. Finally, Phi is a parameter that relates changes in the equilibrium constant of the allosteric transition due to the presence of modifiers to changes in the kinetic rates.

Example definition:

AllostericSite: {
	name => H, 
	allosteric_state_labels => ['I', 'A'],      # inactive, active
	allosteric_transition_rates => [0.5, 100],
	Phi => 0.5,
}

Graphical representation:

ANCManual AllostericSite.png


Structures

Structures comprise a set of components. To define a structure, the user supplies a list of previously defined components which the structure is to contain. These components can be interaction sites or other structures, so that structures can be hierarchically nested one within the other. We can define simple structures, which comprise only comprise only interaction sites, and hierarchical structures, which have nested hierarchical nodes.

Simple Structures

Simple structures are composed of nodes only. The following is an definition of a hypothetical structure whose components are the previously defined nodes N1 and N2. Notice how we do not need to explicitly create a hierarchical node for S0: ANC automatically creates a hierarchical node for the structure being defined and connects the hierarchical node to the components given in the elements list via uni-directional edges.

Example definition:

Structure: {
	name => S0,
	elements => [N1, N2],
}

Graphical representation:

ANCManual SimpleStructure.png

Hierarchical Structures

Hierarchical structures contain other structures. Any level of nesting is possible, though typically only a small number structural levels will be used. For instance, in a model of a protein, the top-level structure might correspond to the quaternary structure of the protein, containing other structures representing tertiary-level structures such as domains or sub-units. At the macro-molecular level, the creation of protein complexes leads to an additional level of structure. In the following definition, the top-level structure T0 contains the structures S0 and S1.

Example definition:

Structure: {
	name => S0,
	elements => [N1, N2],
}
Structure: {
	name => S1,
	elements => [N3, N2],
}
Structure: {
	name => T0,
	elements => [S0, S1],
}

Graphical representation:

ANCManual HierarchicalStructure.png

Allosteric Structures

Allosteric structures are structures whose hierarchical component is an AllostericSite. The components listed as elements of the structure are allosterically coupled to the hierarchical component (as shown by the dashed line edges in the graph of the structure below). This has two effects. First, when other components interact with the elements of the allosteric structure, they can "see" the allosteric state of the structure, and the strength of this interaction may depend on the allosteric state. Any such difference will in turn, through the laws of thermodynamics, influence the allosteric equilibrium and transition dynamics of the allosteric site.

In the example below, the structure Hb has an allosteric hierarchical component (also called Hb), which is coupled to 4 equivalent ligand-binding nodes LB. The allosteric transition rates are parameterized.

Example definition:

# From file: $ANC_HOME/examples/multimers/hemoglobin_QTS.mod

#-----------------------------------------------------
# HEMOGLOBIN
#-----------------------------------------------------
ReactionSite: {
        name => "LB",    # ligand binding site
        type => "bsite",
}

AllostericStructure: {name => Hb,
        elements => [LB,LB,LB,LB],
        allosteric_transition_rates => [k_RT, k_TR],
        allosteric_state_labels => ['R','T'],
        Phi => RT_phi,
}

Graphical representation:

ANCManual AllostericStructure.png


Molecules vs Complexes

A molecule is a the complete hierarchy of structures used to model a biochemical species such as a protein or ligand. Therefore any structure which is not included in another structure is treated as a molecule. Molecules always have a top-level grouping node that joins their elements into a connected graph. Molecules are used as templates used to create the starting species in the system via structure instantiation.

Complexes are structures which consist of one or more molecules joined by reversible non-covalent bonds. In other words, they are macromolecular assemblies of structures such as a receptor-ligand complex. Complexes don't have a top-level hierarchical node because, however their component molecules have a grouping node.

In general, complexes are automatically created during intialization and compilation of the model, though the modeller may also create them explicitely. The example below shows a complex formed from the binding of a simple ligand with an allosteric divalent protein.

Graphical representation of a complex:

ANCManual Complex.png


Rules

So far we have seen how to describe the biomolecules in a system using structures. Here, we show how these structures can be made to interact via rules. Rules specify which structure components interact, the rates of this interaction, and the required state of the components.

Binding example

The example below shows how to implement the binding of a ligand to a receptor and the dimerization of the receptor using the CanBindRule object. First, the receptor and ligand structures are defined. Then, using the ligand_names rule attribute, we list the pair of reaction sites that bind. The reaction site L is a component of the ligand's structure while the reaction sites LB and D are components of the receptor. The rule attributes kf and kb are the kinetic rate constants of the interaction. Note that this example does not include any kind of regulation -- binding and dimerization occur regardless of the state of the receptor.

#-----------------------------------------------------
# Receptor
#-----------------------------------------------------
ReactionSite : {
  name => "LB",   # ligand-binding site
  type => "bsite",
}

ReactionSite : {
  name => "D",  # dimerization site
  type => "bsite",
}

Structure : {
  name => "R",
  elements => ["LB", "D"],
}

#-----------------------------------------------------
# Ligand
#-----------------------------------------------------
ReactionSite : {
  name => "L",
  type => "bsite",
}

Structure : {
  name => "L",
  elements => ["L"],
}

#-----------------------------------------------------
# Rules
#-----------------------------------------------------

# ligand-receptor binding
CanBindRule : {
  ligand_names => ['L', 'LB'], 
  kf => 10.0, 
  kb => 1.0,
}

# receptor dimerization
CanBindRule : {
  ligand_names => ['D', 'D'],
  kf => 0.01,
  kb => 20.0,
}

Here is the graphical representation of the above model. Note that the rate constants have not been annotated.

ANCManual BindingRule.png

Covalent modification example:

This example shows how to implement the covalent modification of a substrate. Here, the component K1 is the active site of a kinase, the component P1 is the active site of a phosphatase, and M1 is the phosphorylation site of the substrate. The ligand_msite_states attribute specifies the modification state of M1 required for the kinase and phosphatase to bind. Therefore, the first rule below says that the kinase component K1 only binds to unphosphorylated M1. Giving a period '.' as the desired state indicates either "not applicable" or "don't care". The rule attributes kf, kb, and kp are the association, dissociation and product formation rates.

CanBindRule : {
  ligand_names => ['K1', 'M1'],
  ligand_msite_states => ['.', '0'],
  kf => 10.0, 
  kb => 1.0,
  kp => 2.0,
}
CanBindRule : {
  ligand_names => ['P1', 'M1'],
  ligand_msite_states => ['.', '1'],
  kf => 5.0, 
  kb => 2.0,
  kp => 3.0,
}

Here is the graphical representation of the above two rules. Note that the kinetic rate constants are not shown.

ANCManual ModificationRule.png

Rules for Intra-Complex Reactions

Binding and catalytic rules may give rise to reactions in which either the reacting components are in two dissociated biomolecules, or in which the reacting components were already part of the same complex. If necessary, the modeller must include additional constraints the CanBindRule object definition to restrict the application of the rule to an inter-complex (external) reaction, or an intra-complex (internal) reaction. This is done through the association_constraints attribute of a CanBindRule. Through this attribute, the modeller can supply a list of ad hoc conditions which a binding reaction must satisfy to be generated by ANC. Here, we make ue of a special $internal_flag variable which is set by ANC when the rule is being processed. When this flag is true, then the reaction is an intra-complex reaction.

CanBindRule : {
	ligand_names => ['X', 'Xn'], 
	kf => 1.0, 
	kb => 0.1,
	steric_factor => 1000,
	association_constraints => [
		# has to be an internal reaction
		'$internal_flag == 1',
	],
}

Notice also the use of the steric_factor attribute in the above example. This arises because internal reactions are always 1st-order reactions, regardless of whether they are binary or unary reactions (i.e. involving one or two reaction sites). However, the rate constant kf specified in a CanBindRule for the binding reaction template is by definition a 2nd-order reaction rate constant. Clearly, a factor is required to convert the 2nd-order rate constant, which has dimensions of M-1s-1, to a 1st-order rate constant which has dimensions of s-1. This steric_factor is specified in the rule and will be applied to internal reactions generated by the template. The steric_factor has dimensions of M-1 and can be thought of as the effective concentration at which the reactants "see" each other. Please refer to the reference manual for more information.

Allostery

Allostery allows biomolecules to regulate the activity of a protein at a site other than that to which the regulatory biomolecule binds. Clearly, the generic model of receptor dimerization given above does not include any notion of allostery. You may have noticed from the relevant rate constants that the receptor dimerizes very poorly. In this section, we explain how to include allosteric effects in the model such that the receptor will dimerize strongly but only when it binds a ligand.

ANC supports two ways of doing this. The first way is to define allosteric structures with distinct conformations and regulatory biomolecules or modifiers that modulate the equilibrium between conformations. We refer to this approach as "MWC" allostery, because Monod, Wyman and Changeux first proposed the underlying model. In the alternate approach, which we call ad hoc allostery, the user defines system-specific contextual constraints in rules that specify when the state of one regulatory site influences the activity of another.

Either method should allow the user to model most allosteric mechanisms found in real biochemical networks.

MWC Allostery

In the MWC-based method, allostery is modelled by creating allosteric structures and interaction rules which discriminate between the conformational states of a structure. Each allosteric structure contains one or more allosteric sites, which have one of 2 conformational states. These states are internally designated as 'R' (the reference state) and 'T' (the non-reference state), though the user may define and an alternate label.

Allosteric sites individually (but not necessarily independently) transition between conformational states at a rate specified by the user. During compilation of the biochemical network, ANC generates the biochemical equations corresponding to these conformational changes. In the simplest case, an allosteric structure contains only one allosteric site and so a pair of biochemical equations is generated for the transition between two conformational states. In structures containing multiple allosteric sites, each allosteric site will independently transition between conformational states, giving rise to a multiplicity of conformational states which are a source of combinatorial complexity.

Each allosteric site of a structure is generally coupled to one or more other components in the structure. When coupled, these other components or their ligands can "see" the conformational state of the allosteric site. If the coupled component is an interaction site, it inherits the allosteric state of the allosteric site, and ligands which bind to the allosteric site can discriminate between the two conformations by binding more strongly to one conformation than the other. If the interaction site is a covalent modification site, then modification of the msite may create a covalent bond with more free energy in one conformation than the other. Likewise, if the coupled component is another allosteric site, then it may also interact more strongly with one conformation than the other.

Conversly, the allosteric site can also discriminate and be affected by the different states of the components which are coupled to it. The ligands bound to an interaction site, the state of an msite, and the state of another allosteric site will modulate both the conformational equilibrium and kinetic rates of the allosteric transition. Below, we explain how each of these types of modifiers affect the allosteric transition in more detail.

Effect of Binding

We first consider the case of a monovalent allosteric structure A which binds a ligand X. The model is given in its graphical form below. Note that not all of the model's parameters are shown in the illustration.

ANCManual LigandEffect.png

The textual form of this model shows its parameterization in more detail. Here, we supply rate constants for the association and dissociation of the ligand to each conformational state of the protein A, the allosteric transition rates for the unligated form of the protein, and also a parameter ΦAX, whose meaning we shall explain below.

File: $ANC_HOME/examples/manual/ligand_effect.mod

#-----------------------------------------------------
# MONOVALENT PROTEIN
#-----------------------------------------------------
ReactionSite: {
        name => "AX",
        type => "bsite",
}
AllostericStructure: {
        name => A, 
        elements => [AX],
        allosteric_transition_rates => [kf_RS, kb_RS],
        allosteric_state_labels => ['R','S'],
        Phi => Phi_AX,
}

#-----------------------------------------------------
# LIGAND X
#-----------------------------------------------------
ReactionSite : {
        name => "X",
        type => "bsite",
}
Structure: {name => X, elements => [X]}

#-----------------------------------------------------
# RULES
#-----------------------------------------------------
CanBindRule : {
        ligand_names => ['X', 'AX'],
        ligand_allosteric_labels => ['.', 'R'],
        kf => kf_RX,
        kb => kb_RX,
}

CanBindRule : {
        ligand_names => ['X', 'AX'],
        ligand_allosteric_labels => ['.', 'S'],
        kf => kf_RS,
        kb => kb_RS,
}

This model is compiled by ANC to give 8 biochemical equations, as illustrated in this diagram:

ANCManual AllostericCycle.png

Notice that the equations form a cycle of reversible transitions between 4 different states of the protein. In this cycle, most of the transition rates are directly supplied in the model, except for the allosteric transitions between the ligated states of A. Thus, ANC must have some way to compute these transition rates. Fortunately, this kind of cycle is well-known: it is a thermodynamic cycle, and has the special property that if no energy is consumed then the product of each transition's equilibrium constants must be unity. This fact allows the calculation of the equilibrium constant of the ligated allosteric transition. Using uppercase K to denote equilibrium constants and with the convention that K = kf/kb, we have:

K_{RS}'= K_{RS} {K_{SX} \over K_{RX}} = K_{RS} \Gamma_X\,

We can see from the above equation that the ligand X has an effect on the allosteric equilibrium constant of protein if it has a different binding affinity when the domain in the R versus S states. Indeed, the allosteric equilibrium between R and S conformations is changed by an amount equal to the ratio of the binding affinities of X to each conformational state. Furthermore, notice that the allosteric equilibrium is changed in favour of the conformation to which the ligand binds with higher affinity. For example, if KRS is 0.01 (i.e. R is more stable) before binding the ligand X and X binds 1000 times more strongly to S than R, then the new allosteric equilibrium constant after binding X will be 10.

What about the two allosteric rate constants for the transition in the ligated state? Since we know only their ratio, the fold-change in the allosteric equilibrium constant induced by a ligand with differential affinity to each conformation of the protein can be arbitrarily applied to either the forward allosteric transition rate or the backward transition rate, or some combination of both. In ANC, a parameter Φ (Phi) determines how the fold-change of the allosteric equilibrium is distributed over the allosteric rates. To understand how this works, we first write the above equation in terms of the associated kinetic rates:

{kf_{RS}' \over kb_{RS}'} = {kf_{RS} \over kb_{RS}} \Gamma_X\,

Since this single equation has two unknowns, we must fix the value of one parameter, say kfRS', in order to compute the other one. Rather than fix kfRS' directly however, we will express it in terms of a new parameter ΦAX, which we fix instead:

kf_{RS}' = kf_{RS} (\Gamma_X)^{\Phi_{AX}}\,


Now that we know kfRS', we can compute kbRS':

kb_{RS}' = kb_{RS} (\Gamma_X)^{(\Phi_{AX}-1)}\,

We can now see how Phi determines how the fold-change is factored and applied to each rate. If Phi is 1, the fold-change is applied entirely to the R->S rate. If Phi is 0, the fold-change is applied to the S->R transition rate. If Phi is 0.5, then the square root of the fold-change is applied to each rate. So if Phi = 0.5 and the fold-change is 1000, the a factor of sqrt(1000)~31.62 is applied to each rate: the forward allosteric transition rate is multiplied by 31.62 while the backward rate is divided by the same amount.

Effect of Modification

An msite contained in an allosteric structure can also alter the equilibrium of the allosteric transition. It can do this both by binding a ligand (as explained above), but also by changing its state from 0 to 1. When the state of the msite changes to 1, a user-specified regulatory factorT) gives the resulting fold-change in the allosteric equilibrium due to the modification at site T. As before, the fold-change of the allosteric equilibrium is distributed over the new allosteric transition rates according to a parameter ΦT. Thus, the allosteric transition rates of the modified structure are calculating according to the following formulae:

kf_{RS}' = kf_{RS} (\Gamma_T)^{\Phi_T}\,

kb_{RS}' = kb_{RS} (\Gamma_T)^{(\Phi_T-1)}\,


Dividing the above two equations gives the expected relation between the unmodified and modified structure's allosteric equilibrium:

K_{RS}' = K_{RS} \Gamma_T\,


We illustrate this with the model shown below, in which a kinase K changes the allosteric equilibrium of a protein A by phosphorylating A's msite. In the graphical form of the model, the regulatory factor of the msite, namely ΓT, and the Φ-value are associated with the dashed line that represents the allosteric coupling between components. In the textual form of the model, the regulatory factor is specified via the reg_factor attribute of the structure. Both forms of the model give one phosphorylation rule for each associated conformational state, with each rule having distinct parameter values. First, the graphical form of the model:

ANCManual ModificationEffect.png

The same model in the textual form:

File: $ANC_HOME/examples/manual/modification_effect.mod

#-----------------------------------------------------
# ALLOSTERIC SUBSTRATE
#-----------------------------------------------------
ReactionSite: {
	name => "T",
	type => "msite",
}
AllostericStructure: {
	name => A, 
	elements => [T],
	allosteric_transition_rates => [kf_RS, kb_RS],
	allosteric_state_labels => ['R','S'],
	reg_factors => [Gamma_T],
	Phi => Phi_T,
}

#-----------------------------------------------------
# KINASE
#-----------------------------------------------------
ReactionSite : {
	name => "K",
	type => "csite",
}
Structure: {name => K, elements => [K]}

#-----------------------------------------------------
# RULES
#-----------------------------------------------------
CanBindRule : {
	ligand_names => ['K', 'T'],
	ligand_allosteric_labels => ['.', 'R'],
	ligand_msite_states => ['.', '0'],
	kf => kf_RK,
	kb => kb_RK,
	kp => kp_RK,
}

CanBindRule : {
	ligand_names => ['K', 'T'],
	ligand_allosteric_labels => ['.', 'S'],
	ligand_msite_states => ['.', '0'],
	kf => kf_SK,
	kb => kb_SK,
	kp => kp_SK,
}
Effect of a Conformational Change

An allosteric site can be a modifier for another allosteric site. In this case, in a pair of coupled allosteric sites, each site reciprocally and equally affects the other's allosteric equilibrium constant. For example, in the model of a heterodimer shown below (where we have not included any ligand-binding sites for simplicity's sake), each allosteric subunit acts as a modifier of the other's allosteric transition, with a regulatory factor given by Γab. While the regulatory factor must have the same value in both directions, the Φ-values may be different. Thus, the fold-change in the allosteric equilibrium is distributed over the kinetic rates according to each indicated Φ-value, and the allosteric rates for each subunit when the coupled subunit is in its non-reference state (i.e. the S conformation) are calculated as follows:

{k^{\alpha}_{RS}}' = k^{\alpha}_{RS}(\Gamma_{ab})^{\Phi_{ab}}\,

{k^{\alpha}_{SR}}' = k^{\alpha}_{SR}(\Gamma_{ab})^{(\Phi_{ab}-1)}\,

{k^{\beta}_{RS}}' = k^{\beta}_{RS}(\Gamma_{ab})^{\Phi_{ba}}\,

{k^{\beta}_{SR}}' = k^{\beta}_{SR}(\Gamma_{ab})^{(\Phi_{ba}-1)}\,

ANCManual ASiteEffect.png

The textual form of the model given below shows how to create the allosteric coupling between the α and β subunits using add_allosteric_couplings (which is not an attribute of the object but rather an argument to the structure object's constructor). The add_allosteric_couplings argument should be a list comprising the index (or address) of each coupled element, the regulatory factor giving the strength of the allosteric coupling, and the Φ-values in each direction. The index of each component is given by its position in the elements structure attribute. If the coupled component is at a lower level of hierarchy, then the address is given as a list of indices (c.f. the component addressing section of the reference manual).

File: $ANC_HOME/examples/manual/asite_effect.mod

#-----------------------------------------------------
# HETERODIMER
#-----------------------------------------------------
AllostericStructure: {
	name => ALPHA,
	elements => [],
	allosteric_transition_rates => [k_RS_alpha, k_SR_alpha],
	allosteric_state_labels => ['R','S'],
}
AllostericStructure: {
	name => BETA,
	elements => [],
	allosteric_transition_rates => [k_RS_beta, k_SR_beta],
	allosteric_state_labels => ['R','S'],
}
Structure: {
	name => H,
	elements => [ALPHA, BETA],
	add_allosteric_couplings => [
		[0, 1, Gamma_ab, Phi_ab, Phi_ba],
	],
}

It is useful to note that when Γab > 1, there is positive cooperativity for the switching of each subunit to the non-reference state. Conversely, when Γab < 1, there is negative cooperativity, such that the switching of one subunit to the S state stabilizes the R state of the other subunit. Finally, if Γab = 1, then the allosteric coupling is nil and there is no cooperativity.

Multiple Modifiers

When multiple modifiers are coupled to an allosteric site, ANC combines their effect by computing the product of the fold-changes induced by each modifier on the allosteric equilibrium and kinetic rates. Thus, each modifier has an independent effect on the allosteric site, i.e. there are no higher-order, synergistic effects on the conformational change rate constants.

To show how this works, we combine the above examples to create a model of a hypothetical protein with a subunit α regulated by a ligand, a modification site, and an adjacent subunit β also regulated by a ligand. The graphical form of the model is shown below, where we have omitted any rules.

ANCManual CombinedEffect.png

The allosteric equilibrium and kinetic rates of the α subunit depends on ligation state of the binding site AX, the modification state of the modification site T, and on the conformational state of the β subunit. Assuming there is only one ligand binding to AX, namely X, this results in 8 possible sets of allosteric parameters, as summarized in this table:


Allosteric equilibria and transition rates of α subunit
Ligation state Modification state β's conformational state Allosteric Equilibrium Constant R->S transition rate S->R transition rate
unbound unmodified (0) reference state (R) K_{RS}\, k_{RS}\, k_{SR}\,
bound to X unmodified (0) reference state (R) K_{RS} \Gamma_{X}\, kf_{RS} (\Gamma_{X})^{\Phi_{AX}}\, kb_{RS} (\Gamma_{X})^{(\Phi_{AX}-1)}\,
unbound modified (1) reference state (R) K_{RS} \Gamma_{T}\, kf_{RS} (\Gamma_{T})^{\Phi_T}\, kb_{RS} (\Gamma_{T})^{(\Phi_T-1)}\,
unbound unmodified (0) non-reference state (S) K_{RS} \Gamma_{ab}\, kf_{RS} (\Gamma_{ab})^{\Phi_{ab}}\, kb_{RS} (\Gamma_{ab})^{(\Phi_{ab}-1)}\,
bound to L modified (1) reference state (R) K_{RS} \Gamma_{X} \Gamma_{T}\, kf_{RS} (\Gamma_{X})^{\Phi_{AX}} (\Gamma_{T})^{\Phi_T}\, kb_{RS} (\Gamma_{X})^{(\Phi_{AX}-1)} (\Gamma_{T})^{(\Phi_T-1)}\,
unbound modified (1) non-reference state (S) K_{RS} \Gamma_{T} \Gamma_{ab}\, kf_{RS} (\Gamma_{T})^{\Phi_{T}} (\Gamma_{ab})^{\Phi_{ab}}\, kb_{RS} (\Gamma_{T})^{(\Phi_{T}-1)} (\Gamma_{ab})^{(\Phi_{ab}-1)}\,
bound to L unmodified non-reference state (S) K_{RS} \Gamma_{X} \Gamma_{ab}\, kf_{RS} (\Gamma_{X})^{\Phi_{AX}} (\Gamma_{ab})^{\Phi_{ab}}\, kb_{RS} (\Gamma_{LB})^{(\Phi_{LB}-1)} (\Gamma_{ab})^{(\Phi_{ab}-1)}\,
bound to L modified (1) non-reference state (S) K_{RS} \Gamma_{X} \Gamma_{T} \Gamma_{ab}\, kf_{RS} (\Gamma_{X})^{\Phi_{AX}} (\Gamma_{T})^{\Phi_T} (\Gamma_{ab})^{\Phi_{ab}}\, kb_{RS} (\Gamma_{X})^{(\Phi_{AX}-1)} (\Gamma_{T})^{(\Phi_T-1)} (\Gamma_{ab})^{(\Phi_{ab}-1)}\,

Ad Hoc Regulation

Rather than conformational transitions to implement an allosteric regulation between two interaction sites, a model can specify ad hoc constraints that encode a regulatory relationship within a CanBindRule. Association and/or dissociation constraints are given via the attributes association_contraints, dissociation_constraints, and constraints. In the following example (see $ANC_HOME/examples/MAPK/MAPK_ste5_adhoc.mod), a constraint specifies that the MAPKK csite KN2 may only bind its substrate when its regulatory site (which is an adjacent msite immediately to the left in the structure) is phosphorylated. The constraints are given as a list in which the pre-defined variables $L and $R may be used to refer to the ligands.

# KN2 (MAPKK) phosphorylates M3 (MAPK) if M2 is phosphorylated
CanBindRule : {
	ligand_names => ['KN2', 'M3'], 
	ligand_msite_states => ['.', 0],
	kf => 0.5e-3,
	kb => 0.5,
	kp => 1.0,
	steric_factor => 1e6,
	association_constraints => [
		'$L->get_left_node()->get_msite_state() == 1',
	],
}

A second example implementing ad-hoc constraints is in $ANC_HOME/examples/multimer/multimer_TTS_gem.mod, which uses ad hoc constraints to ensure that a ligand can only bind to a sub-unit it has already docked to.

More details on how to use constraints will be given in this section soon, including a list of functions with which to implement constraints. In the meantime, please browse through the examples or contact the author for help.

Initial Conditions

When ANC compiles the biochemical reaction network implied by a set of structures and rules, it does not take into account the initial concentration of the species in the network. By default, all compiled species have an initial concentration of zero in output biochemical equation file. To give a species a non-zero initial concentration, use the Init object.

Often, it is sufficient to specify the initial concentration of a structure in its reference state (all msites in state 0, and allosteric nodes in the R state). To do so, one need only give a structure name and the IC as shown in the following example (from $ANC_HOME/examples/MAPK/MAPK_ste5_adhoc.mod):

Init : {
	structure => KK,
	IC => 1,
}

Another state than the reference state can also be specified. Using the state attribute, the user gives a comma-separated, hierarchical list of each node's state, including group nodes. In the following example from $ANC_HOME/examples/adaptor/adaptor_generic_v2.mod, adaptor's allosteric node has state S, while the binding sites it comprises have state 'x' (i.e. don't care) because they don't have a state.

Init : {
	structure => 'A',
	state => '[S,x,x]',     # non-reference state
	IC => 1.0,
}

For nested structures, each nested structure's state is given as another list. See $ANC_HOME/examples/multimers/multimer_KNF_tetra.mod for an example.

Network Input and Stimulus Objects

While a network's initial conditions are one form of input to the network, in many situations it is desirable to apply a dynamic input to the network during simulation. This is done via Stimulus objects, which specify that a certain stimulus pattern is to be applied to a designated node.

Stimulus objects define extra biochemical equations in ANC's output whose effect is to perturb the network via the addition of dynamic sources and sinks for the specified "input" species. Used in combination, these sources and sinks may implement different types stimuli, namely source, sink, clamp, impulse, wash, staircase, ramp and ss_ramp. For example, the stimulus object below clamps the concentration of structure G (the instance in the unligated, reference state) to 4.0 over the time interval from 500 to 1000s. It does this by generating two biochemical equations in the compiled network. The first is a sink (or degradation) reaction for the species G (G -> null). The degradation rate is given by the strength attribute, and controls how quickly the clamp reaches equilibrium and has inverse time units. The second biochemical equation is a source (null -> G) whose rate is is 4x the degradation rate, thus fixing the steady-state concentration of G to 4.0. Both the source and sink equations are only active during the specified time interval.

Stimulus : {
	structure => 'G',
	type => "clamp",
	length => 1000,
	delay => 500,
	strength => 100,
	concentration => 4.0,
}

Please refer to the reference manual for more detailed information on each type of stimulus.

Network Output and Probe Objects

On the output site, interpreting simulation results can be a challenge if the compiled system contains dozens or hundreds of species. Thankfully, ANC provides a means to specify a collection of species which has some relevant physiological meaning. For example, it might be desired to know the total concentration of a bi-phosphorylated protein independently of its liganded or allosteric state.

A Probe object allows the user to define a collection of species which will get assigned to a probe variable in the exported Facile biochemical equations file. The idea is to start with all objects of a certain class, then apply any number of filtering criteria to get the final collection. The filtering is done using Perl's grep LIST function, which puts each element of a given list (in our case, the starting collection) in the variable $_. Hence, we write our filtering criteria by expression them as boolean-value returning functions of $_.

A number of functions are useful in defining probes. These are:

  • get_num_elements(), the number of elements in an object
  • get_name(), the internal name of an object
  • get_exported_name(), the name of the object as exported to Facile
  • get_msite_state(), the msite state of a ReactionSite
  • get_allosteric_state(), the allosteric state of an AllostericStructureInstance
  • etc....

The following examples (taken from $ANC_HOME/examples/adaptor/adaptor_generic.mod) illustrate the use of Probes. The idea is to start with a collection consisting of all ComplexInstances, and then filter that collection based on the criteria of the number of biomolecules in the complex, and on the name of the complex. The last two Probes below use the structure parameter to pick up a named simple species in its reference state.

Probe : {
	name => "TRIMER",
	classes => ComplexInstance,
	filters => [
 		'$_->get_num_elements() == 3',
        ],
}

Probe : {
	name => "AX_DIMER",
	classes => ComplexInstance,
	filters => [
 		'$_->get_num_elements() == 2',
 		'$_->get_exported_name() =~ /A.*X/',
        ],
}

Probe : {
	name => "AY_DIMER",
	classes => ComplexInstance,
	filters => [
 		'$_->get_num_elements() == 2',
 		'$_->get_exported_name() =~ /A.*Y/',
        ],
}

Probe : {
	name => "A",
	classes => ComplexInstance,
	filters => [
 		'$_->get_num_elements() == 1',
 		'$_->get_exported_name() =~ /A/',
        ],
}

Probe : {
        name => "RESPONSE",
        classes => ComplexInstance,
        filters => [
                '$_->get_exported_name() =~ /A.*Y/',
        ],
}

Probe : {
        structure => X,
}
Probe : {
        structure => Y,
}


The following examples (taken from the $ANC_HOME/examples/multimers/multimer_MWC.mod) start with the collection of all AllostericStructureInstances. The collections are then filtered to get all the instances called "H", and finally all the instances in the R (or T) state.

Probe : {
        name => "H_R",
        classes => AllostericStructureInstance,
        filters => [
                '$_->get_name() =~ "H"',
                '$_->get_allosteric_label() eq "R"',
        ],
}

Probe : {
        name => "H_T",
        classes => AllostericStructureInstance,
        filters => [
                '$_->get_name() =~ "H"',
                '$_->get_allosteric_label() eq "T"',
        ],
}

Probe : {
        structure => L,
}

It is also possible to define Probes which filter based on connectivity. Examples to come....


Parameterizing a Model

Instead of providing a numerical value to a rule attribute, you can specify the name of a Parameter object. This allows you to parametrize your model, as shown in the example below.

Parameter : {
   name => "kf_XY",
   value => "1",
}

CanBindRule : {
        ligand_names => ['X', 'Y'], 
        kf => kf_K,   # parameterized rate
        kb => 0.1,    # numerical value
}

Cutoff Rates

Use of this feature is deprecated.

You may specify a threshold whereby ANC does not generate a reaction if a reaction rate constant is below the threshold. This can sometimes help shorten compilation time by filtering out reactions that are relatively slow. Refer to information about rate cutoff variables in the reference manual for more details.

Note that at present, parameters cannot be used in combination with cutoff rates (see previous section). So, if one of the model's rates is parametrized, ANC will not try to determine if the parametrized rate is smaller than the cutoff.

Example Model and File

An ANC model file is a text file which is divided into several sections: a MODEL section where nodes, structures and rules are defined, an INIT section giving initial conditions in a simplified form, a CONFIG section, and others (for more details see the reference manual below). A model file typically begins with a MODEL section, which comprises all the object-oriented constructs used to model a biochemical system with ANC.

To illustrate what a complete ANC model file looks like, we give below a model for a generic allosteric receptor which dimerizes and activates upon binding a ligand (like the EGF receptor). The receptor has two conformations, inactive (I) and active (A), which are in thermodynamic equilibrium. When the receptor is unliganded and in monomer form, the inactive conformation is heavily favoured. Binding the ligand L causes the active form of the monomer to be favoured, and this active form has high affinity for other receptor monomers. One can think of this as a receptor monomer cooperatively binding a ligand a a second monomer.

$ANC_HOME/examples/Rdimer/Rdimer_mwc.mod

# Generic ligand-induced receptor dimerization, with MWC-type model.
# Based on the model for EGFR receptor dimerization of Ozcan et al.
# Dimerization occurs at different rates depending on the receptor
# monomer conformation. Dimerization occurs cooperatively with ligand
# binding.  Parameter values are illustrative and do not correspond
# to an experimental situation.
#
# References: On the nature of low- and high-affinity EGF receptors
# on living cells. Ozcan F, Klein P, Lemmon MA, Lax I, Schlessinger J.
# Proc Natl Acad Sci U S A. 2006 Apr 11;103(15):5735-40.
# Epub 2006 Mar 29. 
#

######################################################
MODEL:
######################################################

#-----------------------------------------------------
# Compile Parameters
#-----------------------------------------------------
$max_species = -1;

#-----------------------------------------------------
# Model Parameters
#-----------------------------------------------------
# ligand-receptor binding
Parameter : {name => 'kf_LI', value => 5.0}
Parameter : {name => 'kb_LI', value => 1.0}
Parameter : {name => 'kf_LA', value => 500.0}
Parameter : {name => 'kb_LA', value => 1.0}

# receptor allostery
Parameter : {name => 'k_IA', value => 1.0}
Parameter : {name => 'k_AI', value => 100.0}
Parameter : {name => 'Phi_IA', value => 0.5}

# receptor dimerization
Parameter : {name => 'kf_II', value => 1.0}
Parameter : {name => 'kb_II', value => 100.0}
Parameter : {name => 'kf_IA', value => 1.0}
Parameter : {name => 'kb_IA', value => 1.0}
Parameter : {name => 'kf_AA', value => 100.0}
Parameter : {name => 'kb_AA', value => 1.0}

#-----------------------------------------------------
# Receptor
#-----------------------------------------------------
ReactionSite : {
  name => "LB",   # ligand-binding site
  type => "bsite",
}

ReactionSite : {
  name => "D",  # dimerization site
  type => "bsite",
}

AllostericStructure : {
  name => "R",
  elements => ["LB", "D"],
  allosteric_transition_rates => [k_IA, k_AI],
  allosteric_state_labels => ['I', 'A'], # inactive, active 
  Phi => Phi_IA,
}

#-----------------------------------------------------
# Ligand
#-----------------------------------------------------
ReactionSite : {
  name => "L",
  type => "bsite",
}

Structure : {
  name => "L",
  elements => ["L"],
}

#-----------------------------------------------------
# Rules
#-----------------------------------------------------

# ligand-receptor binding
CanBindRule : {
  ligand_names => ['L', 'LB'], 
  ligand_allosteric_labels => ['.', 'I'],
  kf => kf_LI, 
  kb => kb_LI,
}
CanBindRule : {
  ligand_names => ['L', 'LB'], 
  ligand_allosteric_labels => ['.', 'A'],
  kf => kf_LA, 
  kb => kb_LA,
}

# receptor dimerization
CanBindRule : {
  ligand_names => ['D', 'D'],
  ligand_allosteric_labels => ['I', 'I'],
  kf => kf_II,
  kb => kb_II,
}
CanBindRule : {
  ligand_names => ['D', 'D'],
  ligand_allosteric_labels => ['I', 'A'],
  kf => kf_IA,
  kb => kb_IA,
}
CanBindRule : {
  ligand_names => ['D', 'D'],
  ligand_allosteric_labels => ['A', 'A'],
  kf => kf_AA,
  kb => kb_AA,
}

#-----------------------------------------------------
# Init
#-----------------------------------------------------
Init : {
	structure => R,
	IC => 1.0,
}

#-----------------------------------------------------
# Stimulus
#-----------------------------------------------------
Stimulus : {
	structure => 'L',
	type => "clamp",
	length => 10000,
	delay => 5000,
	strength => 1000,
	concentration => "0.04*(t-5000)/1000",
}

#-----------------------------------------------------
# Probes
#-----------------------------------------------------

Probe : {
	name => "MONOMER_L0",
	classes => ComplexInstance,
	filters => [
 		'$_->get_exported_name() =~ /R/',
 		'$_->get_num_elements() == 1',
        ],
}

Probe : {
	name => "MONOMER_L1",
	classes => ComplexInstance,
	filters => [
 		'$_->get_exported_name() =~ /L.*R/',
 		'$_->get_num_elements() == 2',
        ],
}

Probe : {
	name => "DIMER_L0",
	classes => ComplexInstance,
	filters => [
 		'$_->get_exported_name() =~ /R.*R/',
 		'$_->get_num_elements() == 2',
        ],
}

Probe : {
	name => "DIMER_L1",
	classes => ComplexInstance,
	filters => [
 		'$_->get_num_elements() == 3',
        ],
}

Probe : {
	name => "DIMER_L2",
	classes => ComplexInstance,
	filters => [
 		'$_->get_num_elements() == 4',
        ],
}

Probe : {
	name => "DIMER_ACTIVE",
	classes => ComplexInstance,
	filters => [
 		'$_->get_exported_name() =~ "RA.*RA"',
        ],
}

Probe : {
	structure => L,
}

######################################################
CONFIG:
######################################################

t_vector = [t0:0.1:tf]

Model Compilation

Given a model, ANC compiles the set of biochemical reactions that are implied by the model's structures, which describe biomolecules, and the model's rules, which describe interactions between biomolecular reaction sites. This section aims to give the user an overview of this process.

Compilation of a model consists of ANC's invocation, processing of command-line parameters, reading the input model file, creation and initialization of seed species, compilation of the biochemical reactions implied by the model using an iterative algorithm, processing initial conditions, stimuli and probes, writing the Facile output file, and optionally generating reports and structure diagrams for the generated species. We examine each of these steps in turn.

Invocation and Command-Line Options

ANC is invoked by calling the the anc.pl script from the command line. The format is:

bash% ./and.pl [options] model

Various useful command-line options can be obtained with the following command:

bash% ./anc.pl --help

Initialization

ANC's first step is to identify all top-level structures in the model file -- those structures which are not a component of any other structure. These top-level structures are assumed to describe the biomolecules (proteins, ligands, etc.) in the system. Because it is simpler to treat complexes and biomolecules in a unified way during compilation, top-lvl structures are imported into the Complex class. During this importation, each top-level structure gives rise to a seed complex containing a single structure.

Next, each seed complex is instantiated. This yields a structure instance for each complex -- i.e. a structure having a specified state. Seed complex instances, by default, have any msite reaction sites initialized to a state of '0', and any allosteric sites initialised to the reference state, 'R'. Other initialization states may be also specified via Init objects.

Reaction Network Compilation

ANC compiles the biochemical reaction network implied by a set of structures and rules by repeatedly applying reaction templates to the system through an iterative, multi-pass process. In the first iteration, ANC begins with the seed structure instances, and considers each node and pair of nodes within this set as putative reactants. If one of the reaction templates applies, and (when required) an enabling rule is found for the specific node(s) involved, then ANC computes the product's structure and generates a biochemical reaction according to the specifications of the template. If the product did not previously exist, then it is included in the next iteration as a putative reactant. Thus, at the end of the first iteration, ANC should have generated all possible dimers (some of which may be catalytic complexes), and catalytic reaction products. In the second iteration, ANC goes through same set of steps using the original species plus all the species created in the first iteration. This can result in the creation of trimers, and tetramers. In the third iteration, the process repeats again and pentamers, hexamers, septamers and octamers may result. Clearly, for some systems this iterative process may never end unless the user imposes a terminating condition (more on this below).

Each iteration is divided into two phases. The first phase compiles external reactions, i.e. those reactions that occur between sites on distinct biomolecules or complexes. Such inter-complex reactions are 2nd order reactions because they involve two reactants. In the second "internal" phase of each iteration, and subject to finding an enabling rule in the model, ANC applies binding and catalytic reaction templates to pairs of reaction sites found on the same complex, and unconditionally applies the allosteric transition template to allosteric nodes in the complex. These intra-complex reactions involve reactions are 1st order reactions since they involve only one reactant.

Stopping Conditions

Given a model, by default ANC will keep generating new reactions and complexes until there are no more to be found. In some cases, the iterative compilation process may never end. An example of this is a simple protein capable of polymerization. Clearly, we need a means of limiting the number of species generated by ANC for such cases.

One way is to use compilation parameters such as the $max_species parameter, which causes the compilation to be interrupted when the number of species compiled has reached the specified limit. It is also possible to specify certain a stopping condition using protein attributes such as max_count, which limits the number duplicates of a specific protein that may be contained in a complex. Finally, other compilation parameters limit the number of iterations in the compilation process. These are the $max_external_iterations and $max_internal_iterations parameters. Refer to the Reference Manual for details.

Compilation of ICs, Stimuli and Probes

It is only after the reaction network has been compiled that all species in the reaction network are known. For this reason, compilation of initial conditions, stimuli, and probes follows network compilation.

Exporting Compilation Results

Once compilation is finished, ANC exports the following files.

Facile Equation File

The compiled set of biochemical equations, as well as initial conditions, stimuli and probes are exported into a form suitable to be read by the Facile biochemical network compiler. This Facile equation file lists the biochemical equations, rate constants, and simulation options of the network in a human-readable textual form. Facile can convert this file into scripts suitable for simulation by tools such as Matlab, XPP, and EasyStoch.

The name of the equation file, by default, is the same as that of the ANC model file, but with the extension changes to eqn. Another name can be specified via ANC's --out option.

Reports

ANC can generate a report listing all species in the compiled reaction network, and also a detailed report giving the structure of each species in the network. This is enables either through the --report command-line option, or by setting $report variable in the model file.

To generate these reports through the command-line:

./anc.pl --report="species,structure" mymodel.mod

Structure Diagrams

Through the the graphviz option or equivalently by assigning a value to the $export_graphviz variable in your model (the value of the command-line option has precedence), ANC can be directed to generate image files showing all the structures and structure instances in the biochemical network. The graphviz option (or the $export_graphviz variable) is assigned a comma-separated list of keywords which directs ANC to generate various diagrams.

To generate these diagrams of all structures generated through the command-line, use the canonical keyword:

./anc.pl --graphviz='canonical' mymodel.mod

Network Diagram

Also using the graphviz option, the network keyword directs ANC to generate an image file showing the reaction network as a graph. In this network, each node in the network corresponds to a complex instance.

By default, the network is presented in full detail with each state and complex mapped to a distinct node. Optionally, through the collapse_states keyword, the network can be collapsed by projecting different states of the same complex into a single node. The network can also be collapsed (orthogonally) such that the nodes are proteins instead of complexes, and interactions between complexes are re-drawn to occur between the corresponding proteins. For a traditional protein-protein interaction network, both projections should be enabled.

Example showing how to generate an (almost) traditional protein-protein interaction graph. Structure diagrams are also generated through the canonical keyword:

./anc.pl --graphviz='canonical,network,collapse_states,collapse_complexes' mymodel.mod

Refer to the reference manual for more information about using $export_graphviz.

Interpreting Results

In this section, we will try to gain an understanding of the output files produced by ANC and interpret the compilation results shown in the exported biochemical equation file. We shall work with the model contained in $ANC_HOME/examples/Rdimer/Rdimer_mwc.mod, already listed at the end of the in System Modelling section. A graphical version of this model is shown below:

ANCManual Rdimer.png

To run ANC on this model, type the following commands:

mkdir temp
cp $ANC_HOME/examples/Rdimer/Rdimer_mwc.mod .
$ANC_HOME/anc.pl --clean --report=all graphviz='canonical,network' Rdimer_mwc.mod

Biochemical Equation File

The most important file generated is the biochemical equation file, which is in a format suitable for the Facile compiler, which generates simulation scripts for Matlab and other tools given a list of biochemical equations. By default, the equation file is called Rdimer.eqn and is written to the same directory as the model file. Let's examine the equation file in detail.

Header

The first few lines of the file contain a header indicating which version of ANC generated it, and a timestamp:

# Facile model created by Allosteric Network Compiler (ANC)
# ANC version 0.23 released 2010/02/19
# Fri Feb 26 15:03:07 EST 2010

Parameters

Next is the section which assigns numerical values to all the parameters defined by the user:

# PARAMETERS
# ----------
parameter kf_LI = 5
parameter kb_LI = 1
...

Catalytic Reactions

If any exist, the next section lists all catalytic reactions. When any are present, the section begins with a simple header and lists all catalytic reactions in the network:

# REACTION CLASS: CatalyticReaction
# ---------------------------------
(list of catalytic reactions)...

Our receptor dimerization example has no catalytic reactions. However, the format in which these reactions are given is essentially the same as for binding reactions, which we discuss next.

Binding Reactions

The next section lists all binding reactions. ANC does not list reactions in the order that they were generated, but tries instead to arrange and group the reactions in an order which makes them more readable. Here is a subset of these reactions:

# REACTION CLASS: BindingReaction
# -------------------------------
L    + RA         <-> L_RAi00         ; fb00=kf_LA; bb00=kb_LA         # (R!02) Kd = kb_LA/kf_LA
L    + RI         <-> L_RIi00         ; fb01=kf_LI; bb01=kb_LI         # (R!01) Kd = kb_LI/kf_LI
RA   + RA         <-> RA_RAi00        ; fb02=kf_AA / 2; bb02=kb_AA     # (R!05) Kd = kb_AA/kf_AA / 2
RA   + RI         <-> RI_RAi00        ; fb03=kf_IA; bb03=kb_IA         # (R!04) Kd = kb_IA/kf_IA
RI   + RI         <-> RI_RIi00        ; fb04=kf_II / 2; bb04=kb_II     # (R!03) Kd = kb_II/kf_II / 2
L    + RA_RAi00   <-> L_RA_RAi00      ; fb00=kf_LA; bb00=kb_LA         # (R!15) Kd = kb_LA/kf_LA
L    + RA_RAi00    -> L_RA_RAi00      ; fb00=kf_LA                     # (R!16) Kd = UNDEFINED
L    + RI_RAi00   <-> L_RA_RIi00      ; fb00=kf_LA; bb00=kb_LA         # (R!14) Kd = kb_LA/kf_LA
L    + RI_RAi00   <-> L_RI_RAi00      ; fb01=kf_LI; bb01=kb_LI         # (R!13) Kd = kb_LI/kf_LI
...
L    + L_RA_RAi00 <-> L_L_RA_RAi00    ; fb00=kf_LA; bb00=kb_LA         # (R!35) Kd = kb_LA/kf_LA
L    + L_RA_RAi00 <-  L_L_RA_RAi00    ; bb00=kb_LA                     # (R!36) Kd = UNDEFINED
...

The reactions are listed in an easy-to-read format. Each line gives a reaction's substrates, products, then forward and reverse reaction rate constants separated by a semicolon. Following this is a comment giving the reaction number (each reaction is numbered according to the order in which it was generated) and if applicable the dissociation constant of the reaction. [Note: as you may observe, the Kd is not calculated correctly for degenerate reactions, and also ANC does not correctly parenthesize the Kd for R!05. We hope to fix this eventually.]

Let's examine each of the above reactions in more detail.

The first listed reaction (R!02) is the hetero-dimerization of the ligand (L) and the receptor monomer in its active state (RA), producing the species (or complex instance) L_RAi00. Similarly, the following reaction (R!01) is the hetero-dimerization of L with the inactive conformation of the receptor (RI). Notice that the association and dissociation rates are distinct to each state of the receptor. The ligand-receptor complexes are named L_RAi00 and L_RIi00. ANC generates a name for complexes by concatenating the name (and state, when appropriate) of each biomolecule in the complex. A "i00" suffix is appended because some complexes can comprise exactly the same set of biomolecules but nevertheless be chemically distinct if the biomolecules are bound to each other in a different way. For example, suppose that R had two distinct binding sites for L instead of just one, this would then give rise two different (active) receptor-ligand complexes named L_RAi00 and L_RAi01. In a nutshell, ANC uses suffixes of the form "iXX" to distinguish structural isomers which are generated during compilation.

The next three reactions, namely R!03, R!04 and R!05 are the reactions for the dimerization of the receptor in its inactive form (R!03), its active form (R!05), and for the dimerization of the inactive and active forms of the receptor (R!04). Notice that the association rate constant for the dimerization of two receptors in the same state is half of the value given by the corresponding rule. This is because a CanBindRule specifies the association rate between two binding sites without regard to which structures comprise the sites. Therefore ANC assumes that the association rate kf given in a CanBindRule characterizes a binding reaction between two distinct biomolecules. If it so happens that the two ligands are identical, ANC must divide the association rate given in the rule by two.

To see why, imagine than a receptor R homodimerizes with with a rate constant f and heterodimerizes with a phosphorylated form of itself (denoted Rp) with rate constant fp. Assuming the phosphorylation merely tags the receptor and so does not change the reaction kinetics, then the total dimerization reaction velocity given a certain quantity of receptor should be the same no matter the proportion of receptor that is phosphorylated. Assuming we have 1 mol of R, then by the law of mass action the reaction velocity is v=f*1*1. Likewise if we have 1 mol of Rp, then v=f*1*1. If we have 0.5 mol each of R and Rp, then the reaction velocity is given by v=2*f*0.5*0.5 + fp*0.5*0.5. Since the total dimerization rate is the same in each case, then f=0.5f+0.25fp and so f=0.5fp.

The following two reactions, R!015 and R!016, are the binding of L to a homodimer of the receptor. Here, we observe that we have not one but two association reactions, but only one of reverse reaction. This is an example of association reaction degeneracy. Indeed, because the receptor homodimer is symmetric, there are two indistinguishable sites to which the ligand can bind, giving two identical association reactions having the same reactants and the same product either way. In contrast, the dissociation reaction is not degenerate, so there is only one.

In next two reactions however (R!13 and R!14), the two receptors in the dimer are in a different conformation, so they are distinguishable. In this case ligand binding gives rise to two reversible reactions, as expected.

The last two reactions shown (R!35 and R!36) are an example of dissociation reaction degeneracy. Here, the ligand is binding to a receptor dimer where both receptors are active and where one of the receptors has already bound one ligand. In this case the bound ligand makes the receptors distinguishable, so there is only one association reaction. However, there are two identical dissociation reactions because once bound either ligand can dissociate to yield the mono-ligated dimer.

Allosteric Reactions

The next section lists the allosteric transitions in the reaction network, if any exist. For the purposes of discussion, we have excerpted three of these. Also, in the equation file each reaction is on a single line but here we have split them up for legibility:

# REACTION CLASS: AllostericReaction
# ----------------------------------
RI        <-> RA        ; fu00=k_IA; bu00=k_AI           # (R!00) Keq = k_IA / k_AI
L_RIi00   <-> L_RAi00   ; fu01=k_IA * (((kf_LA / kb_LA) / (kf_LI / kb_LI)) ^ Phi_IA)
                        ; bu01=k_AI * (((kf_LA / kb_LA) / (kf_LI / kb_LI)) ^ (Phi_IA - 1))
                        # (R!06) Keq = (k_IA * (((kf_LA / kb_LA) / (kf_LI / kb_LI)) ^ Phi_IA)) /
                                       (k_AI * (((kf_LA / kb_LA) / (kf_LI / kb_LI)) ^ (Phi_IA - 1)))
...
RI_RIi00  <-> RI_RAi00  ; fu04=k_IA * (((kf_IA / kb_IA) / (kf_II / kb_II)) ^ Phi_IA)
                        ; bu04=k_AI * (((kf_IA / kb_IA) / (kf_II / kb_II)) ^ (Phi_IA - 1))
                        # (R!07) Keq = (k_IA * (((kf_IA / kb_IA) / (kf_II / kb_II)) ^ Phi_IA)) /
                                       (k_AI * (((kf_IA / kb_IA) / (kf_II / kb_II)) ^ (Phi_IA - 1)))
RI_RIi00   -> RI_RAi00  ; fu04=k_IA * (((kf_IA / kb_IA) / (kf_II / kb_II)) ^ Phi_IA)
                        # (R!08) Keq = (k_IA * (((kf_IA / kb_IA) / (kf_II / kb_II)) ^ Phi_IA)) /
                                       (k_AI * (((kf_IA / kb_IA) / (kf_II / kb_II)) ^ (Phi_IA - 1)))
...

The first reaction (R!00) is the allosteric transition between inactive and active states of the unligated receptor monomer. The transition rates are those given when the allosteric structure was created.

The next two transitions (R!07 and R!08) are for the allosteric transition between an inactive receptor dimer and a partially active dimer. Note that the transition from the inactive state is degenerate because either of the monomers can change conformation, while in the reverse direction it is not degenerate because it is necessarily the active monomer that changes to the inactive state.

Reports

When asked (via the --report command-line switch or the $report variable), ANC can generate a report listing all species in the compiled reaction network, and also a detailed report giving the structure of each species in the network.

For our receptor dimer example, the species report Rdimer_mwc.species.rpt:

6                              15                            
L                              L                             
R                              RI RA                         
L_R_i00                        L_RIi00 L_RAi00               
R_R_i00                        RI_RIi00 RI_RAi00 RA_RAi00    
L_R_R_i00                      L_RI_RIi00 L_RI_RAi00 L_RA_RIi00 L_RA_RAi00
L_L_R_R_i00                    L_L_RI_RIi00 L_L_RI_RAi00 L_L_RA_RAi00

The first column of the report lists all structure templates in the network, that is to say the structures without state information. The second column lists all structure instances corresponding to each template. The first line gives a count of the total number of structure templates and instances.

The structure report Rdimer_mwc.structures.rpt lists the graph of each structure in the network in matrix form. This can sometimes be useful to debug a model. Here is an excerpt of the first couple structures:

Complex: L
 |     | g_L | L
 | g_L | -   | g
 | L   | -   | -

L
  L
    L

Complex: R
 |     | a_R | LB | D
 | a_R | -   | g,~{|Phi_IA} | g,~{|Phi_IA}
 | LB  | ~{|Phi_IA} | -  | -
 | D   | ~{|Phi_IA} | -  | -

R
  R
    LB
    D

GraphViz Diagrams

When we ran ANC on our receptor dimer example (see above), we passed the canonical and network keywords to the graphviz command-line option. This directs ANC to generate diagrams for the generated structures and for the reaction network. In this section, we learn how to interpret these diagrams.

Structure Diagrams

By default, ANC puts the generated structure template diagrams in a sub-directory called graphviz/canonical, and the instances in graphviz/canonical/instances. The diagrams are png images whose filename is based on the name of the corresponding complex. For example, the diagram for the complex containing only the ligand L is called L.png and looks like this:

L.png

There are a few minor differences between this diagram and the graphical notation we use in the manual, in part due to ANC's rather crude exporting ability. For example, the binding site L should have a horizontal bar but does not. Also, the hierarchical component of L has a "g_" prefix to indicate it is a "group node" (which is just another word for a hierarchical component). Internally, all structure edges in ANC are labelled to distinguish different types of edges. Thus, the 'g' beside the uni-directional edge stands for "grouping", i.e. containment.

Another example is the structure diagram for the receptor. Here, the hierarchical component of R is allosteric, but this is indicated by colouring it green instead of using a tilde '~'. The dashed line representing allosteric coupling are also coloured green. Furthermore, they are labelled with a string of the form { Γ | Φ }, which gives the regulatory factor and Φ-value associated with the allosteric coupling. In this particular example, the regulatory factor is not included in the label because it is unknown (the regulatory factor associated with a binding site depends on the specific ligand bound). The user can see that at this time, ANC is unable to obtain a layout of the graph were hierarchical nodes are shown at the top and interaction sites at the bottom (sorry!).

R.png

The two above example were diagrams of structure templates, i.e. without any state information. The structures also comprised a single biomolecule rather than being complexes. Next, we have an example of a structure comprising 4 biomolecules and in a specific state:

L L RI RAi00.png

This diagram shows the binding of the interaction sites of biomolecules through bi-directional edges labelled with a colon ':'. Also, the state of each component is appended to their names, in the form name:state, as for the state of the allosteric hierarchical nodes (states I and A). If a component has no state an 'x' is shown. Again, the layout is not ideal for a human, but all the necessary information is there.

Network Diagram

Here we show two example diagrams. The first is for our receptor dimer example. No collapsing of states or complexes occurred, so all 15 complexes listed in the species report are shown. The second example is for the model $ANC_HOME/examples/MAPK/MAPK_ste5_adhoc.mod, with both states and complexes collapsed. In both graphs, edges are associated with binding, catalytic, or allosteric reactions. The numbers labelling each edge's arrow-head and tail indicate which components are involved in the interaction (c.f. section on component addressing in reference manual).

Note: click on the images to get a larger and higher-resolution version.

Uncollapsed reaction network diagram for ligand-induced receptor dimerization. A green edge represents a binding interaction in which one or both of the sites is allosteric (otherwise the edge is black). In this case an asterix beside a head/tail number will indicate that the corresponding component is allosteric (hence there will be at least one). Finally, a double-headed green arrow represents the allosteric R<->T transition.
Fully collapsed reaction network diagram for MAPK example. The Black edges between nodes represent a binding interaction. A red arrow is drawn between enzymes and their product species (not the substrates!) if the product has an msite_state of 1 (i.e. a kinase). A blue arrow is drawn between enzymes and a product species with a state of 0 (i.e. from a phosphatase to its unphosphorylated product).


Example Models

This section will discuss in detail the implementation of the various example models included with ANC.

ANC User's Reference Manual

This section of the user manual fully documents the ANC model file format, relevant objects and object attributes, the compilation process and related parameters.

Model File Format

The ANC file format is very similar to the Facile file format. In fact, the main difference is that it has a MODEL section where the user defines objects which specify the basic species (proteins, etc.) of the biochemical network. Each section begins with the appropriate keyword.

MODEL Section

This section begins with the MODEL keyword and defines model constructs such as structures and rules in an object-oriented fashion. Objects of different classes (e.g. a Structure class, a CanBindRule class) are created, named, and their attributes are defined. The user may also define variables and parameters. The model section is executed as Perl code by ANC, and so can be as sophisticated as the user's knowledge of Perl.

N.b. When assigning textual values to some attributes, Perl quoting rules apply. In practice ANC is pretty flexible, you can use no quotes, doubles quotes or single quotes and it doesn't usually matter.

Object Creation

The syntax to create an object is shown below:

ObjectClass : {
  attribute1 => attribute1_value,
  attribute2 => attribute2_value,
  ....
  attributeN => attributeN_value,
  
  initializer1 => initializer1_value,
  initializer2 => initializer2_value,
}

Usually, most of the arguments passed to the object's constructor correspond directly to an object's attributes. Others arguments are used to initialize the object but don't map directly to an attribute.

Variable Creation

The syntax to create a variable or parameter is:

$parameter_name = parameter_value

Currently, the following variables and parameters have an effect:

Model Variables
Name Default Allowed values Description
Compilation Parameters
$max_external_iterations -1 -1,0,1... Maximum number of external iterations. The value -1 indicates that unlimited iterations are allowed.
$max_internal_iterations -1 -1,0 Maximum number of internal iterations. The result of using values other than -1 (compile all internal interactions) or 0 (no internal interactions) is difficult to predict. Therefore, it is recommended to use only -1 or 0 for this parameter.
$max_species -1 -1,0,1... Maximum number of species to generate. The value -1 indicates that no limit is placed on the number of species generated.
$default_steric_factor 0.0 non-neg real Default value for the steric_factor applied to internal reactions.
$export_graphviz undef A list of the following keywords: primary, ungrouped, scalar, canonical, network, collapse_states, collapse_complexes. Directs ANC to export the graph types specified by each keyword to the graphviz sub-directory of the output directory. The keywords primary, ungrouped, scalar and canonical cause structure graphs to be generated. Usually, you need use only the "canonical" keyworkd. The same functionality is accessed through the --graphviz command-line option, whose value overwrites any value specified in the model file.
$max_csite_bound_to_msite_number -1 -1,0,1... The maximum number of csite-msite bonds allowed in a complex. Complexes exceeding this will not be generated, nor will any downstream reactions and species. Note that depending on the rules defined in the model, not all csite-msite bonds are necessarily catalytically active. For example, if an inhibitor turns off a kinase even while it is complexed with its substrate, then a product reaction may not proceed. Hence, the number of catalytically active bonds is less than or equal to the number of csite-msite bonds.
$max_duplicate_count -1 -1,0,1... (NOT IMPLEMENTED). This was supposed to limit the number of biomolecules of a particular type in a complex. Useful to limit the size of the network in some situations.
$kf_1st_order_rate_cutoff 0 non-neg Cutoff 1st order (internal association) rate for which no reaction (or species) will be generated. The 1st order rate is generally the 2nd kf rate given in the relevant binding rule, multiplied by the associated steric_factor.
$kf_2nd_order_rate_cutoff 0 non-neg Cutoff 2nd order (distinct ligands) rate for which no reaction (or species) will be generated.
$kb_rate_cutoff 0 non-neg Cutoff (1st order) rate for which no reaction (or species) will be generated.
$kp_rate_cutoff 0 non-neg Cutoff (1st order) rate for which no reaction (or species) will be generated.
$compact_names 1 0,1 When set (the default), ANC strips out various underscore separator characters from exported structure names. Also, the bsite state placeholders ('x') embedded in the names are removed. For example, A_x1x_B_xx_i00 becomes A1_Bi00.
$protein_separator _ any char The character used to separate structures in exported complex names. If exporting to Mathematica, try using "i" in combination with compact_names. The above example is then exported as A1iBi00.
$sort_reactions_by_number 0 0,1 In the compiled equation file, sort the reactions by number instead of a more intelligent sort based on the reactants, products and type of the reaction.
$merge_degenerate_reactions (NOT IMPLEMENTED). The idea would be to merge degenerate reactions into one reaction while adjusting the reaction rate constants according to the multiplicity of the reaction.

Method Invocation

It is also possible to invoke methods on objects created in the model file. The syntax is:

ObjectClass:ObjectName->ObjectMethod(@ARGS)


Pass-Through Sections

The contents of several sections in the model file are passed through unchanged when a Facile model is exported from ANC. These are the EQN, INIT, PROBE, MOIETY and CONFIG sections. They have no effect on the ANC model or its compilation, though they may affect its simulation. See the Facile manual for more information on these sections.

Modelling Classes

We now describe the various classes of objects which may be defined in the MODEL section of the input file, along with those attributes which we expect the user will need to read or write. Some of the attributes are inherited from base classes (e.g. the 'Name' attribute). Also, many classes have more attributes which are not listed in this section because they are for ANC's internal use.

Node Classes

Node Class

Node Object Attributes
Name Allowed values Default Values Description
name Any undef The name must be unique within the class.
group_node_flag 0,1 0 Indicates whether this the hierarchical node of a structure.
reaction_type U,B X For internal use only. Used during compilation to determine whether the node participates in unary or binary reactions.
static_flag 0,1 0 When set to 1, this attribute prevents a reaction template from being applied to the node.


ReactionSite Class

This class models a reaction site on a biomolecule's surface which forms non-covalent bonds to a ligand, enzyme, or substrate.

ReactionSite Object Attributes
Name Allowed values Default Values Description
type bsite, csite, msite undef Determines whether the reaction site is a binding site, a catalytic site (enzyme), or a modification site (substrate).
Instance Attributes
msite_state 0,1 0 If 1, means that the msite was modified (e.g. phosphorylated). If the reaction site type is not 'msite', this field is always zero and has no effect.


AllostericSite Class

AllostericSite Object Attributes
Name Allowed values Default Values Description
allosteric_state_labels a single letter [R, T] Internally to ANC, the reference conformational state is R, while the non-reference state is T. This attribute allows the user to re-label the allosteric states. For instance, one might relabel R as I (inactive) and T as A (active) when modelling a receptor. Or, R as c (closed) and T and o (open) when modelling an ion-channel. These labels may be referenced by CanBindRules and will be used when exporting equations, species names etc.
allosteric_transition_rates non-neg real or INF [0, 0] These 2 parameters give the R->T and T->R transition rates of the AllostericSite in the reference context: coupled sites are unligated, unmodified if msites, and in the R state if AllostericSites.
Instance Attributes
allosteric_state R,T R Each site instance has an allosteric_state giving its conformation.


Structure Classes

Structure Class

Structure Object Attributes
Name Allowed Values Default Value Description
Template Attributes
name Any undef Any unique name.
elements [...] [] List of structure or node names that are contained in this structure. These must have been created previously.
max_count -1,0,1,2... -1 Maximum number times this structure can appear in a complex. Use -1 to indicate unlimited times. This attribute is inherited from the SetElement class.
ungroup_flag 0,1 1 Whether to ungroup the associated structure graph, which uncovers underlying symmetries in the structure. This attribute is inherited from the HiGraph class.
import_flag 0,1,undef undef This attribute helps determine which structures should be imported into the Complex class as 1-element complexes during initialization. For example, we generally want proteins to be imported but not domains. When reset (set), this attribute prevents(forces) the import and initialization of the Structure as a 1-element Complex object. If left undefined, the Structure will be imported only if it is not an element of another Structure.
add_allosteric_couplings [coupling, coupling, ...] [] A list of allosteric couplings between allosteric components in the structure. Each element of the list is itself a list of each component's address, a regulatory factor, and the Φ-values for each direction. If only one Φ-value is given it is used for both directions. I.e.

coupling=[address,address,reg_factor,phi,phi]


AllostericStructure Class

AllostericStructure Object Attributes
Name Allowed Values Default Value Description
Template Attributes
allosteric_flag 0,1 1 When 1 (the default), an AllostericSite will be used as the hierarchical node of the structure and the structure will therefore undergo conformational transitions. Set to 0 to make the structure behave the same as an ordinary structure (this can be useful when debugging a model).
allosteric_state_labels a single letter [R, T] This corresponds to the attribute of the same name for the AllostericSite created as the hierarchical node of the structure.
allosteric_transition_rates non-neg real or INF [0, 0] This corresponds to the attribute of the same name for the AllostericSite created as the hierarchical node of the structure.
reg_factors non-negative real 1.0 Elements of an AllostericStructure are automatically coupled to the allosteric site acting as the hierarchical node for the structure. For elements which are ReactionSites of type msite, reg_factor indicates the effect of the msite's state on the allosteric equilibrium between R and T states of the structure. If the state is 1, then the R<->T equilibrium is changed by a factor of reg_factor in favour of the T state. Similarly, for elements which are AllostericSites, the R<->T equilibrium is changed by a factor of reg_factor when the allosteric element is in the T state. A value should not be specified for binding or catalytic sites, as the regulatory factor depends on interaction affinities. In the case of ligand binding to an interaction site, the reg_factor is computed as the ratio of affinities of the ligand to the binding site in the T versus R states, i.e. (KeqT/KeqR). The attribute reg_factors can be given as a single value, in which case this value is used for all elements, or as a list of values corresponding to each element. In the former case, if some of the elements are csites or bsites a warning will be issued.
Phi Any 0.5 This parameter determines whether a change in the allosteric equilibrium due to a regulatory factor affects the R->T transition rate, the T->R transition rate, or both. The R->T transition rate changes by a factor of reg_factorPhi, while the T->R transition rate changes by a factor of reg_factorPhi-1. When Phi is 0, only the T->R rate is affected by a change in the allosteric equilibrium, and vice-versa if Phi is 1. At the default value of 0.5, the fold-change is distributed equally between the R->T and T->R transition rates. Values outside the [0,1] range including +-INF are allowed but aren't well-tested. The attribute Phi can be given as a single value, in which case this value is used for all elements, or as a list of values corresponding to each element.
Instance Attributes
allosteric_state R,T R This attribute is not a true attribute of the structure: it is aliased to the allosteric_state attribute of the structure's allosteric site.


Complex Class

A Complex comprises one or more biomolecules linked by biochemical bonds. These bonds may be inter-domain bonds within a particular protein, or inter-protein bonds, or both.

Objects of the Complex class are created as a result of the application of a reaction template on a reaction site, or by importing objects from the Structure class during initialization of the model. During the initialization process, ANC generates a set single-element Complexes for each of the structures defined in the model file (this "import" process can be prevented with the no_import_flag attribute of the protein object). Then, an instance of each Complex is created with all msite states initialized to 0, and allosteric states initialized to R, if applicable. This set of complex instances seeds the compilation of the model.

In some situations, it is useful to directly create and initialize a complex, such a a macromolecular assembly of structures. To do so, the modeller should recall that Complexes are a specialized type of Structure and therefore have all their attributes.

Here is the necessary information:

Complex Object Attributes
Name Allowed Values Default Value Description
Template Attributes
name Any undef Any unique name.
elements [protein, protein,...] Any set of protein names.
Instance Attributes
IC any non-negative real 0.0 If specified, gives the initial concentration of the species. Overrides anything given in the INIT section of the model file. This attribute is inherited from the Species class.


Rule Classes

CanBindRule Class

A CanBindRule object is used to define which reaction sites can bind. Any pair of reaction sites whose names match the string or regular expressions given in the rule will bind, providing that no additional constraints are specified in the rule. Any such pair is then taken to bind with the given kinetic rate constants.

If one reaction site is a csite and the other is an msite, a CanBindRule also implies that the substrate msite can be modified using the catalytic reaction template, assuming a non-zero value of kp rate is specified in the rule. Kinases and phosphatases (or whichever covalent modification the rule represents) are distinguished by specifying the appropriate ligand_msite_state for the substrate msite.

CanBindRule Object Attributes
Name Allowed Values Default Value Description
name Any auto-generated Any unique name. If not specified, an unique name for the rule will automatically be generated.
ligand_names [regexp, regexp] undef Specifies a pair of strings (or regular expressions) against which the names of all existing reaction sites will be matched. The order is not important. If a pair of reaction site names match, then the pair can undergo a binding reaction according to the binding reaction template. If it is a csite/msite pair, then the catalytic reaction template is applied.
kf, kb any non-negative real undef The rate-constants for the binding reaction. The kb constant is always a first-order reaction rate. The kf constant is the reaction rate constant of a 2nd-order heterodimerization reaction of two distinct structures containing the reaction sites.
kp non-negative real 0.0 If the rule matches a csite/msite pair of reaction sites, and kp has a non-zero value, then a catalytic product reaction may be generated and this attribute specifies the rate constant for product formation.
ligand_msite_states [regexp, regexp] ['.','.'] The binding template generates a reaction on the reaction sites specified in the rule only if the reaction sites have the specified state. The regexp can be '0', '1', or '.' if both states are allowed. Use '.' if the corresponding ligand is a bsite or a csite.
ligand_allosteric_labels [regexp, regexp] ['.','.'] The binding template generates a reaction on the reaction sites specified in the rule only if the reaction sites have the specified allosteric state labels. The regexp can be one of the labels (e.g. 'R' or 'T'), or '.' if both states are allowed. Use '.' if one of the reaction sites doesn't have an allosteric state. Note that reaction sites derive their allosteric state from allosteric components to which they are coupled. In some cases, they are coupled to more than one component.
steric_factor non-neg real $default_steric_factor This factor is applied to kf when the rule generates an internal binding reaction. It represents the effective concentration of the two ligands when co-located in the same signalling complex. The $default_steric_factor is 0.0 unless otherwise specified in the model file.
assocation_constraints A list of constraints [] A list of constraints which must be true for the rule to be applied in a specific context and generate an association reaction. Each expression in the list must evaluate to true. The special variables $L and $R are references to the reaction site instances involved in the reaction and can be used to construct constraints.
dissociation_constraints A list of constraints [] Same as for association_constraints, but applies to the dissociation reaction generated by a reversible binding rule.
constraints A list of constraints [] Same as above, but will apply to both association and dissociation reactions.

Here are some example rules:

# a simple rule specifiying that reaction sites X1 and Y1 can bind
# (assuming they have been defined previously in the model file)
# here we give a name to the rule, but this is not necessary
CanBindRule : {
        name => "simple",
        ligand_names => ['X1', 'Y1'],
        kf => 306.0,
        kb => 18.0,
}

# a rule specifying that the bsite X2 only binds the modified version of the msite Y2
CanBindRule : {
        ligand_names => ['X2', 'Y2'],
        ligand_msite_state => ['.', '1'],
        kf => 15.0,
        kb => 13.0,
}

# a rule specifying that the csite KN only binds and toggles the state of
# an unphosphorylated msite M (hence, it is a kinase)
CanBindRule : {
        ligand_names => ['KN', 'M'],
        ligand_msite_state => ['.', '0'],   ???
        kf => 6.0,
        kb => 18.0,
        kp => 100.0,
}

# a rule specifying that the csite PP only binds and toggles the state of
# phosphorylated msites (hence, it is a phosphatase).  this rule also
# demonstates the use of a regular expression to make the phosphatase
# non-specific:  it will de-phosphorylate any msite whose name begins
# with 'M'
CanBindRule : {
        ligand_names => ['PP', 'M.*'],
        ligand_msite_state => ['.', '1'],
        kf => 306.0,
        kb => 18.0,
        kp => 20.0,
}

Parameter Class

Instances of the Parameter class can be used instead of numerical values in Rules and Structures.

Parameter Object Attributes
Name Allowed values Default Values Description
name any string undef Name of the parameter.
value any string or number undef Value of the parameter, which may include expressions with other parameters.
is_numeric_flag 0 or 1 0 or 1 ANC sets this read-only attribute to 1 if the parameter is a numerical value.


Here is an example, excerpted from $ANC_HOME/examples/calmodulin/calmodulin.mod':

# Ratio of R and T state calcium dissociation constants
Parameter: {
	name => "c",
	value => 3.96e-3,
}
# T-state calcium binding dissociation constants
Parameter: {
	name => "K_TA",
	value => "K_RA/c",
}

Stimulus Class

The stimulus object is used to define a stimulus waveform to apply on a specific node (species) in the reaction network. The species to which the stimulus should be applied can be specified either through the node attribute, or through the structure and state attributes. The node specification takes precedence.

ANC constructs biochemical equations for all stimuli consisting of either a time-varying source or sink or both. I.e. if the species to be controlled is X, then ANC generates one or both of the following biochemical equations:

null -> X;   f=f(t)
X -> null;   g=g(t)

The exact waveforms for f(t) and g(t) depend on the type of stimulus applied. These types are source and impulse, which create a (optionally periodic) source with waveform f(t) having a non-zero value over a specified length of time, sink and wash, which create a (optionally periodic) sink with waveform g(t) having a non-zero value over a specified length of time, clamp, which creates both a source and a sink such that the concentration of X is clamped to a certain value, stair, which is like a clamp but reaches the specified concentration after a specified number of incremental steps and over a specified rise time (and likewise falls back to zero at the end of the stimulus), and ramp, which is like stair except that it reaches the specified concentration by ramping up the concentration in a continuous and linear fashion instead of a series of steps. All of these stimuli can be applied periodically, in which case each waveform is active for a specified number of cycles then takes on a value of 0.

For periodic cycles, we define the duty cycle:

duty = length / period * 100

Note that the stimuli source and impulse are the same thing, as are sink and wash.

Stimulus Object Attributes
Name Allowed values Default Values Description
name any string Prb!xx Name of the stimulus object.
node any string Prb!xx Name (as exported in the Facile file) of the specific species to which the stimulus should be applied.
structure any string undef Name of the structure template (not the instance) of the specific species to which the stimulus should be applied.
state any state undef Specifies the state of the specific species to which the stimulus should be applied.
type impulse | sink | wash | clamp | staircase | ramp | ss_ramp undef Specifies the type of stimulus to be applied.
Waveform attributes (all stimulus types)
period undef or positive real undef Period of the stimulus.
cycles positive integer undef, or 1 for type=stair When a periodic stimulus, number of cycles to apply it.
delay non-negative real 0.0 Delay before the stimulus is applied.
length positive real undef Length of the stimulus, should be less than (period-delay). Not applicable for stair and ramp stimuli.
strength positive real undef Rate constant of the source or sink while it is on.
Waveform attributes (source & impulse)
concentration positive real undef Total amount of X created by the source over during the length of the stimulus.
strength positive real concentration/length Rate constant of the source while it is on.
Waveform attributes (sink & wash)
strength positive real undef Rate constant of the sink while it is on.
Waveform attributes (clamp)
concentration positive real undef Concentration to which the stimulated node is clamped.
strength positive real undef Rate constant of the sink which is used to construct the clamp.
Waveform attributes (staircase & ramp)
concentration positive real undef Concentration to which the stimulated node is clamped when waveform is at its maximum.
strength positive real undef Rate constant of the sink which is used to construct the clamp.
duty positive real 50.0 Duty cycle of the waveform, including rise and fall times.
rftime positive real undef Rise and fall times of the waveform.
steps positive real undef (stair only) Number of discrete steps taken during the rising and falling segments of the waveform.

Probe Class

The Probe class is used to select a set of structure instances of a given class, based on the given criteria.

Probe Object Attributes
Name Allowed values Default Values Description
name any string Prb!xx Name of the probe.
classes one or more defined classes ComplexInstance The classes from which the set of objects in the probe will be selected. This can be the class StructureInstance, or one of its derived classes such as AllostericStructureInstance or ComplexInstance.
filters any list of filtering expressions [] A list of conditions against which each object in the class will be checked.
structure any string undef If defined, this will add an expression to the list of filters whereby the name of the structure template (not the instance) should match the given string.
state any state undef If the structure attribute was defined, then this attribute also constrains the state of the structure instance to the given value.

There are two usual ways to use Probes. For more complex criteria, you can manually specify the class and filtering expressions that should be used. The following is an example of this approach (from $ANC_HOME/examples/adaptor/adaptor_generic.mod). Here, we select all complexes comprising two biomolecules (i.e. dimers), and whose exported name match the given regular expression:

Probe : {
	name => "AY_DIMER",
	classes => ComplexInstance,
	filters => [
 		'$_->get_num_elements() == 2',
 		'$_->get_exported_name() =~ /A.*Y/',
        ],
}

If you just want to probe a structure in a specific state, you can use the structure and stateattributes. If structure is specified and state is not, then the first instance of that structure is selected (normally, the one that was used as a seed). For example, if A is a simple allosteric containing two binding sites:

Probe : {
        structure => A,
        state => '[T,x,x]',
}

Reaction Templates

This section describes the reaction templates which ANC uses to generate reactions during the compilation process.

ANC's reaction templates. The 3 reaction templates are a binding reaction, a catalytic reaction, and an allosteric reaction. Each reaction template is applied to specific node classes and types. The binding reaction template applies to reaction sites of any type (bsite, csite, or msite). The catalytic reaction applies to a pair or csite and msite reaction sites, so 6 combinations are possible as shown. The allosteric reaction template applies to allosteric sites only. In the catalytic reaction template, circle-bar indicates that the node instance's state is flipped. In the allosteric reaction, the state changes from a reference state R to S.

There exists 3 pre-defined reaction templates which give rise to all the biochemical reactions generated during the compilation process. Each reaction template involves the nodes of one or more structures as substrates for the reaction. The nodes must be of the appropriate class given the template. The first template is for for a binding reaction between two ligands. The second template is the product reaction of a a enzymatic reaction, where an enzyme-substrate complex dissociates to give into the enzyme and product. Together, these two templates can implement the Michaelis-Menten mechanism for enzymatic reactions. Finally, there is also a third template for allosteric transitions. These templates generate specific reactions according to the substrates involved and the appropriate rules.

Reaction templates do not define the stoichiometry of a reaction, because they are a template for what happens to the graph of the structures involved in the reaction. As such, the reaction templates are really graph operators. As an operator they prescribe how to combine and change the graphs associated with the substrates to produce graphs of the products of a reaction. They are described graphically, as shown in the figure below, by illustrating what happens to the nodes involved in the reaction in terms of edges are added or deleted, or state changes.

Reaction templates are classified as unary or binary according to how many nodes they involve as reactants. This is not the same as the order of the reaction which a template gives rise to. Indeed, the reaction order is defined by the stoichiometry of the reaction generated from the template. Therefore, a binary template such as the binding reaction may give rise to a 1st-order reaction, if the two reaction sites involved are part of the same structure.

Binary Reactions

A binary reaction has a template which involves two reaction sites as substrates. However, the generated reaction may be 1st order (unimolecular) or 2nd order (bimolecular). If the latter, for example the reaction occurring when two monomers dimerize, the reaction is "external". If the former, for example when the dimer forms an second, additional bond between its monomers, then this is an "internal" reaction.

It does not suffice for putative reactant nodes to be of the correct type for a binary reaction template to be applied. Rather, there must exist an enabling rule which specifies when the template may be applied. This rule is given in the model after structures have been defined, and gives i) the name of the reaction sites ii) their required state iii) any additional ad-hoc conditions which the user may wish to specify, for example to implement regulatory relationships. The rule also specifies the applicable kinetic parameters such as binding constants and product formation rates.

Binding Reactions

This binary reaction template embodies a reversible binding reaction. The template involves a pair of reaction site nodes, and generates either a 2nd order (external) reaction or a 1st-order (internal) reaction as appropriate. The product is a new structure whose graph is the concatenation of the two substrates, with a new edge to represent the bond. Only reaction sites which are not already bound can be substrates. The template is applied only if a CanBindRule exists that specifies that the pair of reaction sites can legally bind to one another. The generated reaction then takes one of the following forms:

X + Y <-> X_Y   # external binding reaction
    W <-> W'    # internal binding reaction

Catalytic Reactions

This binary template implements product formation step of the Michaelis-Menten mechanism for enzyme reactions. The Michaelis-Menten mechanism consists of a reversible binding step (the binding reaction), followed by a product formation step (shown in bold) as the activated enzyme-product complex dissociates into the intact enzyme and modified substrate:

<red>E + S <-> C -> S*  </red>

The catalytic reaction template is applied to a complex when two reaction sites within the complex, one a csite and one an msite, are joined by a non-covalent bond and when there exists an enabling CanBindRule enabling the interaction and giving a product formation rate kp. The catalytic reaction template is usually associated with the binding reaction also enabled by the CanBindRule to yield a complete enzymatic reaction, but need not be.

Unary Reactions

A unary reaction has a template which involves only one reaction site as a substrate. At present, only allosteric reactions fall into this category.

Allosteric Reactions

This unary, 1st order (or internal) allosteric reaction template models conformational changes (allosteric transitions) in proteins and does not require an enabling rule. ANC applies the template to each allosteric node in a given structure. In the reaction, the state of the allosteric node transitions between a reference state R, and a non-reference state T. Allosteric transition rates are supplied by the modeller for a reference (unligated, unmodified) state of the structure, through an attribute of the allosteric node. Thus, in an allosteric structure X containing a single allosteric node the template generates to the following reaction:

X_R <-> X_S

Allosteric transition rate constants vary depending on the context in which the allosteric transition takes place. The method used to calculate them is described in detail in the following article:

Ollivier JF, Shahrezaei V, Swain PS, "Scalable Rule-based Modelling of Allosteric Proteins and Biochemical Networks", 2010.

Other Reactions

In the future, the following reaction templates may be added to ANC:

  • degradation (E + A <-> C -> E)
  • synthesis (R -> R + P)
  • generic reactions...

Reaction Degeneracy

Repeated applications of a reaction template can yield the same biochemical reaction when a structure (e.g. a homodimer) has multiple and indistinguishable reaction sites. These reactions are degenerate, in the sense that they occur multiple times with the same substrates and products. Both association and dissociation reactions can be degenerate.

Association Degeneracy

This type of degeneracy occurs when two ligands can associate in multiple ways while giving the identical product. This arises in situations where one of the ligands has some internal symmetry, just like in our receptor example above. There, the receptor dimer R-R can associate with L to give either L-R-R or R-R-L, which are in fact the same by isomorphism. However, there is only one way that L-R-R can dissassociate to produce the product R-R and L, and hence the reverse reaction occurs only once.

Dissociation Degeneracy

It is possible to have a situation where the dissociation is degenerate but not the association. For example, consider a protein S which can bind to itself via two reaction sites. If one starts with the one-bond dimer S-S, then we can write the association reaction for the formation of the second bond as S-S <-> S=S. Now, the dissociation reaction is degenerate, because either of the two bonds can break, to form the single-bond dimer S-S.

Reaction Kinetics

When a binding rule specifies that reaction sites A and B bind, then the reversible binding reaction template will be applied to any pair of complexes containing the A and B sites, and the relevant rate constants kf, kb (and kp for a catalytic reaction) are given by the rule.

The forward rate constant kf given in a CanBindRule is, by definition, the rate constant for a 2nd-order hetero-dimerization reaction between two proteins each containing one of the reaction sites of the pair specified in the binding rule.

If reaction sites A and B bind with a rate constant kf as specified by a rule in the model, unless otherwise specified they do so independently which protein (or protein complex) they are a part of.

At present all reactions are also assumed to be reaction-limited, and so the well-mixed assumption applies.

Inter-Complex Association Rates

Heterodimerization

For an application of the binding reaction template giving rise to a heterodimerization, the rate constants given by the appropriate rule apply directly without modification. For example, if a template with rate constants kf=10.0 and kb=1.0 is applied to reaction sites Xi of the protein (or complex) X and Yj of protein (or complex) Y, then the generated reaction is:

X + Y <-> X_Y;   kf=10.0;  kb=1.0

By the law of mass-action, the reaction velocities are:

vf = kf * X * Y = 10.0 * X * Y
vb = kb * X_Y

Homodimerization

The rate constants for a heterodimerization and a homodimerization are related, but are not in fact the same. For example, if it is known that a receptor R dimerizes with a rate constant kf, then the heterodimerization of R and a phosphorylated form of R called Rp will occur with a rate constant of 2*kf (assuming the phosphorylation does not influence the kinetics of the reaction). This is demonstrated in the "Interpreting Results" section of the user manual.

Therefore, whenever a binding rule gives rise to a homodimerization, the applicable rate constant is the kf given by the rule divided by two. Hence, supposing Y in the above example is simply a relabeled version of X, and therefore has exactly the same structure and components as X so that the same binding rule generates both homodimers as well as heterodimers, then the following reaction will be generated:

X + X <-> X_X;    kf=5.0;  kb=1.0

The reaction velocities are given by

vf = kf * X^2 = 5.0 * X^2
vb = kb * X_X

and similarly for homodimerization of Y.

Intra-Complex Association Rates

As alluded to previously, the reaction rate constants are computed differently when the reactants are present in the same signalling complex (e.g. recruited by adapter proteins and/or scaffolds). If the associating reaction sites are already co-localized in a signalling complex, the reaction becomes a first-order reaction and the given intra-complex reaction rate must be adjusted to account for the increased effective local concentration of the reactants. The rate used is given by:

internal_kf = (kf * steric_factor) # in s^-1

Therefore, given the same proteins X and Y as above, but after recruitment by a scaffold or adapter S, and for a steric_factor of 0.001, we get the following reaction:

X_S_Y_i00 <-> X_S_Y_i01;   internal_kf=0.01;  kb=1.0

The reaction velocities are given by

vf = internal_kf * [X_S_Y] = 0.01 * [X_S_Y]
vb = kb * [X_S_Y] = 1.0 * [X_S_Y]

Internal reaction rates are discussed in detail here.

[NOTE: for internal associations, the homo-dimerization versus hetero-dimerization distinction is not applicable]

Diffusion

ANC has not currently implemented a model for diffusion, and reaction kinetics operate under a "well-mixed" assumption.

Structure Templates vs Instances

A structure template is a template for the generation of biochemical species. Instantiating a structure, to create a structure instance or species, requires specifying a specific state and initial concentration for the structure. Thus, the structure represents the information common to all variants of a species independent of a particular state, such as the number of binding sites it has, allosteric couplings, tertiary and quaternary structure, etc., while a species is derived by specifying the state information that may change dynamically as a system evolves.

Addressing Structure Components

All components of a structure can be addressed, something which may be useful when writing ad hoc interaction rules. The components listed as elements of a structure are assigned an index starting from 0. If the structure has a hierarchical node, its index is -1. For example, in the simple structure S0 defined above, the hierarchical component S0 has index -1, the component N1 has index 0, and N2 has index 1.

The addressing scheme for the components of a hierarchical structure is similar. Each component is addressed by a set of one or more indices indices, according to their position in the elements list, with the number of indices determined by the nesting level. For example, in the structure T0 above the nested structure S0 has address (0) and S1 has address (1). For the components contained in S0 and S1, we use an additional index. Thus, node N1 has address (0,0) node N3 has address (1,0) and the two instances of node N2 have addresses (0,1) and (1,1). We can also address the hierarchical nodes corresponding to each structure with the index -1. Thus, (1,-1) is the index of hierarchical node of nested structure S1.

Internal Graph Representation of Structures

N.B. This section is inaccurate needs major revision.

In the process of initializing and processing a model, ANC internally creates several different graphs to represent macromolecular complexes and biomolecular structures. These are called the primary (though a better name would be grouped), ungrouped, scalar, and canonical versions of the graph.

Recall that structures are built up hierarchically, with each structure having an optional hierarchical (or group) node, and a set of elements (which can be interaction sites or other structures) that it contains. When ANC initializes the graph for a complex, it only creates the first layer of this hierarchy. This is the primary graph, which will show only the hierarchical node of the structure (if there is one) and the interaction sites and hierarchical nodes immediately below.

Hence, in the primary graph representation of a macromolecular complex (which has no hierarchical node), the nodes of the graph represent biomolecules. In this graph, the edges must be labelled (or "coloured") according to which components they connect, to distinguish between edges going between specific components within the interacting biomolecules.

This edge colour is derived as follows. First, each sub-structure and interaction site in a structure is assigned a hierarchical address according to its position when listed as elements. Thus, in a protein with multiple domains each comprising one or more reaction sites, the 2nd site in the 3rd domain has address "2.1". If it connects to the first site of the first domain, then the edge colour is "0.0:2.1".

Bi-directional edges are represented in the graph using dual uni-directional edges. So in the above example, the two edge colours are "0.0:2.1" and "2.1:0.0". The edge colour is always encoded as "tail_node_address:head_node_address", referring to the arrowhead and arrowtail.

[NOTE: include an example graph here]

Using a hierarchical address to colour the edges of a the primary graph of a structure implies that any symmetry within a structure becomes invisible to the graph alignment process which occurs during network generation and whose purpose is to detect identical structures. For example, consider a single-domain protein with two identical binding sites. Then, the two graphs corresponding to the complexes where i) the first site has bound a ligand, and ii) the second site is bound, are not considered isomorphic. In effect, all components of a primary graph are distinguishable, and any symmetries are broken.

In such a case, ANC allows the user to specify that the protein should be ungrouped. An ungrouping operation replaces a hierarchical node (such as a protein) with a grouping node and with a set of nodes corresponding to the underlying elements. The grouping node connects to each element via a unidirectional edge whose colour represents the containment relationship between the hierarchical node and its elements. If the element nodes are themselves hierarchical, they can be recursively ungrouped as well.

Thus, if the single-domain, symmetric protein in the example above is ungrouped twice, the original node is replaced with 4 nodes: two grouping nodes representing the protein and the single domain (connected by a uni-directional grouping edge), and two nodes representing each reaction site. The reaction site nodes are each connected to the domain grouping node via a grouping edge.

Now, when a ligand binds to one of the reaction sites, the resulting graph has an edge going from the ligand directly to the site node. The colour of the edge is no longer a function of the site's address, and regardless of which reaction site the ligand binds to, the same graph results.

[INCLUDE A DIAGRAM]

[NOTE: the term flattening means complete, iterative ungrouping of a structure. I.e. ungroup only operates on 1 level. Clarify how ungroup_flag propagates down hierarchy]

[NOTE: chemically speaking, a single-subunit protein cannot be symmetrical since each binding site has a different position in the polymer, and so cannot be chemically equivalent. This can be represented by chaining reaction sites together in the ungroup graph using an edge whose colour represents the covalent, directional bond.]

[NOTE: symmetric proteins like hemoglobin are in fact multi-subunit proteins comprising several distinct individual proteins.]

ANC Programmer's Reference Manual

This manual is a stub. To be completed!

ANC Software Architecture Overview

The following sections describe the architeture and object classes used in ANC.

Class Inheritance Hierarchy

PARENT CLASSES

INSTANCE CLASSES

Class Containment Hierarchy

TBD

Class Design and Implementation

Base Classes

These classes are in the base sub-directory.

Main Classes

These classes are in the modules sub-directory.