Introduction

PSNTools is an easy-to-use command-line program for calculating and analyzing Protein Structure Networks (PSNs) from both single structures and molecular dynamics trajectories, aimed at high-throughput investigation of allosterism in biological systems. PSNTools employs a mixed strategy that integrates PSN analysis with the correlations of the atomic fluctuations to investigate the structural communication in proteins and nucleic acids.

PSN analysis has proved to be a valuable tool in a number of studies, and PSNTools gives the user immediate feedback through easily accessible data files and graphical visualizations of the output. Automation, high speed and the broad range of available network analyses make this software suitable for high-throughput investigation of the communication pathways in large sets of biomolecular systems in different functional states.

Despite the large number of options and alternative algorithms provided, PSNTools automatically sets default values that have been benchmarked against experimental data. For more details about published papers and the theory behind PSN analysis, please visit the WebPSN webserver.

Finally, PSNTools has a periodically updated internal database of more than 30,000 pre-calculated network parameters for ions and other molecules present in all PDB structures.


Installation and Requirements

PSNTools is free and open to all users and is distributed both as a statically compiled binary executable with no dependencies (recommended) and as source code that can be compiled on other platforms. Both the binary and the source code can be downloaded from the website of the WebPSN server.

The statically compiled binary executable has been tested and works out of the box on a fresh install of the latest two releases of the following widely used Linux distributions: Ubuntu, Fedora and Manjaro.

Compilation can be (very) long due to the large internal database of network parameters. It requires a modern C++ compiler and toolchain and the following development libraries: Boost, Cereal, ZLib, Armadillo and BLAS/LAPACK.

PSNTools uses, for some calculation stages, Wordom, another program provided by the same research group. A statically compiled binary executable for Wordom is provided as well. To use PSNTools and Wordom you can download (or compile) the executable files and move them to an appropriate location in your filesystem.

Finally, the following additional software, while not strictly necessary, is highly recommended: PyMOL, VMD, Gnuplot, Graphviz.

Please refer to your operating system manual and to the website of each program for more details about the installation process.


How to Cite

Thank you for using PSNTools, we really appreciate it.

Please, remember to cite at least one of the following papers in all published works which utilize this software:

NEW_PAPER_CITATION_HERE

Angelo Felline, Michele Seeber and Francesca Fanelli
webPSN v2.0: a webserver to infer fingerprints of structural communication in biomacromolecules
Nucleic Acids Res, Web Server Issue, 19 May 2020
https://doi.org/10.1093/nar/gkaa397


Contacting Us

If you have any questions or if you encounter any problems with this software, please do not hesitate to contact us at the following email addresses:


PSNTools is copyright 2017-2020 the University of Modena and Reggio Emilia (Italy).

PSNTools is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 3 of the License, or (at your option) any later version.

PSNTools is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with PSNTools. If not, see http://www.gnu.org/licenses

PSN Calculation


Calculates a Protein Structure Network from a single PDB file or a Gromacs (xtc) or CHARMM (dcd) trajectory file. For detailed information on the theory behind this step, read the relevant section in the Appendices chapter.

Table 1. Command options and descriptions
Option Description

-calc

Must be the first option passed to psntools. Required.

-pdb

A pdb file, required.

-trj

A CHARMM dcd file or a gromacs xtc file. Optional.

-labfile

A labels file. If passed, the nodes in the resulting network will be labeled according to this file. Please note that, if you pass labels at this stage, they will be permanently assigned to the resulting network. If you think they might change in the future, it is better to pass them later, during the analysis stage. Please see the Labels File section for more details. Optional.

-name

Will be used as the network name. If not provided, a name will be derived from the pdb file. Optional.

-sele

A string used to select only a portion of the passed pdb/trajectory file. Please see the Selection Syntax section for more details. Optional.

-calctrjstat

This option accepts either yes (default) or no. If yes is passed, a large number of network statistics will be calculated for each trajectory frame. Please note that this option can be very demanding in terms of CPU, disk and time for large systems and/or long trajectories. If you choose to use it, please consider using the -tnum and -trjstatsampling options as well, passing a large value to the former and a small one to the latter. Optional.

-trjstatsampling

With this option you can set the fraction of trajectory frames used to calculate trajectory statistics (see the -calctrjstat option). This option accepts a value > 0 and ≤ 1. It is automatically set to 1 for a PSN calculated from a single pdb, and the default is 3 for those calculated from a trajectory file. Optional.

-tnum

This option sets the number of threads used to speed up the PSN calculation. If you are working on a multi-core workstation, a large value is advisable. 1 by default. Optional.

-wordom

If you installed wordom (see the Installation and Requirements section) in a directory that is not in your path (please refer to your operating system manual), you can use this option to set the full path to the wordom executable file. Optional.

-corrfile

Used to pass a pre-calculated correlations file, skipping the automatic calculation of the atomic fluctuations. Optional.

-ignoreselfint

Used to ignore intra-chain interactions during the network calculation from a trajectory file. This option is useful if the passed trajectory file contains solvation waters or membrane lipids and you want to skip water-water or lipid-lipid interactions while preserving the interactions between your protein and the surrounding environment. For example,

-ignoreselfint W

will ignore all interactions between two nodes in chain W. Optional.

-param

Used to bypass the internal database of network parameters and manually set the value for one or more non-standard amino acids or other molecules present in the passed pdb file. The passed string must adhere to the following syntax:

lig_name1:lig_param1,lig_name2:lig_param2,…​,lig_nameN:lig_paramN

where lig_name is a residue identifier similar to those used in the labels file and lig_param is a numerical value > 0, as in the following example:

-param A:A:RET1:163.4,B:B:GTP1:249.73

This is optional and its use is discouraged.

-paramdb

Used to pass an external database of network parameters, bypassing the internal one. This is optional and its use is discouraged.


Some examples:
psntools -calc -pdb 3lnx.pdb

psntools -calc -pdb 3lnx.pdb -name pdz

psntools -calc -pdb 3lnx.pdb -trj 3lnx.dcd -tnum 5

psntools -calc -pdb 3lnx.pdb -labfile pdz.lab

psntools -calc -pdb 3lnx.pdb -sele A/A/*
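
When many structures have to be processed, the command lines above can be generated programmatically. The following Python sketch is not part of PSNTools: the helper name build_calc_command is hypothetical, and it only assembles the -calc, -pdb, -trj, -name and -tnum options described in Table 1. It prints the commands instead of running them.

```python
import shlex
from pathlib import Path


def build_calc_command(pdb, trj=None, name=None, tnum=None):
    """Return the argument list for one `psntools -calc` run (hypothetical helper)."""
    cmd = ["psntools", "-calc", "-pdb", str(pdb)]  # -calc must be the first option
    if trj is not None:
        cmd += ["-trj", str(trj)]  # CHARMM dcd or Gromacs xtc trajectory
    if name is not None:
        cmd += ["-name", name]
    if tnum is not None:
        if tnum < 1:
            raise ValueError("tnum must be a positive number of threads")
        cmd += ["-tnum", str(tnum)]
    return cmd


# Print one command per input structure, naming each network after its pdb file.
for pdb in ["3lnx.pdb", "3lny.pdb"]:
    print(shlex.join(build_calc_command(pdb, name=Path(pdb).stem, tnum=5)))
```

Passing the returned list to subprocess.run(cmd, check=True) would launch the calculation, assuming psntools is in your path.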


Table 2. Output files and descriptions
Output Description

.psn

This binary file is always produced and stores all the information needed to perform other network analyses.

Check For Network Parameters


PSNTools uses the residue type field (4th column) of the passed pdb file to assign the correct network parameter to each node during the PSN calculation. These three-letter codes are (almost) standardized; nonetheless, some molecular simulation programs, like CHARMM or Gromacs, use some codes with alternative meanings, like HSD, HSE and HSP to identify different protonation states of histidine. It is good practice, before a PSN calculation, to check for non-standard amino acids/nucleotides present in your pdb file and to see which molecule and network parameter they are associated with in the internal database.
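
As a quick preliminary check, the residue type field can also be inspected with a few lines of Python. This is only a sketch and does not reproduce what -checknf does: it assumes fixed-column PDB records (residue name in columns 18-20), and the set of codes treated as standard is a simplification chosen for this example, not the list used by PSNTools.

```python
from collections import Counter

# Codes treated as "standard" in this sketch (assumption, not the PSNTools list).
STANDARD_RESIDUES = {
    "ALA", "ARG", "ASN", "ASP", "CYS", "GLN", "GLU", "GLY", "HIS", "ILE",
    "LEU", "LYS", "MET", "PHE", "PRO", "SER", "THR", "TRP", "TYR", "VAL",
    "A", "C", "G", "U", "DA", "DC", "DG", "DT",
}


def nonstandard_residue_names(pdb_lines, standard=STANDARD_RESIDUES):
    """Count residues (not atoms) whose residue name is not in `standard`."""
    seen = set()
    counts = Counter()
    for line in pdb_lines:
        if not line.startswith(("ATOM", "HETATM")):
            continue
        res_name = line[17:20].strip()            # columns 18-20: residue name
        residue = (line[21:22], line[22:27], res_name)  # chain, number + insertion code
        if res_name not in standard and residue not in seen:
            seen.add(residue)
            counts[res_name] += 1
    return dict(counts)
```

For example, calling nonstandard_residue_names(open("3lnx.pdb")) returns a dictionary mapping each non-standard code to the number of residues that carry it.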

Table 3. Command options and descriptions
Option Description

-checknf

Must be the first option passed to psntools. Required.

-pdb

A pdb file.

-param

Used to bypass the internal database of network parameters and manually set the value for one or more non-standard amino acids or other molecules present in the passed pdb file. The passed string must adhere to the following syntax:

lig_name1:lig_param1,lig_name2:lig_param2,…​,lig_nameN:lig_paramN

where lig_name is a residue identifier similar to those used in the labels file and lig_param is a numerical value > 0, as in the following example:

-param A:A:RET1:163.4,B:B:GTP1:249.73

This is optional and its use is discouraged.

-paramdb

Used to pass an external database of network parameters, bypassing the internal one. This is optional and its use is discouraged.


Examples:
psntools -checknf -pdb 3lnx.pdb


This command prints on screen a table listing each non-standard amino acid/nucleotide present in the passed pdb file, together with its associated molecule name, formula and network parameter. If a molecule is not present in the internal database, it will be parametrized on the fly on the basis of its atomic coordinates.

Network Representation


After the PSN calculation stage, you can use the produced .psn file to generate a network representation called Protein Structure Graph (PSG). Please remember that all optional values listed below will be automatically set for you to a reasonable default value.

Table 4. Command options and descriptions
Option Description

-psg

Must be the first option passed to psntools, followed by a .psn file obtained with the -calc command. Required.

-name

If used, all resulting output files will be named according to the value passed to this option, otherwise the internal network name will be used (default). Optional.

-labfile

A labels file. If passed, the nodes in the resulting outputs will be labeled according to this file. Please note that, for this to work properly, you need an unlabeled .psn file (i.e. you need to calculate your .psn file without the -labfile option). See the -calc command and the Labels File section for more details. Optional.

-outval

You can pass to this option a file with external numerical values (e.g. sequence conservation data, experimental values, etc.). These values will be reported in the output files alongside the network values. External values files are formatted like labels files. See the External Values File section for more details. Optional.

-pdb

If you are analyzing a consensus network (see the -cons command), you can choose which pdb, among the embedded pdb coordinates, must be used to produce the 3D representation of your PSG. By default the pdb coordinates of the first network passed to the -cons command will be used. Optional.

-gly

If true (default) glycines will be included in all calculations and representations. Optional.

-freq

This option accepts a numerical value > 0 and ≤ 100. Links and Hubs (read the relevant section in the Appendices chapter for more details) will be considered only if their frequency (i.e. % of trajectory frames) is ≥ this value. The default is 50. Optional.

-cons

This option is used only when dealing with a consensus psn file (see the -cons command) and accepts a numerical value > 0 and ≤ 100. Links and Hubs (read the relevant section in the Appendices chapter for more details) will be considered only if they are shared by at least the value passed to this option among the networks used to generate the consensus psn you are analyzing. The default is 0. Optional.

-mergeclust

This option is used to set the clusters merging criterion (read the relevant section in the Appendices chapter for more details) and accepts one of the following values: no, imin, imin2, freq. Optional.

-out3D

With this option you can select the 3D representation output format(s). Accepted values are pml and vmd, which generate a PyMOL and a VMD script, respectively. You can select more than one format by concatenating them with a ',' character. The default value is pml,vmd. Optional.

-color

Used to select the coloring criterion of the represented network elements, like nodes, links, hubs, etc. See the corresponding section for more details about this option. Optional.

-size

Used to select the criterion behind the size of represented network elements like nodes, links, hubs etc. See the corresponding section for more details about this option. Optional.


Some examples:
psntools -psg 3lnx.psn

psntools -psg 3lnx.psn -labfile pdz.lab

psntools -psg 3lnx.psn -outval res_conserv.dat
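
The value ranges listed in Table 4 can be checked before a run is started. The following Python sketch is not part of PSNTools: the helper name build_psg_command is hypothetical, and it only covers the -psg, -freq, -mergeclust and -out3D options as they are described above.

```python
MERGECLUST_VALUES = {"no", "imin", "imin2", "freq"}
OUT3D_FORMATS = {"pml", "vmd"}


def build_psg_command(psn, freq=None, mergeclust=None, out3d=None):
    """Return the argument list for one `psntools -psg` run (hypothetical helper)."""
    cmd = ["psntools", "-psg", str(psn)]  # -psg comes first, followed by the .psn file
    if freq is not None:
        if not 0 < freq <= 100:
            raise ValueError("-freq accepts a value > 0 and <= 100")
        cmd += ["-freq", str(freq)]
    if mergeclust is not None:
        if mergeclust not in MERGECLUST_VALUES:
            raise ValueError(f"-mergeclust accepts one of {sorted(MERGECLUST_VALUES)}")
        cmd += ["-mergeclust", mergeclust]
    if out3d is not None:
        formats = list(out3d)
        unknown = set(formats) - OUT3D_FORMATS
        if unknown:
            raise ValueError(f"unknown -out3D format(s): {sorted(unknown)}")
        cmd += ["-out3D", ",".join(formats)]  # formats are concatenated with ','
    return cmd


print(" ".join(build_psg_command("3lnx.psn", freq=75, out3d=["pml", "vmd"])))
```

Options that are left as None are simply omitted, so PSNTools applies its own defaults to them.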


Table 5. Output files and descriptions
Output Description

{net_name}_info.csv

This file contains a summary of the calculated PSG:

  • NetName: network name

  • PDBFile: the name of the pdb file used by pml and/or vmd files

  • NetFile: the name of analyzed psn file

  • Imin: the Imin value (read the relevant section in the Appendices chapter for more details)

  • LNodes: the total number of nodes with at least one link

  • Links: the total number of links

  • Hubs: the total number of hubs

  • HLinks: the total number of links mediated by at least one hub

  • Communities: the total number of node communities

  • CommNodes: the total number of nodes which belong to a node community

  • CommLinks: the total number of links among nodes which belong to a node community

{net_name}_comms.pml {net_name}_comms.vmd

These scripts display the communities of nodes identified in the calculated network. Nodes are represented as spheres centered on the Calpha (for standard amino acids) or on the geometric center (for all other molecules), while links are represented as sticks that connect two nodes. Each community is represented with a unique color, and nodes and links belonging to the same community share the same color. The colors of the first nine most populous communities can be seen in the appropriate section.

{net_name}_comm.txt

This file contains a detailed summary of all node communities in the calculated network. For each community, listed from the largest to the smallest, the following information is provided:

  • Comm: the community number

  • Links: total number of links

  • Nodes: total number of nodes

  • Hubs: total number of hubs

  • HLinks: total number of links mediated by at least one hub

  • L/N Ratio: links to node ratio

  • H/N Ratio: hubs to node ratio

  • HL/N Ratio: links mediated by at least one hub to nodes ratio

  • HL/L Ratio: links mediated by at least one hub to all links ratio

For each community the list of all its links is also provided with the following additional information:

  • N: a progressive number

  • Node1: first node of the link

  • Node2: second node of the link

  • Freq: the link trajectory frequency

  • AvgInt: the average interaction strength of this link along the trajectory

  • IsNode1Hub?: Yes if Node1 is a hub, No otherwise

  • IsNode2Hub?: Yes if Node2 is a hub, No otherwise
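
The four ratios reported for each community follow directly from the Links, Nodes, Hubs and HLinks counts. As a worked example, the Python sketch below recomputes them from their definitions; the function name community_ratios is hypothetical and the counts are invented for illustration.

```python
def community_ratios(links, nodes, hubs, hlinks):
    """Recompute the per-community ratios from the four counts."""
    if nodes <= 0 or links <= 0:
        raise ValueError("a community needs at least one node and one link")
    return {
        "L/N Ratio": links / nodes,    # links to nodes
        "H/N Ratio": hubs / nodes,     # hubs to nodes
        "HL/N Ratio": hlinks / nodes,  # hub-mediated links to nodes
        "HL/L Ratio": hlinks / links,  # hub-mediated links to all links
    }


# Invented counts: 12 links, 8 nodes, 2 hubs, 6 links mediated by a hub.
for name, value in community_ratios(links=12, nodes=8, hubs=2, hlinks=6).items():
    print(f"{name}: {value:.2f}")
```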

{net_name}_corrpairs_dist.png

This plot represents the distribution of the correlations of the atomic fluctuations versus the % of node pairs.

{net_name}_corrpairs_dist.csv

This file contains the data used to produce the plot named {net_name}_corrpairs_dist.png

{net_name}_corrpairs_dist_info.csv

This file contains a table with a number of statistics relative to the correlations of the atomic fluctuations.

  • MinValue: lowest correlation

  • AvgValue: average correlation

  • StDevValue: standard deviation

  • MaxValue: largest correlation

  • MostRecValue: most recurrent value

  • MostRecFreq: the % of node pairs with a correlation equal to the most recurrent value

  • 2ndMostRecValue: the 2nd most recurrent value

  • 2ndMostRecFreq: the % of node pairs with a correlation equal to the 2nd most recurrent value

  • 3rdMostRecValue: the 3rd most recurrent value

  • 3rdMostRecFreq: the % of node pairs with a correlation equal to the 3rd most recurrent value
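
To make the meaning of these columns concrete, the Python sketch below computes the same kind of summary from a list of correlation values. It is an illustration only: rounding to two decimals to decide which values are "recurrent" and the use of the population standard deviation are assumptions of this example, and PSNTools may bin or normalize the data differently.

```python
from collections import Counter
from statistics import mean, pstdev


def correlation_summary(values, decimals=2):
    """Summarize correlation values using the column names of the info table."""
    if not values:
        raise ValueError("no correlation values given")
    summary = {
        "MinValue": min(values),
        "AvgValue": mean(values),
        "StDevValue": pstdev(values),  # population standard deviation (assumption)
        "MaxValue": max(values),
    }
    # Values are rounded before counting so that near-identical floats are grouped.
    binned = Counter(round(value, decimals) for value in values)
    prefixes = ("MostRec", "2ndMostRec", "3rdMostRec")
    for prefix, (value, count) in zip(prefixes, binned.most_common(3)):
        summary[prefix + "Value"] = value
        summary[prefix + "Freq"] = 100.0 * count / len(values)  # % of node pairs
    return summary


print(correlation_summary([0.1, 0.1, 0.1, 0.5, 0.5, 0.9]))
```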

{net_name}_hlinks.pml {net_name}_hlinks.vmd

These scripts display all the links mediated by at least one hub in the calculated network. Nodes and links color and size are controlled by -color and -size options.

{net_name}_hubs.csv

This file contains a detailed table of the hubs in the calculated network with the following information:

  • N: a progressive number

  • Hub: the hub

  • Freq: hub trajectory frequency

  • Force: the average interaction strength of its links

  • Clust: the cluster of nodes this hub belongs to

  • Comm: the community of nodes this hub belongs to

{net_name}_hubs.pml {net_name}_hubs.vmd

These scripts display all the hubs in the calculated network. The color and size of each hub is controlled by -color and -size options.

{net_name}_links.csv

This file contains a detailed table listing all the links in the calculated network with the following information:

  • N: a progressive number

  • Node1: first node of the link

  • Node2: second node of the link

  • Freq: the link trajectory frequency

  • Force: the average interaction strength of this link along the trajectory

  • IsNode1Hub?: Yes if Node1 is a hub, No otherwise

  • IsNode2Hub?: Yes if Node2 is a hub, No otherwise

  • Clust: the node cluster this link belongs to

  • Comm: the node community this link belongs to

{net_name}_links.pml {net_name}_links.vmd

These scripts display all the links in the calculated network. Nodes and links color and size are controlled by -color and -size options.

{net_name}.pdb

A .pdb file with the coordinates used to calculate the network with the -calc command. Used by the .pml and/or .vmd 3D scripts.
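
The csv outputs listed above lend themselves to further scripting. The Python sketch below extracts from a {net_name}_links.csv table the links that involve at least one hub and reach a given frequency. It assumes a comma-separated file whose header row uses the column names listed above; the node labels and numbers in the sample are invented for illustration.

```python
import csv
import io


def hub_links(handle, min_freq=50.0):
    """Return (Node1, Node2, Force) for frequent links that involve a hub."""
    selected = []
    for row in csv.DictReader(handle):
        involves_hub = "Yes" in (row["IsNode1Hub?"], row["IsNode2Hub?"])
        if involves_hub and float(row["Freq"]) >= min_freq:
            selected.append((row["Node1"], row["Node2"], float(row["Force"])))
    return selected


# Invented sample with the assumed layout of {net_name}_links.csv.
SAMPLE = """N,Node1,Node2,Freq,Force,IsNode1Hub?,IsNode2Hub?,Clust,Comm
1,A:A:LYS10,A:A:GLU45,82.5,6.1,Yes,No,1,1
2,A:A:ALA12,A:A:VAL30,40.0,3.2,No,Yes,1,2
3,A:A:PHE20,A:A:TRP60,55.0,4.4,No,No,2,1
"""

print(hub_links(io.StringIO(SAMPLE)))
```

With a real output file, the same function can be called as hub_links(open("3lnx_links.csv", newline="")), where the file name is only an example.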

Network Difference


Network difference is used to highlight the differences between two PSGs in terms of their links, hubs and links mediated by at least one hub. This analysis is particularly useful to identify commonalities and differences in the structural communication of two functionally different states of the same system. Please remember that you need to pass one or two labels files if the two networks have a different primary sequence, or even if they have the same sequence but different chains and/or segments in the passed pdb files. Please see the Labels File section for more details. Please remember that all optional values listed below will be automatically set for you to a reasonable default value.

Table 6. Command options and descriptions
Option Description

-psgdiff

Must be the first option passed to psntools. Required.

-psn1 -psn2

These options are used to pass the two .psn files to be analyzed. Required.

-name

If used, all resulting output files will be named according to the value passed to this option, otherwise "psndiff_" will be used (default). Optional.

-pdb1 -pdb2

If one or both networks are consensus networks (see the -cons command), you can choose which pdb, among the embedded pdb coordinates, must be used to produce the 3D representations. By default the pdb coordinates of the first network passed to the -cons command will be used. These are both optional and you can use only one of them.

-labfile -labfile1 -labfile2

Used to pass one or two labels files. If the two networks have a different primary sequence, or even if they have the same sequence but their pdb files have different chains and/or segments, you need to use this option. -labfile1 is used to select the labels file to be applied to the .psn file passed to the -psn1 option, while -labfile2 is used for the .psn file passed to -psn2. Additionally, you can use the -labfile option to apply the same labels to both networks. Please note that, for this to work properly, you need unlabeled .psn files (i.e. you need to calculate your .psn files without the -labfile option). See the -calc command and the Labels File section for more details. Optional.

-outfile -outfile1 -outfile2

You can pass to this option a file with external numerical values (e.g. sequence conservation data, experimental values, etc.). These values will be reported in the output files alongside the network values. External values files are formatted like labels files. See the External Values File section for more details. As for the previous option, you can pass one or two files, or select the same file for both networks. Optional.

-freq
-freq1
-freq2

These options accept a numerical value > 0 and ≤ 100. Links and Hubs (read the relevant section in the Appendices chapter for more details) will be considered only if their frequency (i.e. % of trajectory frames) is ≥ this value. As for the previous option, you can pass the same value for both networks (-freq) or select two values using the -freq1 and -freq2 options. Default values are 50. Optional.

-cons
-cons1
-cons2

These options are used only when dealing with consensus psn files (see the -cons command) and accept a numerical value > 0 and ≤ 100. Links and Hubs (read the relevant section in the Appendices chapter for more details) will be considered only if they are shared by at least the value passed to this option among the networks used to generate the consensus psn you are analyzing. As for the previous option, you can pass the same value for both networks (-cons) or select two values using the -cons1 and -cons2 options. Default values are 0. Optional.

-pert

This option is used to select the minimum perturbation to be reported in the output files. Default is 0. The perturbation is calculated independently for hubs and links. If both networks were calculated from a trajectory file, the perturbation is calculated from the trajectory frequency; the interaction strength is used instead if one or both networks derive from a single pdb file.

[Equation: perturbation of a hub or link as a function of Net1 and Net2]

where Net1 and Net2 are the trajectory frequency/interaction strength of any given hub or link.


Optional.

-mergeclust
-mergeclust1
-mergeclust2

This option is used to set the clusters merging criterion (read the relevant section in the Appendices chapter for more details) and accepts one of the following values: no, imin, imin2, freq. As for the previous option, you can set the same value for both networks (via -mergeclust option) or two different values using -mergeclust1 and -mergeclust2. Optional.

-gly
-gly1
-gly2

If true (default), glycines will be included in all calculations and representations. -gly applies to both networks, while -gly1 and -gly2 let you control this option independently for each network. Optional.


Some examples:
psntools -psgdiff -psn1 3lnx.psn -psn2 3lny.psn -labfile pdz.lab


Table 7. Output files and descriptions
Output Description

{net_name}_info.csv

This file contains a summary of the calculated network difference:

  • Freq: the values passed to the -freq1 and -freq2 options

  • Imin: the Imin value (read the relevant section in the Appendices chapter for more details) of each network

  • LNodes: the number of nodes with at least one link in each network

  • Links: the number of links in each network

  • Hubs: the number of hubs in each network

  • HLinks: the number of links mediated by at least one hub in each network

  • SpecLinks, SpecLinks%: the number and the % of links present in only one of the two networks

  • SharedLinks, SharedLinks%: the number and the % of links shared by both networks

  • SpecNodes, SpecNodes%: the number and the % of nodes with at least one link present in only one of the two networks

  • SharedNodes, SharedNodes%: the number and the % of nodes with at least one link shared by both networks

  • SpecHubs, SpecHubs%: the number and the % of hubs present in only one of the two networks

  • SharedHubs, SharedHubs%: the number and the % of hubs shared by both networks
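
The specific and shared counts are, in essence, set operations on the elements of the two networks. The Python sketch below illustrates the idea for links. It is not the PSNTools implementation: links are treated as undirected pairs of node labels, and the percentages are computed relative to the union of the two link sets, which is an assumption of this example. The node labels are invented.

```python
def link_key(node1, node2):
    """Order-independent identifier, so that (A, B) and (B, A) are the same link."""
    return tuple(sorted((node1, node2)))


def compare_links(links1, links2):
    """Count links specific to one network and links shared by both."""
    set1 = {link_key(a, b) for a, b in links1}
    set2 = {link_key(a, b) for a, b in links2}
    shared = set1 & set2      # present in both networks
    specific = set1 ^ set2    # present in only one of the two networks
    total = len(set1 | set2)

    def percent(count):
        # Relative to the union of the two sets (assumption of this sketch).
        return 100.0 * count / total if total else 0.0

    return {
        "SpecLinks": len(specific),
        "SpecLinks%": percent(len(specific)),
        "SharedLinks": len(shared),
        "SharedLinks%": percent(len(shared)),
    }


net1 = [("A", "B"), ("B", "C"), ("C", "D")]
net2 = [("B", "A"), ("C", "D"), ("D", "E")]
print(compare_links(net1, net2))
```

The same approach applies to hubs and linked nodes, using sets of node labels instead of pairs.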

{net_name}_corrpairs_dist.png

This plot is a graphical comparison between the two analyzed networks of the distributions of the correlations of the atomic fluctuations versus the % of node pairs.

{net_name}_corrpairs_dist.csv

This file contains the data used to produce the plot named {net_name}_corrpairs_dist.png

{net_name}_corrpairs_dist_info.csv

This file contains a table with a number of statistics relative to the correlations of the atomic fluctuations.

  • MinValue: lowest correlation in each network

  • AvgValue: average correlation in each network

  • StDevValue: standard deviation in each network

  • MaxValue: largest correlation in each network

  • MostRecValue: most recurrent value in each network

  • MostRecFreq: the % of node pairs with a correlation equal to the most recurrent value in each network

  • 2ndMostRecValue: the 2nd most recurrent value in each network

  • 2ndMostRecFreq: the % of node pairs with a correlation equal to the 2nd most recurrent value in each network

  • 3rdMostRecValue: the 3rd most recurrent value in each network

  • 3rdMostRecFreq: the % of node pairs with a correlation equal to the 3rd most recurrent value in each network

{diff_name}_net1_all_perthubs_in_net2.pml {diff_name}_net1_all_perthubs_in_net2.vmd {diff_name}_net1_neg_perthubs_in_net2.pml {diff_name}_net1_neg_perthubs_in_net2.vmd {diff_name}_net1_pos_perthubs_in_net2.pml {diff_name}_net1_pos_perthubs_in_net2.vmd {diff_name}_net2_all_perthubs_in_net1.pml {diff_name}_net2_all_perthubs_in_net1.vmd {diff_name}_net2_neg_perthubs_in_net1.pml {diff_name}_net2_neg_perthubs_in_net1.vmd {diff_name}_net2_pos_perthubs_in_net1.pml {diff_name}_net2_pos_perthubs_in_net1.vmd

All these 3D script files can be used to visualize the perturbation of each hub in the two analyzed networks. The perturbation is calculated using the equation given in the description of the -pert option. Files with _pos_ in their name show only those hubs with a perturbation > 0, while files with _neg_ in their name show only those hubs with a perturbation < 0. Finally, files with _all_ in their name show both negatively and positively perturbed hubs.

{diff_name}_net1_all_pertlinks_in_net2.pml {diff_name}_net1_all_pertlinks_in_net2.vmd {diff_name}_net1_neg_pertlinks_in_net2.pml {diff_name}_net1_neg_pertlinks_in_net2.vmd {diff_name}_net1_pos_pertlinks_in_net2.pml {diff_name}_net1_pos_pertlinks_in_net2.vmd {diff_name}_net2_all_pertlinks_in_net1.pml {diff_name}_net2_all_pertlinks_in_net1.vmd {diff_name}_net2_neg_pertlinks_in_net1.pml {diff_name}_net2_neg_pertlinks_in_net1.vmd {diff_name}_net2_pos_pertlinks_in_net1.pml {diff_name}_net2_pos_pertlinks_in_net1.vmd

All these 3D script files can be used to visualize the perturbation of each link in the two analyzed networks. The perturbation is calculated using the equation given in the description of the -pert option. Files with _pos_ in their name show only those links with a perturbation > 0, while files with _neg_ in their name show only those links with a perturbation < 0. Finally, files with _all_ in their name show both negatively and positively perturbed links.
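
The names of the perturbation scripts follow a regular pattern, which is convenient when loading them from another script. The Python sketch below rebuilds the 24 names listed above from a given {diff_name}; the function name and the value "pdz_diff" used as {diff_name} are hypothetical.

```python
def perturbation_script_names(diff_name, formats=("pml", "vmd")):
    """List the perturbed hubs/links 3D script names for a network difference."""
    names = []
    for element in ("perthubs", "pertlinks"):
        for first, second in (("net1", "net2"), ("net2", "net1")):
            for sign in ("all", "neg", "pos"):
                for ext in formats:
                    names.append(f"{diff_name}_{first}_{sign}_{element}_in_{second}.{ext}")
    return names


for name in perturbation_script_names("pdz_diff"):
    print(name)
```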

{diff_name}_hlinks_hist.png

This histogram shows the total number of links mediated by at least one hub in both networks.

{diff_name}_hubs_hist.png

This histogram shows the total number of hubs in both networks.

{diff_name}_hubs_tab.csv

This table summarizes the differences between the two networks in terms of specific and shared hubs.

  • N: a progressive number

  • Hub: a hub residue

  • Owner: this column specifies the name of the network in which this hub is present, or Shared if present in both networks

  • Degree1, Degree2: the number of links mediated by this hub in the first and second network

  • Freq1, Freq2: the trajectory frequency of this hub in the first and second network, (always 100 if the network was calculated from a single pdb)

  • Force1, Force2: the average interaction strength of the links mediated by this hub in the first and second network

{diff_name}_links_hist.png

This histogram shows the total number of links in both networks.

{diff_name}_links_tab.csv

This table summarizes the differences between the two networks in terms of specific and shared links.

  • N: a progressive number

  • Node1/2: first and second node of the link

  • Owner: this column specifies the name of the network in which this link is present, or Shared if present in both networks

  • Freq1, Freq2: the trajectory frequency of this link in the first and second network (always 100 if the network was calculated from a single pdb)

  • Force1, Force2: the interaction strength of this link in the first and second network

  • {net_name1}OutVal1/2: the external value associated with the first/second node of this link in the external values file passed for the first network, or 0 if no file was passed or this node is not present in the passed file

  • {net_name2}OutVal1/2: the external value associated with the first/second node of this link in the external values file passed for the second network, or 0 if no file was passed or this node is not present in the passed file

  • IsNode1HubInNet1/2: Yes if the first node of this link is a hub in the first/second network, No otherwise

  • IsNode2HubInNet1/2: Yes if the second node of this link is a hub in the first/second network, No otherwise

{diff_name}_lnodes_hist.png

This histogram shows the total number of nodes with at least one link in both networks.

{diff_name}_net1_vs_net2_hubsdiff.pml {diff_name}_net1_vs_net2_hubsdiff.vmd

These 3D scripts compare the hubs present in the two networks. Hubs present only in the first network are represented as orange spheres, while those specific to the second network are represented in purple. Finally, shared hubs are represented as green spheres.

{diff_name}_net2_vs_net1_hubsdiff.pml {diff_name}_net2_vs_net1_hubsdiff.vmd

These 3D scripts use the same representation criteria as the previous files, but with the networks, and therefore the colors, inverted (i.e. net2 vs net1 instead of net1 vs net2).

{diff_name}_net1_vs_net2_linksdiff.pml {diff_name}_net1_vs_net2_linksdiff.vmd

These 3D scripts compare the links present in the two networks. As in the previous files, links specific to the first network, links specific to the second network and links shared by the two networks are represented in orange, purple and green, respectively. Nodes connected by these links follow the same color scheme.

{diff_name}_net2_vs_net1_linksdiff.pml {diff_name}_net2_vs_net1_linksdiff.vmd

These 3D scripts use the same representation criteria as the previous files, but with the networks, and therefore the colors, inverted (i.e. net2 vs net1 instead of net1 vs net2).

{diff_name}_perthubs_tab.csv {diff_name}_pertlinks_tab.csv

These tables summarize the perturbation of each hub/link in both networks using the equation defined in the description of the -pert option. These tables share the very same structure and only the first one is described here.

  • N: a progressive number

  • Hub: the hub being considered

  • Owner: this column specifies the name of the network in which this hub is present, or Shared if present in both networks

  • Freq1/Force1: the trajectory frequency/average interaction strength of this hub in the first network

  • Freq2/Force2: the trajectory frequency/average interaction strength of this hub in the second network

  • Delta1: the perturbation of this hub in the second network compared to the first

  • Delta2: the perturbation of this hub in the first network compared to the second

{diff_name}_spec_comm_hubs.png {diff_name}_spec_comm_links.png {diff_name}_spec_comm_lnodes.png

These histograms show the number of specific and shared hubs, links and linked nodes, respectively.


Shortest Communication Paths


After the PSN calculation stage, you can use the produced .psn file to calculate the correlated shortest communication path(s) between two or more nodes. As for the previous commands, remember that all optional values listed below will be automatically set for you to a reasonable default value.

Table 8. Command options and descriptions
Option Description

-paths

Must be the first option passed to psntools, followed by a .psn file obtained with the -calc command. Required.

-name

If used, all resulting output files will be named according to the value passed to this option, otherwise the internal network name will be used (default). Optional.

-labfile

A labels file. If passed, the nodes in the resulting outputs will be labeled according to this file. Please note that, for this to work properly, you need an unlabeled .psn file (i.e. you need to calculate your .psn file without the -labfile option). See -calc command and Labels File section for more details. Optional.

-outval

You can pass a file to this option with external numerical values (e.g. sequence conservation data, experimental values, etc.). These values will be reported in the output files alongside network values. External values files are formatted like labels files. See External Values File section for more details. Optional.

-pdb

If you are analyzing a consensus network (see -cons command), you can choose which of the embedded pdb coordinates must be used to produce a 3D representation of your PSG. By default the pdb coordinates of the first network passed to the -cons command will be used. Optional.

-gly

If true (default) glycines will be included in all calculations and representations. Optional.

-freq

This option accepts a numerical value > 0 and ≤ 100. Links and Hubs (read the relevant section in the Appendices chapter for more details) will be considered only if their frequency (i.e. % of trajectory frames) is ≥ this value. The default is 50. Optional.

-cons

This option is used only when dealing with a consensus psn file (see -cons command) and accepts a numerical value > 0 and ≤ 100. Links and Hubs (read the relevant section in the Appendices chapter for more details) will be considered only if they are shared by at least the percentage passed to this option of the networks used to generate the consensus psn you are analyzing. The default is 0. Optional.

-noifile

This option is used to pass a file with a list of nodes of interest (one per line) that will be reported in the output files. Optional.

-noi

This option accepts a numerical value > 0 and ≤ 100 and is used to filter out those shortest paths with a % of nodes of interest (see -noifile) smaller than the value passed to this option. Default is 0. Optional.

-mergeclust

This option is used to set the clusters merging criterion (read the relevant section in the Appendices chapter for more details) and accepts one of the following values: no, imin, imin2, freq. Optional.

-mpselmode

This option is used, together with -rec, to select which links and nodes will be present in the resulting metapath. Please, read the relevant section in the Appendices chapter for more details. Valid values and their description:

  • link: only those links with a recurrence ≥ to the value passed to -rec option will be present in the resulting metapath

  • node: only the links of nodes with a recurrence ≥ to the value passed to -rec option will be present in the resulting metapath

  • both: both nodes and links must have a recurrence ≥ to the value passed to -rec option to be present in the resulting metapath

The default value is link. Optional.

-rec

This option accepts a numerical value > 0 and ≤ 100 and is used to set the minimum metapath recurrence used by the -mpselmode option. The default is 10 and 20 for networks calculated from a pdb or a trajectory file, respectively. Optional.

-corr

This option accepts a numerical value > 0 and ≤ 1 and is used to set the minimum correlation of the atomic fluctuations. This option, combined with -corrmode, is used to filter out shortest paths with poor correlation. The default is 0.7 and 0.8 for networks calculated from a pdb or a trajectory file, respectively. Optional.

-corrmode

This option is used to select the correlation filtering criterion. Valid values and their description are:

  • 11t: at least one node in each shortest path must have a correlation ≥ to the value passed to -corr option with at least one of the two apical nodes to be considered (default)

  • 12t: at least one node in each shortest path must have a correlation ≥ to the value passed to -corr option with both apical nodes to be considered (default)

  • a1t: all non-apical nodes in each shortest path must have a correlation ≥ to the value passed to -corr option with at least one of the two apical nodes to be considered (default)

  • a2t: all non-apical nodes in each shortest path must have a correlation ≥ to the value passed to -corr option with both apical nodes to be considered (default)

  • lnk: the nodes of all links in each shortest path must have a correlation ≥ to the value passed to -corr option

  • all: all possible node pairs in each shortest path must have a correlation ≥ to the value passed to -corr option

  • pre: each node in each shortest path must have a correlation ≥ to the value passed to -corr option with the preceding node

-weighttype

This option is used to select the weight of each link during the shortest paths calculation. If no is passed (default), all links will have the same weight, and the paths composed of the smallest number of nodes will be selected for each pair. Optional. Other valid values and their description are:

  • force: for each node pair, select the paths that maximize the average interaction strength of their links

  • freq: for each node pair, select the paths that maximize the trajectory frequency of their links

  • cons: for each node pair, select the paths that maximize the average conservation (see -cons) of their links

  • corr: for each node pair, select the paths that maximize the correlation of their links

  • forcecorr: for each node pair, select the paths that maximize both the average interaction strength and the correlation of their links

  • freqcorr: for each node pair, select the paths that maximize both the average trajectory frequency and the correlation of their links

-mptype

This option sets the equation used to calculate the recurrence score associated with each link and node in the filtered pool of shortest paths. Default is realrec. Optional. Valid values and their description:

  • realrec: the recurrence score is calculated as the fraction of shortest paths in which a given node or link is present over the total number of filtered paths

  • relrec: like the previous one but also scaled to the highest recurrence value (default)

  • betcent: the recurrence score is the betweenness centrality

  • clscent: the recurrence score is the closeness centrality

-pass

Select only those shortest paths that pass through a given node.

-tail

Select only those shortest paths that start or end at a given node.

-pair

Calculate the shortest paths between a pair of nodes.

-pairsfile

Calculate all shortest paths between the nodes listed in the passed file.

-hubs

This option accepts a numerical value ≥ 0 and ≤ 100, and is used to select only those shortest paths with a given % of hubs. Default is 0. Optional.

-corrscore

This option accepts a numerical value ≥ 0 and ≤ 100, and is used to select only those shortest paths with a given % of correlated nodes. Default is 0. Optional.

-writeptable

Used to select whether a file with all shortest paths must be generated. Valid values are: yes, no, auto (default). With auto this file is generated only if the network has less than 700 nodes. Optional.

-tnum

This option sets the number of threads used to speed up the shortest paths calculation. If you are working on a multi-core workstation, a large value is advisable. The default is 1. Optional.

-out3D

With this option you can select the 3D representation output format(s). Accepted values are pml and vmd, which generate a PyMOL and a VMD script, respectively. You can select more than one format by concatenating them with a ',' character. Default value is pml,vmd. Optional.

-mp2d

If yes a 2D representation of the computed metapath will be produced. Default is no. Optional.

-color

Used to select the coloring criterion of represented network elements like nodes, links, hubs etc. See the corresponding section for more details about this option. Optional.

-size

Used to select the criterion behind the size of represented network elements like nodes, links, hubs etc. See the corresponding section for more details about this option. Optional.
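The recurrence scores (-mptype) and the metapath selection (-mpselmode together with -rec) described in the table above can be illustrated with a minimal Python sketch. This is not the PSNTools implementation: the function names are hypothetical, recurrences are expressed as percentages so that they can be compared with -rec, the relative score is assumed to be rescaled so that the highest value becomes 100, and the node mode is assumed to require both nodes of a link to reach the threshold.

```python
from collections import Counter


def real_recurrence(paths):
    """Percentage of filtered paths that contain each node and each link."""
    total = len(paths)
    node_counts, link_counts = Counter(), Counter()
    for path in paths:
        # Count each node and each link at most once per path.
        node_counts.update(set(path))
        link_counts.update({tuple(sorted(pair)) for pair in zip(path, path[1:])})
    node_rec = {node: 100 * n / total for node, n in node_counts.items()}
    link_rec = {link: 100 * n / total for link, n in link_counts.items()}
    return node_rec, link_rec


def relative_recurrence(rec):
    """Rescale recurrences so that the highest value becomes 100 (assumption)."""
    top = max(rec.values())
    return {key: 100 * value / top for key, value in rec.items()}


def select_links(link_rec, node_rec, min_rec, mode="link"):
    """Return the links kept in the metapath for a given selection mode."""
    kept = []
    for (a, b), rec in link_rec.items():
        link_ok = rec >= min_rec
        # Assumption: in "node" mode both nodes of the link must pass.
        nodes_ok = node_rec[a] >= min_rec and node_rec[b] >= min_rec
        if mode == "link":
            keep = link_ok
        elif mode == "node":
            keep = nodes_ok
        elif mode == "both":
            keep = link_ok and nodes_ok
        else:
            raise ValueError(f"unknown selection mode: {mode}")
        if keep:
            kept.append((a, b))
    return sorted(kept)


# Toy pool of filtered shortest paths (made-up node names).
paths = [["A", "B", "C"], ["A", "B", "D"], ["E", "B", "C"], ["A", "B", "C", "D"]]
node_rec, link_rec = real_recurrence(paths)
metapath = select_links(link_rec, node_rec, min_rec=50, mode="link")
```

With this toy pool, the link mode keeps only the two links found in at least half of the paths, while the node mode also keeps less recurrent links that connect recurrent nodes.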


Some examples:
psntools -paths 3lnx.psn

psntools -paths 3lnx.psn -pass R:R:L29

psntools -paths 3lnx.psn -tail R:R:L29

psntools -paths 3lnx.psn -pair R:R:L29,R:R:W99
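To make the -corrmode criteria more concrete, the following Python sketch applies the 11t, 12t, a1t, a2t and all criteria to a single path. It is only an illustration under stated assumptions: the function name and the correlation lookup (a dictionary keyed by node pairs) are hypothetical, "nodes" is read as the non-apical nodes of the path for the four apical criteria, and the lnk and pre criteria are left out because this sketch does not try to guess how PSNTools distinguishes them.

```python
from itertools import combinations


def passes_corr_filter(path, corr, min_corr, mode="11t"):
    """Check one shortest path against a correlation filtering criterion.

    path: list of distinct node names, apical nodes first and last
    corr: dict mapping frozenset({node_a, node_b}) -> correlation
    """
    first, last = path[0], path[-1]
    inner = path[1:-1]  # non-apical nodes

    def c(a, b):
        return corr[frozenset((a, b))]

    # For each inner node: correlated with at least one / with both apical nodes.
    with_one = [c(n, first) >= min_corr or c(n, last) >= min_corr for n in inner]
    with_both = [c(n, first) >= min_corr and c(n, last) >= min_corr for n in inner]

    if mode == "11t":
        return any(with_one)
    if mode == "12t":
        return any(with_both)
    if mode == "a1t":
        return all(with_one)
    if mode == "a2t":
        return all(with_both)
    if mode == "all":
        return all(c(a, b) >= min_corr for a, b in combinations(path, 2))
    raise ValueError(f"criterion not covered by this sketch: {mode}")


# Made-up correlations for a four-node path A-B-C-D.
corr = {
    frozenset(("A", "B")): 0.9, frozenset(("A", "C")): 0.5,
    frozenset(("A", "D")): 0.6, frozenset(("B", "C")): 0.8,
    frozenset(("B", "D")): 0.3, frozenset(("C", "D")): 0.9,
}
path = ["A", "B", "C", "D"]
results = {m: passes_corr_filter(path, corr, 0.7, m) for m in ("11t", "12t", "a1t", "a2t", "all")}
```

Note that a path made of two nodes only has no non-apical nodes, so in this sketch it always fails 11t and 12t and always passes a1t and a2t. How PSNTools treats that case is not stated here.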


Table 9. Output files and descriptions
Output Description

{name}_paths.log

This file summarizes the value of all used options in the performed shortest paths calculation.

{name}_pathstable_info.csv

This file reports a series of statistics about the filtered shortest paths pool.

  • NetName: the name of analyzed network

  • MinCorr: the value passed to -corr option

  • MinRec: the value passed to -rec option

  • NumOfPaths: the total number of filtered shortest paths

  • MinPathsLength, AvgPathsLength, StDPathsLength, MaxPathsLength: the minimum, average, standard deviation and maximum of the path length, expressed as number of nodes, in the shortest path pool.

  • MinPathsForce, AvgPathsForce, StDPathsForce, MaxPathsForce: the minimum, average, standard deviation and maximum of the average interaction strength of the links present in the shortest path pool.

  • MinPathsCorr, AvgPathsCorr, StDPathsCorr, MaxPathsCorr: the minimum, average, standard deviation and maximum of the average correlation between each node and the first and last nodes of each path in the shortest path pool.

  • MinPathsScore, AvgPathsScore, StdPathsScore, MaxPathsScore: the minimum, average, standard deviation and maximum of the percentage of correlated nodes, as defined by the -corrmode and -corr options, in the shortest path pool.

  • MinPathsHubs%, AvgPathsHubs%, StdPathsHubs%, MaxPathsHubs%: the minimum, average, standard deviation and maximum of the percentage of hub nodes present in the shortest path pool.

{name}_metapath.csv {name}_metapath_alllinks.csv

These files report information about all the links present in the filtered shortest paths pool. The former file lists only those links that satisfy the filtering criteria of the -mpselmode and -rec options, while the latter lists all links.

  • N: a progressive number

  • Node1, Node2: the first and second node of this link

  • OutVal1 OutVal2: the external values of the first and second node as present in the file passed to -outval

  • LinkFreq: the trajectory frequency of this link or 100 if the network was calculated from a pdb file

  • LinkForce: the interaction strength of this link

  • LinkRec: the recurrence of this link in the filtered shortest paths pool calculated as defined in the -mptype option

  • Node1Rec, Node2Rec: the recurrences of the first and second node of this link in the filtered shortest paths pool calculated as defined in the -mptype option

  • IsNode1Hub, IsNode2Hub: whether the first and the second node of this link are hubs

{name}_metapath.pml {name}_metapath.vmd

These 3D scripts represent the coarse communication pathway in the analyzed network using the filtered shortest paths pool.

{name}_mpaths2d.png

This image is a 2D representation of the computed metapath.

{name}_paths_corr_dist.csv {name}_paths_corr_dist.png

This plot and the associated data file report the distribution of the average correlation between each node and the first and last nodes of each path present in the shortest path pool.

{name}_paths_corrfract_dist.csv {name}_paths_corrfract_dist.png

This plot and the associated data file report the distribution of the percentage of nodes correlated with the first and/or the last node of each path present in the shortest path pool.

{name}_paths_force_dist.csv {name}_paths_force_dist.png

This plot and the associated data file report the distribution of the average interaction strength of the links in each path present in the shortest path pool.

{name}_paths_hubs_dist.csv {name}_paths_hubs_dist.png

This plot and the associated data file report the distribution of the percentage of hubs in each path present in the shortest path pool.

{name}_paths_len_dist.csv {name}_paths_len_dist.png

This plot and the associated data file report the distribution of the number of nodes in each path present in the shortest path pool.

{name}_pathstable.csv

This file reports information about each path in the shortest path pool.

  • N: a progressive number

  • Path: the path

  • Pair: the first and the last node in this path

  • AvgForce: the average interaction strength of links in this path

  • Length: the number of nodes in this path

  • CorrNodes%: the percentage of correlated nodes

  • MaxCorr: the highest correlation in this path

  • Hubs%: the percentage of hub nodes in this path

  • Noi, NoI%: the number and the percentage of nodes of interest in this path (see -noifile option)

  • Cluster: the cluster of nodes this path belongs to

{net_name}.pdb

A .pdb file with the coordinates used to calculate the network with the -calc command. Used by the .pml and/or .vmd 3D scripts.
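As an illustration of the Min/Avg/StD/Max summary columns reported in {name}_pathstable_info.csv, the following Python sketch computes the same kind of statistics for the path length and for the percentage of hubs from a toy pool of paths. It is a sketch under assumptions, not the PSNTools code: the function names and the example nodes are made up, and the population standard deviation is used because the variant adopted by PSNTools is not stated here.

```python
from statistics import mean, pstdev


def summarize(values):
    """Minimum, average, standard deviation and maximum of a list of numbers."""
    return {
        "Min": min(values),
        "Avg": mean(values),
        "StD": pstdev(values),  # assumption: population standard deviation
        "Max": max(values),
    }


def pool_stats(paths, hubs):
    """Summarize path length (number of nodes) and % of hub nodes per path."""
    lengths = [len(path) for path in paths]
    hub_pct = [100 * sum(node in hubs for node in path) / len(path) for path in paths]
    return {"PathsLength": summarize(lengths), "PathsHubs%": summarize(hub_pct)}


# Toy pool of shortest paths and a made-up set of hub nodes.
paths = [["A", "B", "C"], ["A", "B", "C", "D"], ["A", "E", "F", "G", "D"]]
hubs = {"B", "C"}
stats = pool_stats(paths, hubs)
```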

Shortest Paths Difference


This analysis runs the shortest paths analysis on two networks and compares the results. Most of the options accepted by this command are the same as those accepted by the -paths command, and their values can be conveniently applied to both networks using the same syntax. Additionally, it is possible to pass different values of the same option to the two networks being analyzed to explore their effects (e.g. -rec1 and -rec2 instead of -rec). Refer to the -paths table for a detailed description of the accepted options. As always, all unused options will be set to reasonable default values.

Table 10. Command options and descriptions
Option Description

-pathsdiff

Must be the first option passed to psntools. Required.

-psn1, -psn2

Used to select the .psn files to be analyzed. Required.

-name

If used, all resulting output files will be named according to the value passed to this option. Default is pathsdiff. Optional.

-labfile
-labfile1, -labfile2

Optional.

-outval
-outval1, -outval2

Optional.

-pdb1, -pdb2

Optional.

-gly, -gly1, -gly2

Optional.

-freq
-freq1, -freq2

Optional.

-cons
-cons1, -cons2

Optional.

-noifile
-noifile1, -noifile2

Optional.

-noi, -noi1, -noi2

Optional.

-mergeclust
-mergeclust1, -mergeclust2

Optional.

-mpselmode
-mpselmode1, -mpselmode2

Optional.

-rec, -rec1, -rec2

Optional.

-corr, -corr1, -corr2

Optional.

-corrmode
-corrmode1, -corrmode2

Optional.

-weighttype
-weighttype1, -weighttype2

Optional.

-pass, -pass1, -pass2

Optional.

-tail, -tail1, -tail2

Optional.

-pair, -pair1, -pair2

Optional.

-pairsfile
-pairsfile1, -pairsfile2

Optional.

-hubs, -hubs1, -hubs2

Optional.

-corrscore
-corrscore1, -corrscore2

Optional.

-writeptable

For more details see the description of this option in the corresponding -paths table.

-tnum

This option sets the number of threads used to speed up the shortest paths calculation. If you are working on a multi-core workstation, a large value is advisable. The default is 1. Optional.

-mp2d

If yes, a 2D representation of the difference between the two metapaths will be produced. Default is no. Optional.

-out3D

For more details see the description of this option in the corresponding -paths table.

-color

Used to select the coloring criterion of represented network elements like nodes, links, hubs etc. See the corresponding section for more details about this option. Optional.

-size

Used to select the criterion behind the size of represented network elements like nodes, links, hubs etc. See the corresponding section for more details about this option. Optional.

Some examples:
psntools -pathsdiff -psn1 3lnx.psn -psn2 3lny.psn

psntools -pathsdiff -psn1 3lnx.psn -psn2 3lny.psn -labfile pdz.lab


This command will produce, for both passed networks, the same files produced by the -paths command. Refer to the corresponding output files table for a detailed description of these files. Additionally, the following files will also be produced:

Table 11. Output files and descriptions
Output Description

{name}_mpdiff_info.csv

This file is a summary of the two calculated metapaths.

  • MinFreq: the value passed to -freq or -freq1 and -freq2 option(s)

  • MinCorr: the value passed to -corr or -corr1 and -corr2 option(s)

  • MinRec: the value passed to -rec or -rec1 and -rec2 option(s)

  • TotPaths: the total number of filtered paths from both networks

  • MPLinks: the total number of links present in the two computed metapaths

  • MPNodes: the total number of nodes present in the two computed metapaths

  • SpecLinks, SpecLinks%: the number and the % of network-specific metapath links

  • SharedLinks, SharedLinks%: the number and the % of shared metapath links

  • SpecNodes, SpecNodes%: the number and the % of network-specific metapath nodes

  • SharedNodes, SharedNodes%: the number and the % of shared metapath nodes

{name}_mpdiff_tab.csv

This file contains a detailed table of the metapath links of both networks.

  • N: a progressive number

  • Node1, Node2: the first and second node of this link

  • Owner: this column specifies the name of the network in which this link is present, or Shared if present in both networks

  • Rec1, Rec2: the recurrence of this link in the filtered shortest paths pools calculated as defined in the -mptype or -mptype1/-mptype2 option(s)

  • Freq1, Freq2: the trajectory frequency of this link in both networks or 100 if the networks were calculated from pdb files

  • Force1, Force2: the interaction strength of this link in both networks

  • OutVal1, OutVal2: the external values of the first and second node of this link as present in the file passed to -outval or -outval1 file

  • OutVal1, OutVal2: the external values of the first and second node of this link as present in the file passed to -outval or -outval2 file

  • IsNode1HubInNet1, IsNode1HubInNet2: whether the first node of this link is a hub in the first and in the second network, respectively

  • IsNode2HubInNet1, IsNode2HubInNet2: whether the second node of this link is a hub in the first and in the second network, respectively

{name}_avg_corrfract_hist.png

This histogram shows the average fraction of correlated residues of the filtered shortest paths pool in the two networks.

{name}_avg_corr_hist.png

This histogram shows the average correlation of the filtered shortest paths pool in the two networks.

{name}_avg_hubfract_hist.png

This histogram shows the average fraction of hubs of the filtered shortest paths pool in the two networks.

{name}_avg_len_hist.png

This histogram shows the average length of the filtered shortest paths pool in the two networks.

{name}_totpathsdiff_hist.png

This histogram shows the difference in the total number of shortest paths in the two networks.

{name}_corr_dist.png {name}_corr_dist_pcn.png

These plots show the distributions of the residue average correlations plotted versus the total number of shortest paths and the % of total shortest paths, respectively.

{name}_corrfract_dist.png {name}_corrfract_dist_pcn.png

These plots show the distributions of the fraction of correlated residues plotted versus the total number of shortest paths and the % of total shortest paths, respectively.

{name}_force_dist.png {name}_force_dist_pcn.png

These plots show the distributions of the average interaction strength of the path links plotted versus the total number of shortest paths and the % of total shortest paths, respectively.

{name}_hubs_dist.png {name}_hubs_dist_pcn.png

These plots show the distributions of the hubs % present in each path plotted versus the total number of shortest paths and the % of total shortest paths, respectively.

{name}_len_dist.png {name}_len_dist_pcn.png

These plots show the distributions of the path length plotted versus the total number of shortest paths and the % of total shortest paths, respectively.

{name}_net1_vs_net2_mpdiff.pml {name}_net1_vs_net2_mpdiff.vmd {name}_net2_vs_net1_mpdiff.pml {name}_net2_vs_net1_mpdiff.vmd

These 3D scripts represent the difference between the two metapaths.

{name}_net1_vs_net2_mpdiff2d.png {name}_net2_vs_net1_mpdiff2d.png

These files are a 2D representation of the comparison between the two metapaths.

Shortest Paths Determinants


This analysis is used to highlight the relevance of each link in a given metapath by iteratively removing each link from the network and then recalculating the resulting metapath. The effect of link removal on the formation of the native metapath is then expressed as a percentage of native metapath links missing in the perturbed metapath. Finally, the most relevant links are also iteratively combined and removed to test their synergistic effect.
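The perturbation score described above can be sketched in a few lines of Python. This is only an illustration: the function name is hypothetical, links are treated as undirected pairs, and whether PSNTools counts the removed link itself among the missing ones is an assumption left open here (in this sketch every native metapath link absent from the perturbed metapath is counted).

```python
def perturbation_score(native_links, perturbed_links):
    """Return the % of native metapath links missing in the perturbed metapath,
    together with the sorted list of missing links."""
    # Links are undirected, so (A, B) and (B, A) are the same link.
    native = {frozenset(link) for link in native_links}
    perturbed = {frozenset(link) for link in perturbed_links}
    missing = native - perturbed
    score = 100 * len(missing) / len(native)
    return score, sorted(tuple(sorted(link)) for link in missing)


# Toy metapaths with made-up node names.
native = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "E")]
perturbed = [("B", "A"), ("C", "D"), ("C", "F")]
score, missing = perturbation_score(native, perturbed)
```

In this toy case two of the four native links are absent from the perturbed metapath, while the new link C-F does not affect the score because only native links are considered.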

In addition to the two specific options listed in the table below, this command also accepts all the options accepted by the -paths command. Please refer to that table for a detailed description of their meanings.

Table 12. Command options and descriptions
Option Description

-mpdet

Must be the first option passed to psntools, followed by a .psn file. Required.

-linkcmb

Set the number of most relevant links to be combined and removed. Default is 5. Optional.


Some examples:
psntools -mpdet 3lnx.psn


This command will produce the same outputs produced by -paths command and the following specific file:

Table 13. Output files and descriptions
Output Description

{name}_mpdet.csv

  • N: a progressive number

  • Link: link

  • IsNode1Hub?, IsNode2Hub?: Yes if the first/second node of this link is a hub, No otherwise

  • Node1OutVal, Node2OutVal: the external value associated with the first/second node of this link in the external values file, or 0 if no file was passed or this node is not present in the passed file

  • Freq: trajectory frequency of this link

  • Force: average interaction strength of this link

  • Rec: the recurrence of this link in the filtered shortest paths pools calculated as defined in the -mptype option

  • PertScore: the perturbation score of this link, calculated as a % of native metapath links missing in the perturbed metapath

  • NumPertLinks: the number of missing links in the perturbed metapath

  • AffectedLinks…​: a series of columns listing the affected links

Consensus PSN


Computation of consensus networks allows inferring common structural communication features in homologous systems sharing the same functionality or even only the fold. A consensus network is calculated by averaging the trajectory frequencies and interaction strengths of all hubs and links among the networks used to build the consensus. Once produced, the .psn file of a consensus network can be used as a normal network and can be analyzed using the same methods seen so far.
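The averaging step can be pictured with the following minimal Python sketch. It is not the PSNTools algorithm: the function name and data layout are hypothetical, and it assumes that each link is averaged over the networks in which it is present, with the percentage of networks sharing the link stored alongside (the kind of value a consensus threshold could later be compared with).

```python
def consensus_links(networks):
    """Average link frequency and interaction strength across networks.

    networks: list of dicts mapping a link, e.g. ("A", "B"), to a
              (frequency, interaction_strength) tuple
    """
    total = len(networks)
    all_links = set().union(*networks)  # every link seen in any network
    consensus = {}
    for link in all_links:
        # Assumption: average only over the networks that contain the link.
        present = [net[link] for net in networks if link in net]
        consensus[link] = {
            "freq": sum(freq for freq, _ in present) / len(present),
            "force": sum(force for _, force in present) / len(present),
            "cons": 100 * len(present) / total,  # % of networks sharing the link
        }
    return consensus


# Two toy networks with made-up values.
net1 = {("A", "B"): (80.0, 4.0), ("B", "C"): (60.0, 3.0)}
net2 = {("A", "B"): (100.0, 6.0)}
cons = consensus_links([net1, net2])
```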


Table 14. Command options and descriptions
Option Description

-cons

Must be the first option passed to psntools. Required.

-name

The value passed to this option will be used as consensus name. Default is consensus. Optional.

-psn

This option can be used multiple times to list the .psn files used to calculate the consensus network. An optional labels file, to be applied to the passed network, can also be specified by appending a comma character and a labels file (see examples below). At least two .psn files must be selected. Required.

-labfile

This option can be used to pass the same labels file to all networks being analyzed. Optional.


Some examples:
psntools -cons -psn 3lnx.psn -psn 3lny.psn

psntools -cons -psn 3lnx.psn -psn 3lny.psn,pdz.lab

psntools -cons -psn 3lnx.psn,pdz.lab -psn 3lny.psn,pdz.lab

psntools -cons -psn 3lnx.psn -psn 3lny.psn -labfile pdz.lab


As with the -calc command, this command will produce only a binary .psn file, named after the .pdb file name or the value passed to the -name option.

Plots Of Trajectory Network Statistics


This command is used to generate two different types of plots of the network statistics obtained during the network calculation stage (see -calc command for more details). Although it is also available for networks generated from a single .pdb file, this command primarily targets networks generated from a trajectory file.


Table 15. Command options and descriptions
Option Description

-tsplots

Must be the first option passed to psntools, followed by a .psn file. Required.

-dist

This option accepts the name of the network statistic to plot. Optional.

-surf

This option accepts a pair of names of the network statistics to plot, separated by a "," character. Optional.


At least one -dist or -surf option must be passed. Multiple -dist and/or -surf options are also accepted. You can get the list of available network statistics for the submitted .psn file using -psninfo command.

Some examples:
psntools -tsplots 3lnx.psn -dist Hubs

psntools -tsplots 3lnx.psn -dist Hubs -dist Links

psntools -tsplots 3lnx.psn -surf Hubs,Links


For each -dist option passed to this command the following output files will be produced:

Table 16. Output files and descriptions
Output Description

{name}_tsplot_{stat}_dist.png

A plot of the distribution of the statistic values versus the % of trajectory frames.

{name}_tsplot_{stat}_dist.csv

A .csv file with the values used to generate the plot mentioned above.

{name}_tsplot_{stat}_dist_info.csv

This is a table with some information about the distribution of the network statistic:

  • MinValue: the smallest value

  • AvgValue: the average value

  • StDevValue: the standard deviation

  • MaxValue: the largest value

  • MostRecValue: the most recurrent value

  • MostRecFreq: the % of frames with the most recurrent value

  • 2ndMostRecValue: the 2nd most recurrent value

  • 2ndMostRecFreq: the % of frames with the 2nd most recurrent value

  • 3rdMostRecValue: the 3rd most recurrent value

  • 3rdMostRecFreq: the % of frames with the 3rd most recurrent value

Where {name} and {stat} are the network name of passed .psn file and network statistics name passed to -dist option, respectively.
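The fields of the {name}_tsplot_{stat}_dist_info.csv table can be reproduced from a per-frame series of values, as in the following Python sketch. It is an illustration only: the function name and the example series are made up, the population standard deviation is assumed, and ties between equally recurrent values are resolved by order of first appearance, which may differ from what PSNTools does.

```python
from collections import Counter
from statistics import mean, pstdev


def dist_info(values):
    """Summarize a per-frame network statistic (one value per trajectory frame)."""
    frames = len(values)
    info = {
        "MinValue": min(values),
        "AvgValue": mean(values),
        "StDevValue": pstdev(values),  # assumption: population standard deviation
        "MaxValue": max(values),
    }
    # The three most recurrent values and the % of frames in which they occur.
    prefixes = ["MostRec", "2ndMostRec", "3rdMostRec"]
    for prefix, (value, count) in zip(prefixes, Counter(values).most_common(3)):
        info[prefix + "Value"] = value
        info[prefix + "Freq"] = 100 * count / frames
    return info


# Made-up number of hubs in ten trajectory frames.
hubs_per_frame = [10, 11, 11, 12, 11, 10, 13, 11, 10, 12]
info = dist_info(hubs_per_frame)
```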

For each -surf option passed to this command the following output files will be produced:

Table 17. Output files and descriptions
Output Description

{name}_tsplot_{stat1}_vs_{stat2}_surfmat.png

A 2D plot of the joint probability distribution of the statistics values versus the % of trajectory frames.

{name}_tsplot_{stat1}_vs_{stat2}_surfmat.dat

A text file with the data used to generate the plot mentioned above.

{name}_tsplot_{stat1}_vs_{stat2}_surfmat_info.dat

This is a table with some information about the three most recurrent pairs of values of the two passed statistics:

  • Basin: a progressive number

  • MinX, MaxX: the smallest and the largest value of the first statistic passed to the -surf option

  • MinY, MaxY: the smallest and the largest value of the second statistic passed to the -surf option

  • Pop: the % of trajectory frames with the above ranges of the first and the second statistic passed to the -surf option

Where {name}, {stat1}, {stat2} are the network name of the passed .psn file and the names of the network statistics passed to the -surf option, respectively.
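The idea behind a -surf plot, namely a joint distribution of two statistics over the trajectory frames, can be sketched in Python as follows. This is a simplified illustration with a hypothetical function name and made-up data: it counts exact pairs of values, whereas the basins reported in the info file cover ranges of values whose detection is not reproduced here.

```python
from collections import Counter


def joint_distribution(xs, ys):
    """% of trajectory frames for each observed (x, y) pair of statistic values."""
    if len(xs) != len(ys):
        raise ValueError("both statistics need one value per frame")
    frames = len(xs)
    counts = Counter(zip(xs, ys))
    return {pair: 100 * count / frames for pair, count in counts.items()}


# Made-up number of hubs and links in five trajectory frames.
hubs = [10, 10, 11, 11, 11]
links = [50, 50, 52, 52, 53]
joint = joint_distribution(hubs, links)
```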

Plots Of Difference In Trajectory Network Statistics


This command is similar to -tsplots and is used to calculate the difference in the trajectory network statistics of two different .psn files.


Table 18. Command options and descriptions
Option Description

-tsdiffplots

Must be the first option passed to psntools. Required.

-psn1 -psn2

These options are used to pass the two .psn files. Required.

-dist

This option accepts the name of the network statistic to plot. Optional.

-surf

This option accepts a pair of names of the network statistics to plot, separated by a "," character. Optional.


At least one -dist or -surf option must be passed. Multiple -dist and/or -surf options are also accepted. You can get the list of available network statistics for each submitted .psn file using -psninfo command.

Some examples:
psntools -tsdiffplots -psn1 3lnx.psn -psn2 3lny.psn -dist Hubs

psntools -tsdiffplots -psn1 3lnx.psn -psn2 3lny.psn -dist Hubs -dist Links

psntools -tsdiffplots -psn1 3lnx.psn -psn2 3lny.psn -surf Hubs,Links


For each -dist option passed to this command the following output files will be produced:

Table 19. Output files and descriptions
Output Description

tsdiffplot_{basename}_dist.png

A plot of the distribution of the statistic values versus the % of trajectory frames of both networks.

tsdiffplot_{basename}_dist.csv

A .csv file with the values used to generate the plot mentioned above.

tsdiffplot_{basename}_dist_info.csv

This is a table with some information about the distribution of the network statistic:

  • MinValue: the smallest value

  • AvgValue: the average value

  • StDevValue: the standard deviation

  • MaxValue: the largest value

  • MostRecValue: the most recurrent value

  • MostRecFreq: the % of frames with the most recurrent value

  • 2ndMostRecValue: the 2nd most recurrent value

  • 2ndMostRecFreq: the % of frames with the 2nd most recurrent value

  • 3rdMostRecValue: the 3rd most recurrent value

  • 3rdMostRecFreq: the % of frames with the 3rd most recurrent value

Where {basename} is composed of the following substrings: {name1}_vs_{name2}_{stat}

where {name1}, {name2} and {stat} are the network names of passed .psn files and network statistics name passed to -dist option, respectively.

For each -surf option passed to this command the following output files will be produced:

Table 20. Output files and descriptions
Output Description

tsdiffplot_{basename}_surfmat.png

A pair of 2D plots of the joint probability distribution of the statistics values versus the % of trajectory frames in the two passed networks.

tsdiffplot_{basename}_net1_surfmat.dat tsdiffplot_{basename}_net2_surfmat.dat

A pair of text files with the data used to generate the plot mentioned above, referring to the first and second .psn file, respectively.

tsdiffplot_{basename}_net1_surfmat_info.dat tsdiffplot_{basename}_net2_surfmat_info.dat

These files contain some information about the three most recurrent pairs of values of the two passed statistics in both networks.

  • Basin: a progressive number

  • MinX, MaxX: the smallest and the largest value of the first statistic passed to the -surf option

  • MinY, MaxY: the smallest and the largest value of the second statistic passed to the -surf option

  • Pop: the % of trajectory frames with the above ranges of the first and the second statistic passed to the -surf option

tsdiffplot_{basename}_diff_surfmat.png

A 2D plot of the difference between the two joint probability distributions of the statistics values versus the % of trajectory frames.

Where {basename} is composed of the following substrings: {name1}_vs_{name2}_{stat1}_vs_{stat2}

Where {name1}, {name2}, {stat1}, {stat2} are the network names of the passed .psn files and the names of the network statistics passed to the -surf option, respectively.
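The difference plot can be thought of as a cell-by-cell subtraction of two joint distributions, as in this minimal Python sketch. The function name, the data and the sign convention (first network minus second network) are assumptions made for illustration and may not match the PSNTools output.

```python
def joint_difference(dist1, dist2):
    """Difference between two joint distributions given as
    {(x, y): % of frames} dictionaries; missing pairs count as 0."""
    pairs = set(dist1) | set(dist2)
    # Assumption: the difference is taken as first network minus second network.
    return {pair: dist1.get(pair, 0.0) - dist2.get(pair, 0.0) for pair in pairs}


# Made-up joint distributions (hubs, links) -> % of frames for two networks.
net1 = {(10, 50): 40.0, (11, 52): 60.0}
net2 = {(10, 50): 10.0, (12, 55): 90.0}
diff = joint_difference(net1, net2)
```

Positive values mark pairs of values that are more populated in the first network, negative values those more populated in the second.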

Labels Generation


Labels files are fundamental when calculating network differences or consensus networks for systems with different primary sequences, with one or more point mutations, or simply when their pdb files have different chains and/or segments. Despite being a simple two-column file format, compiling a labels file by hand can be a time-consuming and error-prone task, especially if you have large systems and multiple networks to label.

There are three different ways to automatically generate labels files:

-genlabs


This command accepts a 5-column labels definition file formatted as follows:

Table 21. labels definition file format
Column Description

1st

a pdb file name present in the same working directory or with a full disk path

2nd

the first residue identifier (see Labels files section)

3rd

the last residue identifier

4th

the label of the residues defined by the previous two columns (first → last)

5th

the starting label number


This is an example labels definition that can be used to label the chain A residues of the 3LNX pdb:

3lnx.pdb   A:A:P1     A:A:S94    Res     1
3lnx.pdb   A:A:?97    A:A:?102   IOD     1
3lnx.pdb   A:A:?103   A:A:?103   SCN     1
3lnx.pdb   A:A:?104   A:A:?574   Water   1


If you save the previous four lines in a text file called 3lnx.labdef, you can then generate the corresponding labels file by running the following command:

psntools -genlabs 3lnx.labdef
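Before running -genlabs on a long definition file, it can help to sanity-check its layout. The following Python sketch is not part of PSNTools; it only checks the five-column structure described in Table 21, and it assumes that file names contain no spaces and that residue identifiers have three colon-separated parts. The function name check_labdef_line is hypothetical.

```python
def check_labdef_line(line):
    """Return a list of problems found in one labels definition line."""
    fields = line.split()
    if len(fields) != 5:
        return [f"expected 5 columns, found {len(fields)}"]
    pdb_name, first, last, label, start = fields
    problems = []
    for resid in (first, last):
        # Residue identifiers look like Chain:Segment:ResTypeResNum.
        if len(resid.split(":")) != 3:
            problems.append(f"bad residue identifier: {resid}")
    if not start.isdigit():
        problems.append(f"starting label number is not an integer: {start}")
    return problems


labdef = """\
3lnx.pdb   A:A:P1     A:A:S94    Res     1
3lnx.pdb   A:A:?97    A:A:?102   IOD     1
"""
for number, line in enumerate(labdef.splitlines(), start=1):
    for problem in check_labdef_line(line):
        print(f"line {number}: {problem}")
```

The two example lines are well formed, so the loop prints nothing for them.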


-msa2labs


This command provides an alternative way to produce multiple labels files at the same time from a multiple sequence alignment (MSA) in FASTA format.

Table 22. Command options and descriptions
Option Description

-msa2labs

Must be the first option passed to psntools. Required.

-msa

Selects an MSA file in FASTA format. Required.

-seq2pdb

This option can be used multiple times to associate a sequence identifier (i.e. the string next to the > character that precedes each sequence) with a pdb file.
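To prepare the -seq2pdb associations it can be handy to list the sequence identifiers found in the alignment first. The Python sketch below is only a convenience and makes one assumption: the identifier is taken to be the whole header text after the > character. The identifiers wild_type and mutant and the helper name fasta_identifiers are made up for the example.

```python
def fasta_identifiers(text):
    """Return the header text that follows '>' for each sequence."""
    identifiers = []
    for line in text.splitlines():
        if line.startswith(">"):
            identifiers.append(line[1:].strip())
    return identifiers


# Hypothetical two-sequence alignment in FASTA format.
msa = """\
>wild_type
MKT-LLV
>mutant
MKTALLV
"""
print(fasta_identifiers(msa))
```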


WebPSN Labels Generator


The easiest way to generate a labels file is to use the corresponding tool provided by WebPSN, the web server version of psntools.

Miscellaneous commands

These are a series of small but helpful additional commands.

-psninfo

This command prints the information listed in the table below about the passed .psn file.

psntools -psninfo 3lnx.psn


Table 23. Output fields and descriptions
Field Description

Date

Creation date and time of the passed .psn file.

Name

Network name.

Type

CNS for consensus network, PSN for all other networks.

Mol

Name of the .pdb file.

Trj

Name of the .dcd/.xtc file.

NFr

Number of trajectory frames or 1 if the network was generated from a single .pdb file.

Sele

The selection passed to -sele option during network calculation.

Lab

The name of labels file passed to -labfile option during network calculation.

Imin

Network Imin value (see the relevant theory section for more details).

Nodes

The total number of nodes.

Nets

If the passed .psn file is a consensus network, the total number of networks used to build it is reported, 1 otherwise.

PDBs

The list of .pdb files.

Corrs

The total number of node correlations of the atomic fluctuations.

EmbeddedPDBs

The total number and the list of embedded coordinates. Please note that listed names may not correspond to those of the original .pdb files.

TrjStats Sampling

The value passed to the -trjstatsampling option during network calculation.

TrjStats

The total number and the list of saved trajectory statistics. During network calculation the following 24 network descriptors are also computed for each trajectory frame:

  • Links: the total number of links

  • Hubs: the total number of hubs

  • HLinks: the total number of links mediated by at least one hub

  • CommsHLinks: the total number of links mediated by at least one hub that belong to a community

  • CommsHubs: the total number of hubs that belong to a community

  • CommsLinks: the total number of links mediated by a pair of nodes in a community

  • CommsNodes: the total number of nodes in a community

  • CommsNum: the total number of communities of nodes

  • Comm1Nodes, Comm1Links, Comm1Hubs, Comm1HLinks: the total number of nodes, links, hubs and links mediated by at least one hub in the 1st most populous community.

  • Comm2Nodes, Comm2Links, Comm2Hubs, Comm2HLinks: the total number of nodes, links, hubs and links mediated by at least one hub in the 2nd most populous community.

  • Comm3Nodes, Comm3Links, Comm3Hubs, Comm3HLinks: the total number of nodes, links, hubs and links mediated by at least one hub in the 3rd most populous community.

  • Comm123Nodes, Comm123Links, Comm123Hubs, Comm123HLinks: the total number of nodes, links, hubs and links mediated by at least one hub in the 3 most populous communities.

Additionally, for each non-standard amino acid/nucleotide present in the passed .pdb/trajectory file, the following 10 network descriptors are also computed:

  • Links, HLinks: the total number of links and of links mediated by at least one hub made by the given non-standard amino acid/nucleotide

  • CommNodes, CommLinks, CommHubs, CommHLinks: the total number of nodes, links, hubs and links mediated by at least one hub in the community this non-standard amino acid/nucleotide belongs to

  • ShellNodes, ShellLinks, ShellHubs, ShellHLinks: the total number of nodes, links, hubs and links mediated by at least one hub in the local sub-network of this non-standard amino acid/nucleotide
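The descriptor names listed above follow a regular pattern, which the Python sketch below enumerates. It is only a way to double-check the counts (24 per frame, 10 per non-standard residue); the order of the names and the way PSNTools stores them in the .psn file are not implied, and the function names are hypothetical.

```python
GLOBAL_DESCRIPTORS = ["Links", "Hubs", "HLinks", "CommsHLinks", "CommsHubs",
                      "CommsLinks", "CommsNodes", "CommsNum"]
SUFFIXES = ["Nodes", "Links", "Hubs", "HLinks"]


def frame_descriptor_names():
    """Names of the per-frame network descriptors."""
    names = list(GLOBAL_DESCRIPTORS)
    for community in ("Comm1", "Comm2", "Comm3", "Comm123"):
        names += [community + suffix for suffix in SUFFIXES]
    return names


def residue_descriptor_names():
    """Names of the descriptors computed for each non-standard residue."""
    names = ["Links", "HLinks"]
    for prefix in ("Comm", "Shell"):
        names += [prefix + suffix for suffix in SUFFIXES]
    return names


print(len(frame_descriptor_names()), len(residue_descriptor_names()))
```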


-checkupdate

This command just checks if a new version of psntools is available. Needs a working internet connection.

psntools -checkupdate


-help

Prints a short help message for the passed command.

psntools -help calc


-version

Prints the current version and exits.

psntools -version


-getguide

Writes a copy of this user guide and exits.

psntools -getguide


-chname

Modifies the internal name of a passed .psn file.

psntools -chname 3lnx.psn pdx


Appendix A: A Brief Introduction to Protein Structure Network Theory


PSN Calculation

PSN analysis is a product of graph theory applied to protein and nucleic acid structures. A graph is defined by a set of vertices (nodes) and connections (edges) between them. In a PSN, each amino acid residue is represented as a node and these nodes are connected by edges based on the strength of non-covalent interactions between residues. The strength of interaction between residues i and j (Iij) is evaluated as a percentage given by the following equation:

$$I_{ij} = \frac{n_{ij}}{\sqrt{N_i N_j}} \times 100 \qquad \text{(Eq.1)}$$

where nij is the number of atom-atom pairs between the side chains of residues i and j within a distance cutoff of 4.5 Å. Ni and Nj are normalization factors for residue types i and j, which account for the differences in size of the amino acid side chains and their propensity to make the maximum number of contacts with other amino acids in protein structures. Glycines are now included in the PSN analysis. PSNTools has an internal database with the normalization factors for the 20 standard amino acids and the 8 standard nucleotides (i.e. dA, dG, dC, dT, A, G, C, and U), as well as for more than 30,000 biologically relevant molecules and ions (ligands, lipids, sugars, etc.) from the PDB. Additionally, PSNTools automatically identifies unparametrized molecules in the passed PDB files and transparently calculates their normalization factors.
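As a worked illustration, the interaction strength can be computed in Python as sketched below. The sketch assumes the commonly used PSN definition, Iij = nij / sqrt(Ni · Nj) × 100, and the normalization factors and coordinates in the example are invented values, not entries from the PSNTools database.

```python
import math


def count_contacts(atoms_i, atoms_j, cutoff=4.5):
    """Number of atom-atom pairs within the distance cutoff (in Å)."""
    return sum(1 for a in atoms_i for b in atoms_j
               if math.dist(a, b) <= cutoff)


def interaction_strength(n_ij, norm_i, norm_j):
    """Interaction strength as a percentage (assumed PSN definition)."""
    return n_ij / math.sqrt(norm_i * norm_j) * 100.0


# Invented side-chain coordinates and normalization factors.
side_chain_i = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
side_chain_j = [(4.0, 0.0, 0.0), (9.0, 0.0, 0.0)]
n_ij = count_contacts(side_chain_i, side_chain_j)
print(n_ij, interaction_strength(n_ij, 50.0, 50.0))
```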

Iij values are calculated for all node pairs. At a given interaction strength cutoff, Imin, any residue pair ij for which Iij ≥ Imin is considered to be interacting and hence is connected. Node interconnectivity is used to highlight node clusters, where a cluster is a set of connected nodes in a graph. Cluster size, i.e. the number of nodes constituting a cluster, varies as a function of Imin, and the size of the largest cluster is used to calculate the Icritic value. The latter is defined as the Imin at which the size of the largest cluster is half the size of the largest cluster at Imin = 0.0%. Studies by Vishveshwara's group [16] found that the optimal Imin corresponds to the one at which the largest cluster undergoes a transition. All resulting clusters can then be iteratively connected by the link(s) with the highest sub-Icritic interaction strength to compensate, at least in part, for the lack of side chain fluctuations.
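The Icritic definition can be illustrated with a small Python sketch. It is not the PSNTools implementation: it assumes nodes numbered from 0, scans Imin on a fixed grid (the step and imax parameters are arbitrary choices), and returns the first Imin at which the largest cluster is at most half as large as at Imin = 0.0%.

```python
def largest_cluster_size(n_nodes, strengths, imin):
    """Largest connected cluster keeping only links with Iij >= imin.

    strengths maps a node pair (i, j) to its interaction strength in %.
    """
    parent = list(range(n_nodes))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for (i, j), strength in strengths.items():
        if strength >= imin:
            parent[find(i)] = find(j)
    sizes = {}
    for node in range(n_nodes):
        root = find(node)
        sizes[root] = sizes.get(root, 0) + 1
    return max(sizes.values())


def estimate_icritic(n_nodes, strengths, step=0.01, imax=10.0):
    """First Imin on the grid where the largest cluster has halved."""
    reference = largest_cluster_size(n_nodes, strengths, 0.0)
    for k in range(int(round(imax / step)) + 1):
        imin = k * step
        if largest_cluster_size(n_nodes, strengths, imin) <= reference / 2:
            return imin
    return None


# Toy chain of six nodes with one weak link in the middle.
toy = {(0, 1): 5.0, (1, 2): 4.0, (2, 3): 1.0, (3, 4): 4.0, (4, 5): 5.0}
print(estimate_icritic(6, toy))
```

In the toy network the largest cluster splits into two halves as soon as Imin exceeds the strength of the weak link (2, 3), so the estimate falls just above 1%.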

Residues making four or more edges are referred to as hubs at that particular Imin. This cutoff for hub definition relates to the intrinsic limit in the possible number of non-covalent connections made by an amino acid in protein structures due to steric constraints. The cutoff of 4 is close to the upper limit. The majority of amino acid hubs indeed make from 4 to 6 links, with 4 being the most frequent value. Finally, links are used to highlight network communities, which are sets of highly interconnected nodes such that nodes belonging to the same community are densely linked to each other and poorly connected to nodes outside the community. Communities can be considered as fairly independent compartments of a graph. They are identified using a variant of the clique percolation method, by finding all the k=3-cliques, i.e. sets of three fully interconnected nodes, and then merging all those cliques sharing at least one node.
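Both definitions can be sketched in Python as follows. This is a minimal illustration, not the PSNTools code: it assumes each link is listed once as a pair of node identifiers, and it enumerates triangles by brute force, which is only practical for small networks.

```python
from itertools import combinations


def find_hubs(links, min_links=4):
    """Nodes making at least min_links links."""
    degree = {}
    for i, j in links:
        degree[i] = degree.get(i, 0) + 1
        degree[j] = degree.get(j, 0) + 1
    return {node for node, d in degree.items() if d >= min_links}


def find_communities(links):
    """Merge all 3-cliques (triangles) that share at least one node."""
    neighbors = {}
    for i, j in links:
        neighbors.setdefault(i, set()).add(j)
        neighbors.setdefault(j, set()).add(i)
    communities = []
    for a, b, c in combinations(sorted(neighbors), 3):
        if b in neighbors[a] and c in neighbors[a] and c in neighbors[b]:
            merged = {a, b, c}
            untouched = []
            for community in communities:
                if community & merged:
                    merged |= community  # shares a node: fuse
                else:
                    untouched.append(community)
            communities = untouched + [merged]
    return sorted(communities, key=len, reverse=True)


# Toy network: two triangles sharing node 3, plus a short tail.
links = [(1, 2), (2, 3), (1, 3), (3, 4), (4, 5), (3, 5), (3, 6), (6, 7)]
print(find_hubs(links))
print(find_communities(links))
```

In the toy network node 3 makes five links and is the only hub, and the two triangles fuse into a single community of five nodes.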

Shortest Paths

The search for all shortest paths relies on Dijkstra’s algorithm. The method first finds all possible communication paths between all node pairs and then filters the results according to cross-correlation of atomic motions, as derived from ENM-NMA or LMI analysis for networks calculated from a single .pdb file or from a molecular dynamics simulation, respectively.

Filtering consists in retaining only those shortest paths in which every residue has a correlation ≥ a given cutoff with at least one of the two path extremities (i.e. the first and last amino acids in the path). The default cutoffs are 0.7 for networks calculated from a single .pdb file and 0.8 for networks calculated from a molecular dynamics simulation; they are based on benchmarks tested against experimental data.
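The search and the filter can be sketched in Python as below. Several points are assumptions made for the example rather than documented PSNTools behavior: links carry explicit weights supplied by the caller, only one shortest path per node pair is returned, and correlations are given as a dictionary keyed by node pairs, with missing pairs treated as uncorrelated.

```python
import heapq


def dijkstra_path(adjacency, source, target):
    """One shortest path; adjacency maps node -> {neighbor: weight}."""
    distance = {source: 0.0}
    previous = {}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == target:
            break
        if d > distance.get(node, float("inf")):
            continue  # stale heap entry
        for neighbor, weight in adjacency.get(node, {}).items():
            candidate = d + weight
            if candidate < distance.get(neighbor, float("inf")):
                distance[neighbor] = candidate
                previous[neighbor] = node
                heapq.heappush(heap, (candidate, neighbor))
    if target not in distance:
        return None
    path = [target]
    while path[-1] != source:
        path.append(previous[path[-1]])
    return path[::-1]


def passes_filter(path, correlations, cutoff):
    """True if every inner node correlates with the first or last node."""
    first, last = path[0], path[-1]

    def corr(a, b):
        return correlations.get((a, b), correlations.get((b, a), 0.0))

    return all(max(corr(node, first), corr(node, last)) >= cutoff
               for node in path[1:-1])


adjacency = {"A": {"B": 1, "E": 1}, "B": {"A": 1, "C": 1},
             "C": {"B": 1, "D": 1}, "D": {"C": 1}, "E": {"A": 1}}
correlations = {("A", "B"): 0.9, ("C", "D"): 0.85,
                ("A", "C"): 0.3, ("B", "D"): 0.2}
path = dijkstra_path(adjacency, "A", "D")
print(path, passes_filter(path, correlations, 0.8))
```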

Finally, filtered paths are used to build the global meta path, which is made of the most recurrent links, i.e. those links present in a number of paths ≥ 10% (for networks calculated from a single .pdb file) or ≥ 20% (for networks calculated from a molecular dynamics simulation) of the number of paths in which the most recurrent link is present. Such a meta path represents a coarse/global picture of the structural communication in the considered system. The user can also filter those paths that begin and end at a given residue pair or that pass through a given residue. Such path filtering provides a new meta path and is particularly recommended when some information on the residues involved in structural communication is available.
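The recurrence threshold can be illustrated with the Python sketch below. It assumes that links are undirected and that a link is counted once per path; the function name meta_path_links and the toy paths are hypothetical.

```python
from collections import Counter


def meta_path_links(paths, fraction):
    """Links whose recurrence is >= fraction of the top recurrence.

    fraction would be 0.1 for single-structure networks and 0.2 for
    trajectory networks, following the defaults described above.
    """
    counts = Counter()
    for path in paths:
        # Count each undirected link once per path.
        counts.update({tuple(sorted(pair)) for pair in zip(path, path[1:])})
    if not counts:
        return {}
    top = max(counts.values())
    return {link: n for link, n in counts.items() if n >= fraction * top}


# Toy pool of filtered paths; link (2, 3) occurs in three of them.
paths = [[1, 2, 3], [2, 3, 4], [3, 2, 5], [6, 7]]
print(meta_path_links(paths, 0.5))
```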

Appendix B: Correlations of the Atomic Fluctuations

The calculation of the atomic fluctuations is a fundamental step in shortest paths analysis and is automatically performed by PSNTools by means of the latest release of our Wordom software [18]. Two different kinds of correlations are described in the following sections.

Elastic Network Model Correlations


The combination of a coarse-grained representation of a protein structure (e.g. ENM) and Normal Mode Analysis (NMA) is increasingly used to study the collective dynamics of complex systems. ENM-NMA is a coarse-grained normal mode analysis technique able to describe the vibrational dynamics of protein systems around an energy minimum. With this technique, each protein/nucleic acid structure is described by a reduced subset of atoms corresponding to the Cα atoms, for standard amino acids, and the atom nearest to the geometric center for all other molecules.

The interactions between particle pairs are given by a single term Hookean harmonic potential. The total energy of the system is thus described by the simple Hamiltonian:

Eq.2

where dij and dij0 are the instantaneous and equilibrium distances between particle i and j, respectively, whereas kij is a force constant, defined as:

Eq.3

where C is a constant (with a default value of 40 kcal/mol·Å²).

The cross-correlations of motions for path filtering are obtained from the covariance matrix C [17]:

Eq.4

where Cij denotes the correlation between particles i and j, M is the number of modes considered for computation (the first 10 non-zero frequency modes), vxy and λy are, respectively, the xth element and the associated eigenvalue of the yth mode.

Linear Mutual Information Correlations


Eq.5

where i and j are residues, Cij is the pair-covariance matrix, and Ci and Cj are marginal covariance matrices. LMI correlation values can vary from 0.0 to 1.0, which indicate completely uncorrelated and completely correlated displacements, respectively.

Appendix C: Labels File


When calculating the difference between two networks or a consensus among a pool of networks it is of fundamental importance to unambiguously identify structurally equivalent residues/nucleotides among processed proteins/nucleic acids. A unique identifier, called label, is then associated to these equivalent residues/nucleotides so that their interactions can be compared among the analyzed networks.

There are four possible methods to generate a labels file:

Additionally, you can compile a labels file by hand. Although compiling labels files may require a considerable amount of time depending on the size and number of analyzed structures, it gives the user full control of the labeling process.

The format of a labels file is quite similar to the one used in external values files: a text file with 2 columns per line, one residue definition and one label, separated by at least one space or tab character, as in the following excerpt:

...
C:S:Y10   Tyr10
C:S:V11   Val11
C:S:P12   Pro12
...

Residues are indicated using the following syntax:

Chain:Segment:ResTypeResNum

Please use one-letter codes for standard amino acids (e.g. P for proline, Y for tyrosine, etc) and the following lower case one-letter codes for standard nucleotides:

Base     Code
A, DA    a
C, DC    c
G, DG    g
DT       t
U        u

Use the ? character for any other molecule present in your pdb file:

...
C:S:?100   Ligand
C:S:?200   Water
C:S:?300   Mg
...

A label can be a combination of any length of upper and lower case letters (A-Z, a-z), digits (0-9) and all other printable symbols (e.g. !, @, % etc) with the only two exceptions of # and - characters.
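The rules above can be checked mechanically, which is useful for labels files compiled by hand. The Python sketch below is not part of PSNTools; the regular expression is an assumption based on the Chain:Segment:ResTypeResNum syntax and the one-character residue codes described in this appendix, and the function name parse_labels_line is hypothetical.

```python
import re

# Chain:Segment:ResTypeResNum, with a one-character residue type
# (one-letter code, lower case nucleotide code, or ?).
RESIDUE = re.compile(
    r"^(?P<chain>[^:]*):(?P<segment>[^:]*):"
    r"(?P<restype>[A-Za-z?])(?P<resnum>-?\d+)$"
)


def parse_labels_line(line):
    """Return ((chain, segment, restype, resnum), label) or raise."""
    fields = line.split()
    if len(fields) != 2:
        raise ValueError(f"expected 2 columns, found {len(fields)}")
    residue, label = fields
    match = RESIDUE.match(residue)
    if match is None:
        raise ValueError(f"bad residue definition: {residue}")
    if "#" in label or "-" in label:
        raise ValueError(f"label contains a forbidden character: {label}")
    key = (match["chain"], match["segment"],
           match["restype"], int(match["resnum"]))
    return key, label


print(parse_labels_line("C:S:Y10   Tyr10"))
print(parse_labels_line("C:S:?100   Ligand"))
```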

Appendix D: External Values File


The user can, optionally, provide numerical values to be associated with any number of residues (e.g. conservation scores, mutation effect, etc.). If provided, these values will appear in the output files (columns OutVal/Value). These values are reported for your convenience only and are not used in any calculation.

The format of an external values file is quite similar to the one used in labels files: a text file with 2 columns per line, one residue definition and one numerical value, separated by at least one space or tab character, as in the following excerpt:

...
C:S:Y10   7
C:S:V11   7
C:S:P12   5
...

Residues are indicated using the following syntax:

Chain:Segment:ResTypeResNum

Please use one-letter codes for standard amino acids (e.g. P for proline, Y for tyrosine, etc) and the following lower case one-letter codes for standard nucleotides:

Base     Code
A, DA    a
C, DC    c
G, DG    g
DT       t
U        u

Use the ? character for any other molecule present in your pdb file:

...
C:S:?100   5
C:S:?200   0
C:S:?300   1
...

Appendix E: Selection Syntax


PSNTools adopts a modified version of the selection syntax used in Wordom.

This syntax employs a string structured as follows:

/chain/segment/residues
segment is the 12th field in the pdb (3rd after coordinates). It is a 4-character field, which must not be confused with the chain (single-character) field after the residue-type in the pdb (5th field).

Wild cards such as * (any number of any character), ? (any single character), [abc] (any single character among a, b and c) and [!abc] (any single character except a, b and c) are supported. Ranges can also be defined using the - character.

Some selection string examples and their meanings:
Syntax       Meaning
/*/*/*       all residues in all chains and segments (default selection)
/C/S/*       all residues in chain C and segment S
/C/S/135     only residue 135 from chain C and segment S
/C/S?/*      all residues in chain C and segment S1, SB, SC ...
/C/S[AB]/*   all residues in chain C and segment SA, SB
/C/S/1-32    all residues in the range 1-32 from chain C and segment S


Ranges can be concatenated using the | character:

/C/S/1-10|15|20-30

selects residues from 1 to 10, 15 and from 20 to 30 from chain C and segment S.


Finally, several selections can be concatenated using the ; character:

/A/Q/* ; /B/W/1-10|15 ; /G/E/20-30

selects all residues from chain A and segment Q, residues from 1 to 10 and 15 from chain B and segment W, and residues from 20 to 30 from chain G and segment E.
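The wild cards used for the chain and segment fields are the same as the shell-style patterns supported by Python's fnmatch module, so the semantics of a selection string can be sketched as follows. This is an illustration of the syntax only, not the PSNTools parser: it assumes the residue field is either * or a |-separated list of non-negative numbers and ranges, and the function name is_selected is hypothetical.

```python
from fnmatch import fnmatchcase


def _match_residues(spec, resnum):
    """spec is '*' or '|'-separated numbers and ranges, e.g. 1-10|15."""
    if spec == "*":
        return True
    for part in spec.split("|"):
        if "-" in part:
            low, high = part.split("-", 1)
            if int(low) <= resnum <= int(high):
                return True
        elif int(part) == resnum:
            return True
    return False


def is_selected(selection, chain, segment, resnum):
    """True if the residue matches any ';'-separated selection."""
    for single in selection.split(";"):
        _, chain_pat, segment_pat, residues = single.strip().split("/")
        if (fnmatchcase(chain, chain_pat)
                and fnmatchcase(segment, segment_pat)
                and _match_residues(residues, resnum)):
            return True
    return False


print(is_selected("/C/S/1-10|15|20-30", "C", "S", 15))
print(is_selected("/A/Q/* ; /B/W/1-10|15 ; /G/E/20-30", "B", "W", 12))
```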

Appendix F: -color Options


Used to select the coloring criterion of represented network elements like nodes, links, hubs etc. Valid values are:

  • auto
    the color is automatically selected as follows:

    • if the network was calculated from a single pdb file, -color is set to force (i.e. interaction strength) for the following files:

      • {net_name}_links.pml and {net_name}_links.vmd

      • {net_name}_hlinks.pml and {net_name}_hlinks.vmd

      • {net_name}_hubs.pml and {net_name}_hubs.vmd

    • if the network was calculated from a trajectory file, -color is set to freq (i.e. trajectory frequency of each link) for the following files:

      • {net_name}_links.pml and {net_name}_links.vmd

      • {net_name}_hlinks.pml and {net_name}_hlinks.vmd

    • if the network was calculated from a trajectory file, -color is set to hfreq (i.e. trajectory frequency of each hub) for the following files:

      • {net_name}_hubs.pml and {net_name}_hubs.vmd

    • -color is set to comm for the following files:

      • {net_name}_comms.pml and {net_name}_comms.vmd

  • cls
    network elements will be colored according to the node cluster they belong to

  • comm
    network elements will be colored according to the node community they belong to

  • cons
    network elements will be colored according to their consensus conservation

  • force
    links will be colored according to their interaction strength and nodes and hubs according to the average interaction strength of their links.

  • freq
    links and hubs will be colored according to their trajectory frequency, while nodes will be colored according to the average trajectory frequency of their links.

  • hforce
    hubs are colored according to the average force of their links.

  • hfreq
    hubs are colored according to their trajectory frequency.

  • hpert
    hubs are colored according to their perturbation.

  • lpert
    links are colored according to their perturbation, while nodes are colored according to the average perturbation of their links.

  • outval
    nodes will be colored according to the numerical values file passed with -outval option, while each link will be colored according to the average external values of its two nodes.

  • rec
    nodes and links will be colored according to their recurrence in the shortest paths pool.

Appendix G: -size Options


Used to select the criterion behind the size of represented network elements like nodes, links, hubs etc. Valid values are:

  • auto
    the size is automatically selected as follows:

    • if the network was calculated from a single pdb file, -size is set to force (i.e. interaction strength) for the following files:

      • {net_name}_links.pml and {net_name}_links.vmd

      • {net_name}_hlinks.pml and {net_name}_hlinks.vmd

      • {net_name}_hubs.pml and {net_name}_hubs.vmd

    • if the network was calculated from a trajectory file, -size is set to freq (i.e. trajectory frequency of each link) for the following files:

      • {net_name}_links.pml and {net_name}_links.vmd

      • {net_name}_hlinks.pml and {net_name}_hlinks.vmd

    • if the network was calculated from a trajectory file, -size is set to hfreq (i.e. trajectory frequency of each hub) for the following files:

      • {net_name}_hubs.pml and {net_name}_hubs.vmd

    • -color is set to comm for the following files:

      • {net_name}_comms.pml and {net_name}_comms.vmd

  • cons
    the size of network elements will be selected according to their consensus conservation

  • fix
    all network elements will have a fixed, predefined size.

  • force
    the size of each link will be proportional to its interaction strength, while the size of nodes and hubs will be proportional to the average interaction strength of their links.

  • freq
    the size of each link will be proportional to its trajectory frequency, while the size of nodes and hubs will be proportional to the average trajectory frequency of their links.

  • hforce
    the size of each hub will be proportional to the average interaction strength of its links.

  • hfreq
    the size of each hub will be proportional to its trajectory frequency.

  • hpert
    the size of each hub will be proportional to its perturbation.

  • lpert
    the size of each link will be proportional to its perturbation, while the size of nodes and hubs will be proportional to the average perturbation of their links.

  • outval
    the size of each node will be proportional to the numerical values file passed with -outval option, while the size of each link will be proportional to the average external values of its two nodes.

  • rec
    the size of nodes and links will be proportional to their recurrence in the shortest paths pool.

Appendix H: Color Legends


These are the colors associated with the nine most populous communities and clusters that will be used when the -color option is set to comm or cls, respectively:

Table 24. Communities and Clusters Color Table
Community/Cluster   1st   2nd     3rd    4th      5th    6th       7th    8th    9th
Color               red   green   blue   yellow   cyan   magenta   lime   pink   orange

The following colors are used to represent interaction strength, trajectory frequency and metapath recurrence when the -color option is set to force, freq or hfreq, and rec, respectively:

Table 25. Interaction Strength, Frequency and Recurrence Color Table
Color    I.S.            Frequency          Recurrence
clr 1    0 < i.s. ≤ 1    0 < freq ≤ 10      0 < rec ≤ 10
clr 2    1 < i.s. ≤ 2    10 < freq ≤ 20     10 < rec ≤ 20
clr 3    2 < i.s. ≤ 3    20 < freq ≤ 30     20 < rec ≤ 30
clr 4    3 < i.s. ≤ 4    30 < freq ≤ 40     30 < rec ≤ 40
clr 5    4 < i.s. ≤ 5    40 < freq ≤ 50     40 < rec ≤ 50
clr 6    5 < i.s. ≤ 6    50 < freq ≤ 60     50 < rec ≤ 60
clr 7    6 < i.s. ≤ 7    60 < freq ≤ 70     60 < rec ≤ 70
clr 8    7 < i.s. ≤ 8    70 < freq ≤ 80     70 < rec ≤ 80
clr 9    8 < i.s. ≤ 9    80 < freq ≤ 90     80 < rec ≤ 90
clr 10   9 < i.s.        90 < freq ≤ 100    90 < rec ≤ 100
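The bins of Table 25 are left-open and right-closed, with the last interaction strength bin open-ended. A small Python sketch of the mapping from a value to its color row follows; the function name color_index and its parameters are hypothetical, and the bin width is 1 for interaction strength and 10 for frequency and recurrence.

```python
import math


def color_index(value, bin_width, n_bins=10):
    """1-based row of Table 25 for a positive value, else None.

    Row k covers (k - 1) * bin_width < value <= k * bin_width;
    values beyond the last bin are assigned to the last row.
    """
    if value <= 0:
        return None
    return min(n_bins, math.ceil(value / bin_width))


print(color_index(3.2, 1))    # interaction strength of 3.2%
print(color_index(10, 10))    # trajectory frequency of 10%
print(color_index(10.5, 10))  # trajectory frequency of 10.5%
```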

These three colors are used by -psgdiff, -patsdiff and -tsdiffplots commands:

Table 26. Communities and Clusters Color Table
Present in   1st network   both networks   2nd network
Color        clr net1      clr both        clr net2