psnGPCRdb is a curated and up-to-date database for high-throughput investigation of allosterism in experimentally resolved structures of G protein coupled receptors (GPCRs). GPCRs represent the largest superfamily in the human proteome and are the targets of an estimated 30–40% of all drugs currently on the market. They share seven transmembrane helices organized in an up-down bundle architecture and can be classified into five major families (classes) and further divided into subfamilies based on sequence similarities.
psnGPCRdb stores the structure networks (i.e. linked nodes, hubs, communities and communication pathways) computed on all updated GPCR structures in the Protein Data Bank (PDB), in their isolated states or in complex with extracellular and/or intracellular molecules.
The structure networks were computed with the PSNtools software [1], which relies on a mixed Protein Structure Network (PSN) and Elastic Network Model-Normal Mode Analysis (ENM-NMA) approach [1-7, 10]. This approach has proved valuable for investigating structure-function relationships in a number of studies [1-10].
psnGPCRdb provides a user-friendly interface and immediate feedback through graphical visualizations of the output.
The database offers the following features:
The psnGPCRdb database employs a mixed Protein Structure Network (PSN) and Elastic Network Model-Normal Mode Analysis (ENM-NMA) strategy to investigate allosterism in GPCRs [1, 2]. PSN is used to compute the interaction strengths and connectivities among nodes, while ENM-NMA provides information on the system's dynamics, which is used to compute the cross-correlation of atomic motions for path filtering. The method is hereafter referred to as PSN-ENM.
In summary, the first step of the PSN-ENM approach is a PSN analysis of a single high-resolution structure, which yields the network components (e.g. nodes, links, hubs, communities). Node interconnectivities are also the basis for the search for the shortest communication paths between all node pairs in the network. The shortest paths are then filtered according to the cross-correlation of atomic motions derived from ENM-NMA, resulting in a reduced pool of paths composed of highly correlated nodes.
Finally, a global metapath made of the most recurrent links in the shortest path pool is computed to infer a coarse picture of the structural communication in the considered system. Single paths can be inferred as well by providing a single node pair as input to the path search. Important novelties concern the possibility to:
Computing a consensus network makes it possible to infer common structural communication features in proteins that share the same function [1, 2, 5, 7, 9] or even only the same fold. Computing a difference network, on the other hand, is particularly useful for inferring commonalities and differences in the structural communication of two functionally different states of the same protein [1, 2, 6, 7].
The psnGPCRdb database offers high-quality, publication-ready plots and 3D outputs, such as PyMOL and VMD scripts, as well as a number of easy-to-access data files. All these outputs are also zipped for download.
PSN analysis is a product of graph theory applied to protein and nucleic acid structures [4, 16]. A graph is defined by a set of vertices (nodes) and connections (edges) between them. In a PSN, each amino acid residue is represented as a node and these nodes are connected by edges based on the strength of non-covalent interactions between residues. The strength of interaction between residues i and j (Iij) is evaluated as a percentage given by equation 1:
$$I_{ij} = \frac{n_{ij}}{\sqrt{N_i N_j}} \times 100 \tag{1}$$
where nij is the number of atom-atom pairs between the side chains of residues i and j within a distance cutoff of 4.5 Å. Ni and Nj are normalization factors for residue types i and j, which account for the differences in size of the amino acid side chains and their propensity to make the maximum number of contacts with other amino acids in protein structures. Glycines are now included in the PSN analysis. The webPSN server has an internal database with the normalization factors for the 20 standard amino acids and the 8 standard nucleotides (i.e. dA, dG, dC, dT, A, G, C, and U), as well as for more than 30,000 biologically relevant molecules and ions (ligands, lipids, sugars, etc.) from the PDB. Additionally, the server automatically identifies un-parametrized molecules in the submitted PDB files and calculates their normalization factors transparently.
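As a minimal sketch of the interaction strength calculation described above: the function names, the coordinate layout (lists of `(x, y, z)` tuples for side-chain atoms) and the numerical values are hypothetical and chosen only for illustration. They are not the PSNtools implementation or real normalization factors.

```python
from math import dist, sqrt

CUTOFF = 4.5  # distance cutoff in Å, as stated in the text


def count_atom_pairs(atoms_i, atoms_j, cutoff=CUTOFF):
    """Count side-chain atom pairs (one atom from each residue) within the cutoff."""
    return sum(1 for a in atoms_i for b in atoms_j if dist(a, b) <= cutoff)


def interaction_strength(n_ij, norm_i, norm_j):
    """Interaction strength as a percentage: 100 * n_ij / sqrt(N_i * N_j)."""
    return 100.0 * n_ij / sqrt(norm_i * norm_j)


# Hypothetical inputs: 6 atom pairs and made-up normalization factors.
print(interaction_strength(n_ij=6, norm_i=50.0, norm_j=72.0))  # 10.0
```

Dividing by the square root of the product of the two normalization factors keeps the measure symmetric in i and j, so a pair of large residues is not favoured simply because it has more atoms.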
Iij values are calculated for all node pairs. At a given interaction strength cutoff, Imin, any residue pair ij for which Iij ≥ Imin is considered to be interacting and is therefore connected. Node interconnectivity is used to highlight node clusters, where a cluster is a set of connected nodes in a graph. Cluster size, i.e. the number of nodes constituting a cluster, varies as a function of Imin, and the size of the largest cluster is used to calculate the Icritic value. The latter is defined as the Imin at which the size of the largest cluster is half the size of the largest cluster at Imin = 0.0%. Studies by Vishveshwara's group [16] found that the optimal Imin corresponds to the one at which the largest cluster undergoes a transition. All resulting clusters are then iteratively connected by the link(s) with the highest sub-Icritic interaction strength to compensate, at least in part, for the lack of side chain fluctuations.
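The Icritic definition can be sketched as a scan over increasing Imin values. This is an illustration under stated assumptions, not the PSNtools code: the scan step, the "half or below" stopping rule (the largest cluster rarely hits exactly half) and the toy interaction strengths are all hypothetical.

```python
from collections import defaultdict


def largest_cluster_size(strengths, i_min):
    """Size of the largest connected cluster when only pairs with I_ij >= i_min are linked."""
    adjacency = defaultdict(set)
    for (a, b), value in strengths.items():
        if value >= i_min:
            adjacency[a].add(b)
            adjacency[b].add(a)
    seen, best = set(), 0
    for start in list(adjacency):
        if start in seen:
            continue
        seen.add(start)
        stack, size = [start], 0
        while stack:  # depth-first traversal of one cluster
            node = stack.pop()
            size += 1
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        best = max(best, size)
    return best


def estimate_icritic(strengths, step=0.01, i_max=100.0):
    """First scanned Imin at which the largest cluster is at most half its size at Imin = 0."""
    half = largest_cluster_size(strengths, 0.0) / 2
    k = 0
    while k * step <= i_max:
        if largest_cluster_size(strengths, k * step) <= half:
            return k * step
        k += 1
    return None


# Toy chain of six residues; the weakest link (C-D) splits it into two halves.
toy = {("A", "B"): 8.0, ("B", "C"): 5.0, ("C", "D"): 2.0, ("D", "E"): 6.0, ("E", "F"): 7.0}
print(estimate_icritic(toy, step=0.5))  # 2.5
```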
Residues making four or more edges are referred to as hubs at that particular Imin. This cutoff for hub definition relates to the intrinsic limit on the number of non-covalent connections that an amino acid can make in protein structures due to steric constraints. A cutoff of 4 is close to the upper limit: the majority of amino acid hubs make from 4 to 6 links, with 4 being the most frequent value.
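Hub detection then reduces to counting links per node. The sketch below assumes the network is given as a list of unique, undirected links; the function name and residue labels are hypothetical.

```python
from collections import Counter

HUB_MIN_LINKS = 4  # hub cutoff stated in the text


def find_hubs(links, min_links=HUB_MIN_LINKS):
    """Return the nodes that take part in at least `min_links` links."""
    degree = Counter()
    for a, b in links:
        degree[a] += 1
        degree[b] += 1
    return {node for node, count in degree.items() if count >= min_links}


# Toy network: R1 makes four links, every other residue fewer.
toy_links = [("R1", "R2"), ("R1", "R3"), ("R1", "R4"), ("R1", "R5"), ("R2", "R3")]
print(find_hubs(toy_links))  # {'R1'}
```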
Finally, links are used to highlight network communities, which are sets of highly interconnected nodes such that nodes belonging to the same community are densely linked to each other and poorly connected to nodes outside the community. Communities can be considered as fairly independent compartments of a graph. They are identified using a variant of the clique percolation method, by finding all the k=3-cliques, i.e. sets of three fully interconnected nodes, and then merging all those cliques sharing at least one node.
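The community step described above (find all 3-cliques, then merge cliques that share at least one node) can be sketched as follows. The brute-force clique enumeration is an assumption made for brevity and is only practical for small networks; names and the toy network are hypothetical.

```python
from itertools import combinations


def find_communities(links):
    """Merge all 3-cliques that share at least one node into communities."""
    adjacency = {}
    for a, b in links:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)

    # Enumerate every triple of nodes and keep the fully interconnected ones.
    cliques = [
        {a, b, c}
        for a, b, c in combinations(sorted(adjacency), 3)
        if b in adjacency[a] and c in adjacency[a] and c in adjacency[b]
    ]

    communities = []  # kept pairwise disjoint at all times
    for clique in cliques:
        merged = set(clique)
        untouched = []
        for community in communities:
            if community & merged:
                merged |= community  # shares a node: absorb it
            else:
                untouched.append(community)
        communities = untouched + [merged]
    return communities


# Two triangles sharing node C form one community; X-Y-Z forms another.
# The E-X link belongs to no triangle, so it does not join the two.
toy_links = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D"), ("D", "E"), ("C", "E"),
             ("X", "Y"), ("Y", "Z"), ("X", "Z"), ("E", "X")]
print(sorted(sorted(c) for c in find_communities(toy_links)))
```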
The combination of a coarse-grained representation of a protein structure (e.g. ENM) with Normal Mode Analysis (NMA) is increasingly used to study the collective dynamics of complex systems. ENM-NMA is a coarse-grained normal mode analysis technique able to describe the vibrational dynamics of protein systems around an energy minimum. With this technique, each protein/nucleic acid structure is described by a reduced subset of atoms corresponding to the Cα-atoms for standard amino acids, and to the atom nearest to the geometric center for all other molecules.
The interactions between particle pairs are given by a single term Hookean harmonic potential. The total energy of the system is thus described by the simple Hamiltonian:
$$E = \sum_{i<j} k_{ij}\left(d_{ij} - d_{ij}^{0}\right)^{2} \tag{2}$$
where dij and dij0 are the instantaneous and equilibrium distances between particle i and j, respectively, whereas kij is a force constant, defined as:
(3) |
where C is a constant (with a default value of 40 kcal/mol·Å²).
The cross-correlations of motions for path filtering are obtained from the covariance matrix C [17]:
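$$C_{ij} = \frac{\sum_{y=1}^{M} \frac{\nu_{iy}\,\nu_{jy}}{\lambda_y}}{\left(\sum_{y=1}^{M} \frac{\nu_{iy}\,\nu_{iy}}{\lambda_y}\right)^{1/2}\left(\sum_{y=1}^{M} \frac{\nu_{jy}\,\nu_{jy}}{\lambda_y}\right)^{1/2}} \tag{4}$$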
where Cij denotes the correlation between particles i and j, M is the number of modes considered for computation (the first 10 non-zero frequency modes), νxy and λy are, respectively, the xth element and the associated eigenvalue of the yth mode.
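A minimal sketch of the cross-correlation calculation follows. It assumes the modes are already available as `(eigenvalue, displacements)` pairs sorted by increasing non-zero frequency, that each particle's element of a mode is a 3D displacement vector combined through a dot product, and that the correlation is the covariance normalized by the square root of the two self-terms. The data layout and names are hypothetical and do not reflect Wordom's interface.

```python
from math import sqrt


def cross_correlation(modes, i, j, n_modes=10):
    """Normalized correlation between particles i and j from the first n_modes modes.

    modes: list of (eigenvalue, displacements), where displacements[p] is the
    (x, y, z) displacement of particle p in that mode.
    """
    selected = modes[:n_modes]

    def covariance(a, b):
        # Sum over modes of (v_a . v_b) / eigenvalue
        return sum(
            sum(u * v for u, v in zip(vectors[a], vectors[b])) / eigenvalue
            for eigenvalue, vectors in selected
        )

    return covariance(i, j) / sqrt(covariance(i, i) * covariance(j, j))


# Toy system: three particles, two modes (values invented for illustration).
toy_modes = [
    (1.0, [(1.0, 0.0, 0.0), (1.0, 0.0, 0.0), (-1.0, 0.0, 0.0)]),
    (2.0, [(0.0, 1.0, 0.0), (0.0, 1.0, 0.0), (0.0, 1.0, 0.0)]),
]
print(cross_correlation(toy_modes, 0, 1))  # particles 0 and 1 move identically
print(cross_correlation(toy_modes, 0, 2))  # particle 2 opposes them in the softest mode
```

Weighting each mode by the inverse of its eigenvalue means that low-frequency, large-amplitude modes dominate the correlation.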
All ENM-NMA calculations are performed with the latest release of our Wordom software [18].
The search for all shortest paths relies on Dijkstra’s algorithm [19]. The method first finds all possible communication paths between all node pairs and then filters the results according to cross-correlation of atomic motions, as derived from ENM-NMA analysis.
Filtering consists in retaining only those shortest paths that contain only residues with a correlation ≥ 0.7 with at least one of the two path extremities (i.e. the first and last amino acids in the path).
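The path search and filtering can be sketched as below. Two points are assumptions and not taken from the source: every link is given unit weight in Dijkstra's algorithm, and the filter follows a literal reading of the sentence above (every internal node must reach the cutoff with at least one extremity). The `corr` callable and the toy data are hypothetical.

```python
import heapq


def shortest_path(adjacency, source, target):
    """Dijkstra's algorithm with unit link weights; returns a node list or None."""
    distance = {source: 0}
    previous = {}
    heap = [(0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == target:
            break
        if d > distance.get(node, float("inf")):
            continue  # stale heap entry
        for neighbour in adjacency.get(node, ()):
            candidate = d + 1
            if candidate < distance.get(neighbour, float("inf")):
                distance[neighbour] = candidate
                previous[neighbour] = node
                heapq.heappush(heap, (candidate, neighbour))
    if target not in distance:
        return None
    path = [target]
    while path[-1] != source:
        path.append(previous[path[-1]])
    return path[::-1]


def passes_filter(path, corr, cutoff=0.7):
    """True if every internal node correlates >= cutoff with the first or last node."""
    first, last = path[0], path[-1]
    return all(max(corr(n, first), corr(n, last)) >= cutoff for n in path[1:-1])


toy_adjacency = {"A": ["B", "C"], "B": ["A", "D"], "C": ["A", "D"], "D": ["B", "C"]}
toy_corr = {("B", "A"): 0.9, ("B", "D"): 0.2, ("C", "A"): 0.3, ("C", "D"): 0.4}


def corr(a, b):
    return toy_corr[(a, b)]


path = shortest_path(toy_adjacency, "A", "D")
print(path, passes_filter(path, corr))
```

A path with no internal nodes (two directly linked residues) passes this filter trivially.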
Finally, the filtered paths are used to build the global metapath, which is made of the most recurrent links, i.e. those links present in a number of paths ≥ 10% of the number of paths in which the most recurrent link is present.
This metapath represents a coarse/global picture of the structural communication in the considered system.
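The metapath rule can be sketched as a link-frequency count over the filtered path pool. The sketch assumes paths are lists of node labels and that each link is counted at most once per path; the function name and the toy pool are hypothetical.

```python
from collections import Counter


def build_metapath(paths, fraction=0.10):
    """Keep links found in at least `fraction` of the paths that contain the most recurrent link."""
    counts = Counter()
    for path in paths:
        # Undirected links of this path, each counted once per path
        links = {frozenset(pair) for pair in zip(path, path[1:])}
        counts.update(links)
    if not counts:
        return set()
    threshold = fraction * max(counts.values())
    return {link for link, count in counts.items() if count >= threshold}


# Toy pool: A-B occurs in 22 paths, so the threshold is 2.2 paths.
toy_paths = [["A", "B", "C"]] * 20 + [["A", "B", "D"]] * 2 + [["E", "B", "D"]]
metapath = build_metapath(toy_paths)
print(sorted(sorted(link) for link in metapath))  # [['A', 'B'], ['B', 'C'], ['B', 'D']]
```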
When calculating the difference between two networks or a consensus among a pool of networks, it is essential to unambiguously identify structurally equivalent residues among the processed receptors and associated proteins. A unique identifier, called a label, is therefore associated with these equivalent residues and ligands to correctly compare residue/ligand interactions among the analyzed networks. Labels used in psnGPCRdb are based on the Generic GPCR Residue Numbers scheme of GPCRdb [24].
Conservation data reported in output tables are obtained using the ConSurf web server [25]. Amino acid conservation values are expressed as nine conservation grades, from 1 to 9, where 1 includes the most rapidly evolving positions, 5 includes positions of intermediate rates, and 9 includes the most evolutionarily conserved positions. We employed the wild-type (WT) sequence of each GPCR to run the ConSurf analysis.
Net Summary | |
---|---|
Imin | The minimum interaction strength needed to connect two nodes. More details about this value and how it is calculated can be found in the PSN section of the theory page. |
Number of Linked Nodes | Total number of nodes with at least one link. |
Number of Links | Total number of links with an interaction strength ≥ Imin. Links with a lower value may have been added to avoid excessive network fragmentation. More details about links with a sub-Imin interaction strength can be found in the PSN section of the theory page. |
Number of Hubs | Total number of nodes with at least 4 links. More details about this cutoff can be found in the PSN section of the theory page. |
Number of Links mediated by hubs | Total number of links mediated by hubs. |
Number of Communities | Total number of communities. Communities are sets of highly interconnected nodes that can be viewed as fairly independent compartments of a graph. You can find a more detailed explanation of what a community is and how it is identified in the PSN section of the theory page. |
Number of Nodes involved in Communities | Total number of nodes in a community. |
Number of Links involved in Communities | Total number of links in a community. |
Network Similarities | |
---|---|
Average % Shared Neighbours (Jaccard) | The average, over nodes, of the ratio of the intersection to the union of each node's links: $\frac{1}{N}\sum_{n}\frac{\lvert A_n \cap B_n\rvert}{\lvert A_n \cup B_n\rvert}$, where n is a given node in network A and B, N is the number of such nodes, and $A_n$ and $B_n$ are the links of node n in network A and B, respectively [20]. |
Average % Shared Neighbours (Otsuka version of cosine similarity) | The average, over nodes, of the ratio of the intersection to the square root of the product of the number of links made by each node in the compared networks: $\frac{1}{N}\sum_{n}\frac{\lvert A_n \cap B_n\rvert}{\sqrt{\lvert A_n\rvert\,\lvert B_n\rvert}}$, where n is a given node in network A and B, N is the number of such nodes, and $A_n$ and $B_n$ are the links of node n in network A and B, respectively [21]. |
Average % Shared Neighbours (Overlap Coefficient) | The average, over nodes, of the ratio of the intersection to the smaller of the two lists of links made by each node: $\frac{1}{N}\sum_{n}\frac{\lvert A_n \cap B_n\rvert}{\min(\lvert A_n\rvert, \lvert B_n\rvert)}$, where n is a given node in network A and B, N is the number of such nodes, and $A_n$ and $B_n$ are the links of node n in network A and B, respectively [22]. |
Average % Shared Cliques (k3-6) | The average of the ratio of the intersection to the union of the k=3, k=4, k=5, and k=6 cliques: $\frac{1}{4}\sum_{k=3}^{6}\frac{\lvert A_k \cap B_k\rvert}{\lvert A_k \cup B_k\rvert}$, where $A_k$ and $B_k$ are the k-cliques in network A and B, respectively. |
Graphlets Similarity | The Graphlet Degree Distribution Agreement, calculated by comparing the distributions of graphlets (small, connected, non-isomorphic subgraphs) in the two networks [23]. |
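The three neighbour-based similarity scores can be sketched as follows. The sketch assumes each network is a mapping from node label to the set of its linked neighbours, averages only over nodes that have at least one link in both networks (to avoid division by zero), and reports percentages. These choices, the function name and the toy networks are assumptions for illustration.

```python
from math import sqrt


def neighbour_similarities(net_a, net_b):
    """Average Jaccard, Otsuka (cosine) and overlap scores, each as a percentage."""
    nodes = [n for n in net_a.keys() & net_b.keys() if net_a[n] and net_b[n]]
    if not nodes:
        return {"jaccard": 0.0, "otsuka": 0.0, "overlap": 0.0}
    totals = {"jaccard": 0.0, "otsuka": 0.0, "overlap": 0.0}
    for n in nodes:
        a, b = net_a[n], net_b[n]
        shared = len(a & b)
        totals["jaccard"] += shared / len(a | b)
        totals["otsuka"] += shared / sqrt(len(a) * len(b))
        totals["overlap"] += shared / min(len(a), len(b))
    return {name: 100.0 * total / len(nodes) for name, total in totals.items()}


# Toy networks: R1 keeps one of its two neighbours, R2 keeps its only neighbour.
net_a = {"R1": {"R2", "R3"}, "R2": {"R1"}}
net_b = {"R1": {"R2", "R4"}, "R2": {"R1"}}
print(neighbour_similarities(net_a, net_b))
```

For the toy input, R1 scores 1/3 (Jaccard), 1/2 (Otsuka) and 1/2 (overlap), while R2 scores 1 on all three, so the averages are about 66.7%, 75% and 75%.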
Path Summary | |
---|---|
Number Of Nodes in Metapath | Total number of nodes in the global/filtered metapath. |
Number Of Links Metapath | Total number of links in the global/filtered metapath. |
Number of Shortest Paths | Total number of paths in the global/filtered path pool. |
Length Of Smallest Path | Number of nodes in the shortest path. |
Average Path Length | Average number of nodes per path in the global/filtered path pool. |
Length of Longest Path | Number of nodes in the longest path. |
Minimum Path Strength | Lowest average interaction strength of links in the global/filtered path pool. |
Average Path Strength | Average of the average interaction strengths of links in the global/filtered path pool. |
Maximum Path Strength | Highest average interaction strength of links in the global/filtered path pool. |
Minimum Path Correlation | Lowest average motion correlation between each node and the two extreme nodes in a path in the global/filtered path pool. |
Average Path Correlation | Average of the average motion correlations between each node and the two extreme nodes in a path in the global/filtered path pool. |
Maximum Path Correlation | Highest average motion correlation between each node and the two extreme nodes in a path in the global/filtered path pool. |
Minimum % Of Corr. Nodes | Lowest percentage of internal nodes with a motion correlation ≥ the cutoff with one or both the two extremities in a path in the global/filtered path pool. |
Average % Of Corr. Nodes | Average percentage of internal nodes with a motion correlation ≥ the cutoff with one or both the two extremities in a path in the global/filtered path pool. |
Maximum % Of Corr. Nodes | Highest percentage of internal nodes with a motion correlation ≥ the cutoff with one or both the two extremities in a path in the global/filtered path pool. |
Minimum Path Hubs % | Lowest percentage of hubs in the global/filtered path pool. |
Average Path Hubs % | Average percentage of hub nodes present in the global/filtered paths pool. |
Maximum Path Hubs % | Highest percentage of hubs in the global/filtered path pool. |
Thank you for using psnGPCRdb, we really appreciate it. Please remember to cite the following papers in all published works that use this web server:
Angelo Felline, Sara Gentile and Francesca Fanelli. psnGPCRdb: the structure-network database of G Protein Coupled Receptors. Journal of Molecular Biology, 2023. https://doi.org/10.1016/j.jmb.2023.167950
Angelo Felline, Michele Seeber and Francesca Fanelli. PSNtools for standalone and web-based structure network analyses of conformational ensembles. Computational and Structural Biotechnology Journal, 7 January 2022. https://doi.org/10.1016/j.csbj.2021.12.044
If you have any questions or have encountered any problems with this web server please do not hesitate to contact us at the following email address:
We strive to keep our database as complete and up-to-date as possible, but:
you can use our free WebPSN webserver following the link below
Total Number Of DB Entries | 2978 |
Total Number Of Single Networks | 1480 |
↳ Class A | 1181 |
↳ Class B1 | 141 |
↳ Class B2 | 34 |
↳ Class C | 86 |
↳ Class D1 | 5 |
↳ Class F | 30 |
↳ Class T | 3 |
Total Number Of Differences | 288 |
Total Number Of Consensuses | 760 |
Total Number Of Ligands | 450 |
Total Number Of Visits | -1 |
Visited Single Network Pages | 74308 |
Visited Difference Pages | 10681 |
Visited Consensus Pages | 24582 |
Visited Ligand Pages | 20828 |
Searched Ligands | 2330 |
Last DB Update | 2024-09-04 |
Use the above menu bar to navigate this web site and the dropdown menus marked with a down facing triangle (▼) to access the various sections of the database.
In more detail: