Benchmarking of force fields to characterize the intrinsically disordered R2
Scientific Reports volume 13, Article number: 14226 (2023) Cite this article
Metrics details
Intrinsically Disordered Proteins (IDPs) play crucial roles in numerous diseases like Alzheimer's and ALS by forming irreversible amyloid fibrils. The effectiveness of force fields (FFs) developed for globular proteins and their modified versions for IDPs varies depending on the specific protein. This study assesses 13 FFs, including AMBER and CHARMM, by simulating the R2 region of the FUS-LC domain (R2-FUS-LC region), an IDP implicated in ALS. Due to the flexibility of the region, we show that utilizing multiple measures, which evaluate the local and global conformations, and combining them together into a final score are important for a comprehensive evaluation of force fields. The results suggest c36m2021s3p with mTIP3p water model is the most balanced FF, capable of generating various conformations compatible with known ones. In addition, the mTIP3P water model is computationally more efficient than those of top-ranked AMBER FFs with four-site water models. The evaluation also reveals that AMBER FFs tend to generate more compact conformations compared to CHARMM FFs but also more non-native contacts. The top-ranking AMBER and CHARMM FFs can reproduce intra-peptide contacts but underperform for inter-peptide contacts, indicating there is room for improvement.
Intrinsically disordered proteins (IDPs) are proteins that can form different conformations depending on the environment and their binding partners1. Some IDPs can self-aggregate to form amyloid fibrils which take on the cross-β structure2. The cross-β structure consists of beta-strand proteins/peptides that are stacked along the length of the fiber forming long beta-sheets called protofibril. Finally, complexes of protofibrils form amyloid fibrils3.
Amyloid fibrils are associated with diseases4,5 such as Alzheimer’s, Parkinson’s, type II diabetes, Amyotrophic Lateral Sclerosis (ALS)4,6,7 and others. ALS is a rare neurodegenerative disease8,9 where in 50% of cases, death occurs within three years of the first clinical manifestation10. In ALS patients, amino acid mutations have been found in the Low-Complexity (LC) region of the Fused in Sarcoma (FUS) protein11,12,13,14,15,16. Irreversible amyloid fibril aggregation has been observed in mutated FUS-LC region, whereas reversible fibrils are observed in the wild-type11,16,17.
The human FUS protein (526 residues) is involved in mRNA splicing and transcription. The FUS-LC-core33–96 is involved in amyloid fibril formation11,17,18,19,20,21 and contains four repeat motifs (R1-R2-R3-R4)17 (Fig. S1, violet boxes). Within R1/R2, the tandem [S/G]Y[S/G] motifs have been implicated in the formation of Reversible Amyloid fibril Cores (RAC)17 (Fig. 1 and Fig. S1). The R2 region has been known to be more important for fibril formation than R122,23. The structures of the FUS-LC-core33–9616,24 (Fig. 1) show the R2 region has few long-distance contacts with the rest of the LC-core domain (Fig. S2). Therefore, the R2-FUS-LC50–65 region is a good candidate for studying amyloid fibrillation.
Domain organization of the full length Fused in Sarcoma (FUS) protein. The FUS N-terminal low-complexity (LC) domain (residues 1–214) contains a QGSY-rich prion-like domain (1–165, violet box) and a Gly-rich region (166–214, pink box). Within the QGSY-rich domain, there are four repeat motifs (R1–R2–R3–R4). Within R2 (R2-FUS-LC region) there is a Reversible Amyloid Fibril Core (RAC 2) that is involved in fibril formation. We take only the R2-FUS-LC region to study FUS fibrillation. Inside on the top red square are two different experimentally solved conformations of the R2-FUS-LC region. On the left, six representatives of R2-FUS-LC region selected from the 20 models of the NMR structure (PDB ID: 5W3N16, “U-shaped”). On the right, the cryo-EM model from PDB ID: 7VQQ24, “L-shaped”. Figure was prepared with Microsoft PowerPoint and VMD v1.9.325 (https://www.ks.uiuc.edu/Research/vmd/).
Our current understanding of the mechanism of amyloid fibril aggregation is poor. Furthermore, some of the transient intermediate structures of IDPs have been reported to be toxic26. One approach to study fibril formation and its intermediates is by performing all-atom Molecular Dynamics (MD) simulations. Amyloid β-peptides that form the amyloid fibrils in Alzheimer’s are commonly studied using MD simulations27,28,29.
However, experimental data and simulation results often show discrepencies27,30. One of the main reasons for this is that the all-atom force fields (FFs) such as AMBER, CHARMM31,32, OPLS-AA33,34 and GROMOS35,36 were developed to reproduce the properties of stable globular proteins. In contrast, IDPs adopt multiple unstable conformations where nonpolar residues are often exposed to the solvent. Multiple research groups have tuned FFs to better reproduce experimental data27,37,38,39,40,41,42 [Table S1 and Sup. Info. Section Benchmark of Force Fields (FFs) and Water Models (WMs)]. It is unclear if these modified FFs are generalizable across all IDPs. Note that throughout the text, when we refer to the FFs’, we are referring to the result or properties observed from performing simulations with the respective FF.
In this work, we apply widely used MD force fields for studying IDPs to the R2-FUS-LC (Fig. 1 and Fig. S1) to evaluate whether they can sample conformational ensembles of fibrils that are consistent with experimental data. We select ten FFs and water models (WMs) recently developed for IDPs. For comparison, we included c27s3p, a99sb4pew and a14sb3p FFs which were used to develop some of the IDP FFs. We score these thirteen FFs based on three criteria: the compactness of the fibrils, the intra-peptide contacts in the cross-β state and the secondary structure propensity. Surprisingly, most FFs fail to reproduce the experimental data. Our scoring method suggests that CHARMM36m2021 FF with the mTIP3P water model is the best for studying R2-FUS-LC fibrillation.
We conducted six MD simulations, each lasting 500 ns, totaling 3 μs, with each of thirteen FFs (Table S1). To evaluate the FFs against the reversible amyloid fibril R2-FUS-LC region (trimer of 16 residue peptides), we employed three measures: radius of gyration (Rg), secondary structure propensity (SSP) and intra-peptide contact map. Rg measures the global compactness/extension of the trimer and individual peptides (Fig. S3). SSP and intra-peptide contact map both concern the local contact details of the R2-FUS-LC region. We ranked the thirteen FFs based on a combined score derived from these three measures (Table 1, Final Score).
Based on the final score (Table 1), the thirteen FFs can be separated into three distinct groups: top (“*”), middle (“•”) and bottom (“#”) ranking groups according to the order of scores.
FFs in the “top” group have medium (0.3–0.7) to high (> 0.7) scores for all measures. FFs in the “bottom” group like c27s3p and a03ws, have low scores (< 0.3) in all three measures. However, a14sb3p stands out with relatively good scores for SSP and contact map, but a low Rg score. On the other hand, c36m3pm has the best intra-peptide contact map score but poor SSP score. FFs in “middle” ranking group tend to have low scores for at least one of the three measures but have medium agreement for the remaining. Details of the three measures will be explained in the following sections.
The Rg score measures the ability of the FFs to sample both compact and unfolded states of the R2-FUS-LC region. The reference data for the folded cross-β structure state comprises two distinct conformations of the R2-FUS-LC region: “U-shaped” conformation (PDB: 5W3N16) and “L-shaped” conformation (PDB: 7VQQ24), as shown in Fig. 1. Both conformations form a cross-β structure. Twenty U-shaped models with different loop conformations were solved by NMR with an average Rg of 10.0 Å (trimer of R2-FUS-LC) and the less compact (Rg: 14.4 Å) L-shaped conformation (trimer of R2-FUS-LC) was solved by cryo-EM. The reversibility of the cross-β structure11,17 suggests that the R2-FUS-LC region could adopt the unfolded state. To estimate the Rg of the unfolded state, we employed Flory's random coil polymer model with optimized parameters for IDPs43. We thus have three different measures of Rg: L-shaped, U-shaped, and Unfolded Rg’s.
The distributions of Rg for snapshots from the simulation data are presented in Fig. 2 and Fig. S4. In order to assess how frequently the FFs can generate conformations that closely resemble the reference structures in terms of Rg, we fitted the Rg distributions with two Gaussian distributions. The distance from the i’th mean to the k’th reference Rg was computed as the absolute number of standard deviations (\({\text{Z-score}}_{FF,k,i}\)) of the i’th Gaussian. The lowest \({\text{Z-score}}_{FF,k,i}\) was chosen, inverted and normalized by linearly scaling it from 0.00001 to 1.0 (Table 2). The final Rg score was calculated by multiplying the three normalized scores and rescaling them in the same manner (Table 2).
Distribution of the Radius of gyration (Rg). The top two ranking FFs for the final Rg score (Table 2) are C36m2021s3p (red) and c36ms3p (pink). These FFs can generate compact and extended conformations covering both the U and L-shaped Rg’s. However, a14sb3p (green) sampled compact conformations only (shown in the inset for a zoomed-in view). On the other hand, c36m3pm (light green) generated extended conformations. Both FFs rank at the bottom. Figure was prepared with Matplotlib v3.544 (https://matplotlib.org/).
The thirteen FFs can be separated into three groups. Comparing the ranking of the final score (Table 1) and the final Rg score (Table 2), the top four FFs (c36m2021s3p, a99sb4pew, a19sbopc and c36ms3p) are the same although the order is slightly different.
Analyzing Table 2, we identified two distinct types of FFs among the top four. The first type includes c36m2021s3p and a19sbopc, both of which exhibited consistent performance across the three reference Rg’s. On the other hand, the second type consists of c36ms3p and a99sb4pew, which demonstrated a preference for flexible and compact conformations, respectively. We also observed bias towards the U-shaped Rg in a99sb4pew, a99sbCufix3p, and a14sb3p (Fig. S4), while c36m2021s3pm and c36m3pm (Fig. S4) favored the unfolded Rg. However, it is worth noting that the final Rg rank of these FFs is inversely correlated with the strength of their bias of sampled conformations. This is because our scoring scheme penalizes FFs that fit well to only one specific reference Rg but perform poorly for the others. As a result, the two worst-performing FFs are c36m3pm and a14sb3p (Fig. 2).
Overall, CHARMM FFs tend to generate more extended conformations than AMBER FFs (Figs. S3 and S4), except for a03ws.
Rg is a measure of the global compactness of a conformation, but it is not suitable for evaluating the details of the conformation. To assess the sampled conformations in more detail, we introduced two measures in the next sections: the intra-peptide contact map and the secondary structure propensity (SSP). These measures evaluate the intra-peptide interactions.
The U-shaped and L-shaped conformations contain 20 and 15 intra-peptide contacts, respectively (Fig. 3). In the L-shaped conformation, there were no observed contacts between residues j and > j + 5 (medium-distance contacts) within a 5 Å cutoff. However, in the U-shaped conformation, medium-distance contacts are found between Tyr50–Tyr55, Tyr50–Thr64, Tyr50–Gly65, Tyr55–Asn63 and Ser57–Ser61. Therefore, we will only consider the U-shaped conformation for evaluating the FFs.
Intra-peptide contact map for U-shaped (left, PDB ID: 5W3N) and L-shaped (right, PDB ID: 7VQQ) conformations with a 5 Å cutoff. Contact types are color-labeled, with "Sc" corresponding to Side Chain and "Bb" to Backbone. In both conformations, the majority of contacts occur within ± 3 residues. The U-shaped conformation exhibits a few medium-distance contacts, whereas the L-shaped conformation lacks any medium-distance contacts. Figure was prepared with Matplotlib v3.544 (https://matplotlib.org/).
For each of the thirteen FF’s MD simulations, we calculated the average intra-peptide contacts for the trimers across all snapshots. The contacts from the FFs and the U-shaped conformation were compared using the Matthews Correlation coefficient (MCC)45 score (Table 3). The MCC score ranges from − 1 to 1, with a value of 1 indicating perfect correlation, − 1 indicating perfect anti-correlation, and 0 indicating no correlation. MCC scoring penalizes false predictions, which means that FFs that predicted the most native contacts (True Positives, TP), such as a99sb4pew, c27s3p, and a14sb3p (Table S2), may not necessarily be the top scoring FFs (Table 3). The process of normalization was carried out in a similar manner as described in the previous section.
In the contact map score ranking (Table 3), there are some surprising results. Despite being classified as a bottom performer based on the final score (Table 1), the c36m2021s3pm and c36m3pm FFs were ranked at the top in the contact map score. On the other hand, c36m2021s3p, which had the highest final score, was only ranked 4th. Additionally, a99sbCufix3p was placed at the bottom even though it had middle ranking in the final score.
Upon examining the confusion matrices for the four FFs (Table 4 and Table S2), we have a few observations. Firstly, c36m2021s3pm, c36m3pm, and c36m2021s3p, which have the highest Unfolded Rg scores, tend to favor more extended conformations. This is evident from the fact that < 20% of residues pairs are in contact. In contrast, a99sbCufix3p prefers more compact conformations, with 23.16% of residue pairs in contact.
It is worth noting that while CHARMM FFs with extended conformations have fewer overall contacts (Table 4, Total Pred. True), they also have significantly lower non-native contacts (Table 4, bolded). Conversely, a99sbCufix3p has more contacts overall, but a larger proportion of these contacts are non-native (Table 4). After excluding these four FFs, the contact map score rankings of the remaining FFs (Table 3) were found to be consistent with the final score rankings (Table 1).
The L-shaped conformation primarily consists of β-strands but does not form β-sheets within the peptide. As a result, its conformations cannot be determined from the intra-peptide contact maps. This motivates us to assess the Secondary Structure Propensity (SSP) of these FFs in the following section.
In each snapshot, Dictionary of Protein Secondary Structure (DSSP)46,47 is used to assign the secondary structure type (α-helix, β-strand, or coil) to each residue. For each residue, we define its SSP/probability for all secondary structure types by counting the number of occurrences in all snapshots and dividing it with the total number of snapshots. To evaluate the FFs’ SSP for the R2-FUS-LC region, we define the SSP score as the log likelihood of observing the experimental (U-shaped and L-shaped) secondary structures given the SSP probability distributions obtained from the simulation snapshots (details in Methods). Normalization was performed in a similar fashion as outlined previously.
In Table 5, we observe that the SSP score is higher for a14sb3p and a99sbCufix3p, but lower for c36ms3p and c36m2021s3pm compared to their final rank. The FFs with higher SSP scores tend to produce more compact conformations, as evidenced by their Rg scores (Table 2). Conversely, the FFs with lower rankings tend to favor more extended conformations. This suggests that the formation of native secondary structures is necessary for the development of compact fibrils.
The a03ws and c27s3p FFs, which have the lowest SSP scores, exhibit a strong tendency to generate conformations with α-helices around the RAC2 motif (Fig. 4). However, in both the L-shaped24 and U-shaped16 conformations, this region forms β-strands. NMR chemical shift data also confirms the absence of α-helices in the R2-FUS-LC region16,17,18,22,48.
α-Helix Secondary Structure Propensity (SSP). Experimental data from Murray et al. indicate no α-helix propensity within the 16 residues. Only FFs with a per residue α-helix propensity greater than 0.1 are displayed. All FFs, except for a14sb3p (2nd ranked), are located at the bottom of the SSP rank (Table 5). Despite having a small amount of α-helix SSP, a14sb3p has a high SSP ranking due to its high β-strand SSP (Fig. S5). Figure was prepared with Matplotlib v3.544 (https://matplotlib.org/).
Our observation indicates that AMBER FFs, specifically a14sb3p and a99sbCufix3p, generate more compact conformations compared to the CHARMM FFs, c36ms3p and c36m2021s3pm, which tend to produce more extended conformations. FFs that have more extended conformations can better replicate the properties of both U- and L-shaped conformations when compared to FFs generating more compact conformations that fit well only to one of these conformations.
Both AMBER and CHARMM FFs do not properly reproduce the inter-peptide interactions required to form fibrils (Table S3), although AMBER FFs perform relatively better than CHARMM FFs. The preference of CHARMM FFs for extended conformations extends to their inter-peptide interactions: the peptides spend approximately half the time as dimers or monomers in contrast to AMBER FFs (except a03ws) which mainly stay as trimers (Table S4). The inter-peptide interaction scores were not included in the final score as all FFs perform poorly and would just add noise to the final ranking.
In our study, the limited sampling time (only 3 µs) may have contributed to the poor performance observed. Fibril formation typically occurs over a much longer timescale of hours to days16,24. To alleviate but not resolve this issue, we have increased the peptide concentration from 0.16 mM (as used in NMR structure determination16) to 10 mM by decreasing the ratio of the number of peptides to the number of water molecules (add more waters) and initiated simulations from the U-shaped conformation in fibril form. Higher concentration of FUS peptides increases the chance of these peptides interacting with one another and forming fibrils. Additionally, we observed that most FF simulations have nearly converged within 300 ns by monitoring the average Rg values over time (Fig. S6).
It is possible that the chosen fragment is not capable of forming fibrils independently. To investigate this possibility, we analyzed the intra-protein interactions of the full-length LC domain (residues 1–214). The contact maps show that there are limited long-distance contacts between the R2-FUS-LC regions and the rest of the protein in the U/L-shaped conformations (Fig. S2), indicating that the R2-FUS-LC region functions as a distinct domain within the larger LC domain, at least in the fibril.
Our study corroborates previous research conducted by Lao et al. who utilized the a99sbildn4pd with TIP3P water model instead of TIP4PD to simulate a longer region of FUS that included the R2-FUS-LC region23. Our findings show similar contact map patterns as their study, and we also observed ~ 5% propensity of α-helices, which is consistent with their work. However, we noted a discrepancy in the β-strand propensity between our study and theirs, with their study observing much higher β-strand propensity than we did (Fig. S5). This difference may be attributed to the length of the peptide used in each study, with their study using a peptide length of 60 residues compared to our study's length of 16 residues.
Our findings are also consistent with other studies on Amyloid-β proteins, where different FFs produced different conformational ensembles, some of which are compatible with fibril aggregation42,49,50. For example, Pedersen et al. found that a19sbopc produces more compact conformations and forms more β-strand than a99disp49, while Samantray et al. demonstrated c36ms3p gave promising results for sampling and giving conformational ensembles with random coil and β-strand structures50. Finally, our final score ranking is the same as the Amyloid-β peptide (Aβ40) results from Robustelli et al. showing that c36ms3p is the best, except for c22s3p which underperformed a99disp and a99sbildn4pd in our ranking27. Additionally, they also observed that c22s3p and a03ws overestimates the α-helix propensity (Fig. 4).
Additionally, our results are consistent with Piana et al., who tested three and four-site water models with AMBER and CHARMM FFs42. Where using the TIP3P (3-site) water model, CHARMM FFs were found to be more flexible than AMBER FFs, and when using 4-site water models, AMBER FFs became a lot more flexible, except for a99sb4pew (Fig. S4).
In this study, we investigated the effectiveness of force fields developed for globular proteins and their modified versions for intrinsically disordered proteins. We found that these force fields, with their adapted water models, produce distinct conformational ensembles. Our results highlight the importance of utilizing multiple measures for a comprehensive evaluation of force fields due to the intrinsic flexibility of this system which can form β-strand fibrils and adopt random coiled conformations.
Our evaluation of thirteen force fields from CHARMM and AMBER families revealed that c36m2021s3p is the most balanced in terms of the three measures used. This force field can generate various conformations that are compatible with U/L-shaped conformations, and it showed good agreement in terms of the SSP and intra-peptide contact map. Additionally, its mTIP3P water model is computationally less expensive than those of top-ranked AMBER FFs with four-site water models.
Multiple 3D-structures of FUS-LC domain have been solved by X-ray crystallography17,51, electron crystallography17,52, cryo-Electron Microscopy (cryo-EM)24, and NMR16. The details of the structures are listed in the Table S5.
From the 20 U-shaped NMR models (PDB ID: 5W3N16), the region of residues 50–65 (R2-FUS-LC region, Fig. 1, red square) of the first three chains (trimer) was used. The 20 structures were clustered into six groups using the GROMACS tool gmx cluster53 with a RMSD cutoff of 1.7 Å (Cα-only). The six representative structures were used to run six independent MD simulations.
All-atom MD simulations were performed using GROMACS 2020.454,55. The six initial conformations of R2-FUS-LC (663 atoms) were solvated in a cubic box of 80 × 80 × 80 Å3. To replicate the conditions of the NMR experiment22, 137 mM NaCl ions were added (~ 17,000 water molecules). The final systems contain 52,000–68,000 atoms.
The 13 IDP force fields (FFs) and their corresponding water models (WMs) (details in Table S1 and Sup. Info. Section Benchmark of Force Fields (FFs) and Water Models (WMs)): a03ws, a14sb3p, a19sbopc, a99disp, a99sb4pew, a99sbCufix3p, a99sbildn4pd, c22s3p, c27s3p, c36m2021s3p, c36m2021s3pm, c36m3pm and c36ms3p.
Non-bonded cutoff is set to 12 Å. Electrostatic interactions were treated with the smooth particle mesh Ewald method56 for long range interactions and Coulomb for short range interactions. For CHARMM FFs, the Lennard–Jones potential is modified by GROMAC’s force-switching function54,55 between 8 and 12 Å.
The length of solute and water covalent bonds involving hydrogen atoms were kept constant using the LINCS57 and SETTLE58 algorithms, respectively, allowing integration of equations of motion with a 2 fs time step.
For each system, energy minimization followed by a constant pressure and temperature (NPT) 1 ns equilibration run were performed at 1 bar and 300 K. Temperature is controlled by the v-rescale thermostat and pressure by the Berendsen barostat59,60 with the time coupling constants τT = 0.1 ps and τP = 0.5 ps, respectively. A second equilibration run was performed with Nose–Hoover thermostat and Parrinello–Rahman barostat61,62,63 with the time coupling constants τT = 0.5 ps and τP = 2.5 ps for 1 ns.
For the production run, each of the six systems were simulated for 500 ns under the same conditions as in the second equilibration run but with a velocity random seed. Snapshots were saved every 20 ps. The first 100 ns of the production run were discarded, and the remaining 400 ns was used for further analysis.
The above protocol was applied to each of the 13 force fields, giving an accumulated trajectory of 3 μs.
Analysis of snapshots were performed with MDAnalysis64,65, MDtraj66, DSSP46,47 and our own scripts. All figures were plotted with Matplotlib44 (Python module).
All the raw scores were linearly rescaled such that they are between 0.0001 and 1 by:
where \({S}_{norm,\mathrm{FF}}\) and \({S}_{raw,\mathrm{FF}}\) are the normalized and raw scores for a specific force field (FF), respectively.
For each snapshot, the radius of gyration (Rg) of the heavy-atoms of the 16-residue trimer and the individual peptides were computed with MDAnalysis.
The Rg distribution of the trimer (Fig. 2 and Fig. S4) was fitted to two Gaussians using the gaussian mixture model.
To evaluate the FFs, the \({\text{Rg-Score}}_{FF,k}\) (raw score) compares the simulation results to the experimental \({Rg}_{exp,k}\) of k (U-shaped [averaged over 20 models]: 10 Å or L-shaped: 14.4 Å). It is computed as:
where \(\langle {Rg}_{FF,i}\rangle \) and \({SD}_{FF,i}\) are the average and standard deviation, respectively, of the i’th Gaussian.
The Rg of the individual peptides was compared to the predicted \(R{g}_{FL}\) (10.8 Å) from Flory’s polymer theory with parameters optimized for IDPs by Bernado and Blackledge43:
where \({R}_{0}=\) 2.54 Å is the persistence length, ν = 0.522 is the exponential scaling factor, N is the number of residues, \(R{g}_{FF,med}\) is the median Rg of the FF and q3 and q1 are the 75th and 25th percentile, respectively.
The raw scores from (2) and (3) were normalized. The final Rg score is the normalized product of the normalized U-shaped, L-shaped, and Unfolded Rg (Table 2).
To analyze the heavy atom contacts in each peptide, we developed custom code to generate contact maps. A contact was defined as two atoms being within a 5 Å distance, except for neighboring residues along the protein sequence. We counted intra-peptide contacts across all snapshots and filtered out contact frequencies < 1%. Average contact frequencies were then calculated across all three peptides.
Comparing the average contact map from each snapshot to the representative U-shaped conformation (Fig. 3), contacts were classified into one of four groups [Table 6; True(T)/False(F) Positive(P)/Negative(N)]. A contact (no contact) is considered positive (negative). The contacts from all snapshots of the six replicas were accumulated in the confusion matrix (Table S2).
The Matthews correlation coefficient (MCC)45 was used to measure the agreement of the contact maps:
Note that the MCC score is between − 1 and 1. An MCC score of 1 shows perfect correlation with the reference, 0 with no correlation and − 1 with perfectly negative correlation. The raw MCC score is then normalized to obtain the normalized intra-peptide Contact Map score (Table 3).
We classified the peptide's conformation in each snapshot as monomer, dimer, or trimer based on their contacts with other peptides (Table S4). The monomer was defined as having no contact with other peptides, while the dimer had exactly two peptides in contact. The trimer was characterized by each peptide being in contact with another peptide. Two peptides are considered in contact if they have at least one contact. We use the same contact definition as in the intra-peptide analysis.
To identify the middle peptide in trimers, we determined the peptide with the highest inter-peptide contacts in the fibril. We then compared the computed contact map with the experimental contact map of the middle peptide with one of the other two peptides.
For experimental structures, the contact maps of the middle peptide with either of the edge peptides were similar so we chose chain A-B for comparison with the computed contact maps (see Fig. S7). The same contact map was used for dimers.
All contacts between two edge peptides in the ensembles were considered false positives (FPs). We employed the same method as the previous section to classify contacts and compute Matthews Correlation Coefficient (MCC) scores (Table S3).
To assign secondary structures, we utilized the Dictionary of Protein Secondary Structure (DSSP)46,47. Each residue was classified as either α-helix (H), β-strand (E), or coil (C).
Each residue’s secondary structure propensity/probability (\({p}_{i,ss}\), SSP) is computed from the snapshots, combining the data from all three peptides from the same FF:
where: i is the residue position in peptide j at snapshot t from the simulation performed with the forcefield. ssi,j,t is 1 when assigned the respective secondary structure type ss by DSSP or 0 otherwise. ss represents the secondary structure type: α-helix (H), β-strand (E), or coil (C).
The reference secondary structures were obtained from the U-shaped (20 models) and L-shaped (single structure) structures, as shown in Table S6. The FF’s secondary structure score with respect to the experimental structure k (U/L-shaped) is:
where, i represents the residue position, and \({exptl}_{k,i}\) is the secondary structure of residue i in the experimental structure k.
For each FF, the SSP-Score is the log likelihood of observing the experimental secondary structures. We derived the probabilities from the observed propensities from the simulations, assuming that the positions are independent of one another.
The raw FF’s SSP-Scores for the U-shaped and L-shaped conformations were independently renormalized. We took the product of these two normalized SSP-scores and re-normalized them to get the final SSP score (Table 5).
Using the three normalized Rg score, SSP score and intra-peptide MCC score, we calculated the final score for each FF by multiplying the scores and re-normalizing the product (Table 1).
Data available on reasonable request to HK.
Zhou, J., Oldfield, C. J., Yan, W., Shen, B. & Dunker, A. K. Intrinsically disordered domains: Sequence → disorder → function relationships. Protein Sci. 28(9), 1652–1663. https://doi.org/10.1002/pro.3680 (2019).
Article CAS PubMed PubMed Central Google Scholar
Tompa, P. Intrinsically unstructured proteins. Trends Biochem. Sci. 27(10), 527–533. https://doi.org/10.1016/S0968-0004(02)02169-2 (2002).
Article CAS PubMed Google Scholar
Taylor, A. I. P. & Staniforth, R. A. General principles underpinning amyloid structure. Front. Neurosci. 16, 8869 (2022).
Article Google Scholar
Chiti, F. & Dobson, C. M. Protein misfolding, functional amyloid, and human disease. Annu. Rev. Biochem. 75(1), 333–366. https://doi.org/10.1146/annurev.biochem.75.101304.123901 (2006).
Article CAS PubMed Google Scholar
Uversky, V. N., Oldfield, C. J. & Dunker, A. K. Intrinsically disordered proteins in human diseases: Introducing the D2 concept. Annu. Rev. Biophys. 37(1), 215–246. https://doi.org/10.1146/annurev.biophys.37.032807.125924 (2008).
Article CAS PubMed Google Scholar
Burdick, D. et al. Assembly and aggregation properties of synthetic Alzheimer’s A4/beta amyloid peptide analogs. J. Biol. Chem. 267(1), 546–554. https://doi.org/10.1016/S0021-9258(18)48529-8 (1992).
Article CAS PubMed Google Scholar
Uversky, V. N. Intrinsic disorder in proteins associated with neurodegenerative diseases. Front. Biosci. 14, 5188–5238 (2009).
Article CAS Google Scholar
Xu, L. et al. Global variation in prevalence and incidence of amyotrophic lateral sclerosis: A systematic review and meta-analysis. J. Neurol. 267(4), 944–953. https://doi.org/10.1007/s00415-019-09652-y (2020).
Article PubMed Google Scholar
Jones, C. M. & Coleman, S. Neurodegenerative diseases. In Palliative Care (eds Emanuel, L. L. & Librach, S. L.) 382–395 (W. B. Saunders, 2007). https://doi.org/10.1016/B978-141602597-9.10026-2.
Chapter Google Scholar
Mitchell, J. & Borasio, G. Amyotrophic lateral sclerosis. Lancet 369(9578), 2031–2041. https://doi.org/10.1016/S0140-6736(07)60944-1 (2007).
Article CAS PubMed Google Scholar
Kato, M. et al. Cell-free formation of RNA granules: Low complexity sequence domains form dynamic fibers within hydrogels. Cell 149(4), 753–767. https://doi.org/10.1016/j.cell.2012.04.017 (2012).
Article CAS PubMed PubMed Central Google Scholar
Patel, A. et al. A liquid-to-solid phase transition of the ALS protein FUS accelerated by disease mutation. Cell 162(5), 1066–1077. https://doi.org/10.1016/j.cell.2015.07.047 (2015).
Article CAS PubMed Google Scholar
Scekic-Zahirovic, J. et al. Toxic gain of function from mutant FUS protein is crucial to trigger cell autonomous motor neuron loss. EMBO J. 35(10), 1077–1097. https://doi.org/10.15252/embj.201592559 (2016).
Article CAS PubMed PubMed Central Google Scholar
Kwiatkowski, T. J. et al. Mutations in the FUS/TLS gene on chromosome 16 cause familial amyotrophic lateral sclerosis. Science 323(5918), 1205–1208. https://doi.org/10.1126/science.1166066 (2009).
Article ADS CAS PubMed Google Scholar
Vance, C. et al. Mutations in FUS, an RNA processing protein, cause familial amyotrophic lateral sclerosis type 6. Science 323(5918), 1208–1211. https://doi.org/10.1126/science.1165942 (2009).
Article ADS CAS PubMed PubMed Central Google Scholar
Murray, D. T. et al. Structure of FUS protein fibrils and its relevance to self-assembly and phase separation of low-complexity domains. Cell 171(3), 615-627.e16. https://doi.org/10.1016/j.cell.2017.08.048 (2017).
Article CAS PubMed PubMed Central Google Scholar
Luo, F. et al. Atomic structures of FUS LC domain segments reveal bases for reversible amyloid fibril formation. Nat. Struct. Mol. Biol. 25(4), 341–346. https://doi.org/10.1038/s41594-018-0050-8 (2018).
Article CAS PubMed Google Scholar
Burke, K. A., Janke, A. M., Rhine, C. L. & Fawzi, N. L. Residue-by-residue view of in vitro FUS granules that bind the C-terminal domain of RNA polymerase II. Mol. Cell 60(2), 231–241. https://doi.org/10.1016/j.molcel.2015.09.006 (2015).
Article CAS PubMed PubMed Central Google Scholar
Wootton, J. C. & Federhen, S. Statistics of local complexity in amino acid sequences and sequence databases. Comput. Chem. 17(2), 149–163. https://doi.org/10.1016/0097-8485(93)85006-X (1993).
Article CAS MATH Google Scholar
Golding, G. B. Simple sequence is abundant in eukaryotic proteins. Protein Sci. 8(6), 1358–1361. https://doi.org/10.1110/ps.8.6.1358 (1999).
Article CAS PubMed PubMed Central Google Scholar
Haerty, W. & Golding, G. B. Low-complexity sequences and single amino acid repeats: Not just “junk” peptide sequences. Genome 53(10), 753–762. https://doi.org/10.1139/g10-063 (2010).
Article CAS PubMed Google Scholar
Ding, X. et al. Amyloid-forming segment induces aggregation of FUS-LC domain from phase separation modulated by site-specific phosphorylation. J. Mol. Biol. 432(2), 467–483. https://doi.org/10.1016/j.jmb.2019.11.017 (2020).
Article CAS PubMed Google Scholar
Lao, Z. et al. Insights into the atomistic mechanisms of phosphorylation in disrupting liquid-liquid phase separation and aggregation of the FUS low-complexity domain. J. Chem. Inf. Model https://doi.org/10.1021/acs.jcim.2c00414 (2022).
Article PubMed Google Scholar
Sun, Y. et al. Molecular structure of an amyloid fibril formed by FUS low-complexity domain. Science 25(1), 103701. https://doi.org/10.1016/j.isci.2021.103701 (2022).
Article CAS Google Scholar
Humphrey, W., Dalke, A. & Schulten, K. VMD: Visual molecular dynamics. J. Mol. Graph. 14(1), 33–38. https://doi.org/10.1016/0263-7855(96)00018-5 (1996).
Article CAS PubMed Google Scholar
Chen, X.-Q. & Mobley, W. C. Alzheimer disease pathogenesis: Insights from molecular and cellular biology studies of oligomeric Aβ and tau species. Front. Neurosci. 13, 659 (2019).
Article PubMed PubMed Central Google Scholar
Robustelli, P., Piana, S. & Shaw, D. E. Developing a molecular dynamics force field for both folded and disordered protein states. PNAS 2018, 00690. https://doi.org/10.1073/pnas.1800690115 (2018).
Article CAS Google Scholar
Man, V. H. et al. Effects of all-atom molecular mechanics force fields on amyloid peptide assembly: The case of Aβ16–22 dimer. J. Chem. Theory Comput. 15(2), 1440–1452. https://doi.org/10.1021/acs.jctc.8b01107 (2019).
Article CAS PubMed Google Scholar
Carballo-Pacheco, M., Ismail, A. E. & Strodel, B. On the applicability of force fields to study the aggregation of amyloidogenic peptides using molecular dynamics simulations. J. Chem. Theory Comput. 14(11), 6063–6075. https://doi.org/10.1021/acs.jctc.8b00579 (2018).
Article CAS PubMed Google Scholar
Rahman, M. U., Rehman, A. U., Liu, H. & Chen, H.-F. Comparison and evaluation of force fields for intrinsically disordered proteins. J. Chem. Inf. Model. 60(10), 4912–4923. https://doi.org/10.1021/acs.jcim.0c00762 (2020).
Article CAS PubMed Google Scholar
MacKerell, A. D. et al. CHARMM: The energy function and its parameterization. in Encyclopedia of Computational Chemistry (American Cancer Society, 2002). https://doi.org/10.1002/0470845015.cfa007.
Brooks, B. R. et al. CHARMM: A program for macromolecular energy, minimization, and dynamics calculations. J. Comput. Chem. 4(2), 187–217. https://doi.org/10.1002/jcc.540040211 (1983).
Article CAS Google Scholar
Jorgensen, W. L. & Tirado-Rives, J. The OPLS [optimized potentials for liquid simulations] potential functions for proteins, energy minimizations for crystals of cyclic peptides and crambin. J. Am. Chem. Soc. 110(6), 1657–1666. https://doi.org/10.1021/ja00214a001 (1988).
Article CAS PubMed Google Scholar
Kaminski, G. A., Friesner, R. A., Tirado-Rives, J. & Jorgensen, W. L. Evaluation and reparametrization of the OPLS-AA force field for proteins via comparison with accurate quantum chemical calculations on peptides. J. Phys. Chem. B 105(28), 6474–6487. https://doi.org/10.1021/jp003919d (2001).
Article CAS Google Scholar
Gunsteren, W. F. & Berendsen, H. J. C. Biomolecular Simulation: The GROMOS Software (Biomos, 1987).
Google Scholar
van Gunsteren, W. F. et al. Biomolecular Simulation: The GROMOS96 Manual and User Guide (Biomos, 1996).
Google Scholar
Chan-Yao-Chong, M., Durand, D. & Ha-Duong, T. Molecular dynamics simulations combined with nuclear magnetic resonance and/or small-angle X-ray scattering data for characterizing intrinsically disordered protein conformational ensembles. J. Chem. Inf. Model. 59(5), 1743–1758. https://doi.org/10.1021/acs.jcim.8b00928 (2019).
Article CAS PubMed Google Scholar
Mu, J., Liu, H., Zhang, J., Luo, R. & Chen, H.-F. Recent force field strategies for intrinsically disordered proteins. J. Chem. Inf. Model. 61(3), 1037–1047. https://doi.org/10.1021/acs.jcim.0c01175 (2021).
Article CAS PubMed PubMed Central Google Scholar
Best, R. B., Zheng, W. & Mittal, J. Balanced protein–water interactions improve properties of disordered proteins and non-specific protein association. J. Chem. Theory Comput. 10(11), 5113–5124. https://doi.org/10.1021/ct500569b (2014).
Article CAS PubMed PubMed Central Google Scholar
Huang, J. et al. CHARMM36m: An improved force field for folded and intrinsically disordered proteins. Nat. Methods 14(1), 71–73. https://doi.org/10.1038/nmeth.4067 (2017).
Article CAS PubMed Google Scholar
Tian, C. et al. Ff19SB: Amino-acid-specific protein backbone parameters trained against quantum mechanics energy surfaces in solution. J. Chem. Theory Comput. 16(1), 528–552. https://doi.org/10.1021/acs.jctc.9b00591 (2020).
Article CAS PubMed Google Scholar
Piana, S., Donchev, A. G., Robustelli, P. & Shaw, D. E. Water dispersion interactions strongly influence simulated structural properties of disordered protein states. J. Phys. Chem. B 119(16), 5113–5123. https://doi.org/10.1021/jp508971m (2015).
Article CAS PubMed Google Scholar
Bernadó, P. & Blackledge, M. A self-consistent description of the conformational behavior of chemically denatured proteins from NMR and small angle scattering. Biophys. J. 97(10), 2839–2845. https://doi.org/10.1016/j.bpj.2009.08.044 (2009).
Article ADS CAS PubMed PubMed Central Google Scholar
Hunter, J. D. Matplotlib: A 2D graphics environment. Comput. Sci. Eng. 9(3), 90–95. https://doi.org/10.1109/MCSE.2007.55 (2007).
Article Google Scholar
Matthews, B. W. Comparison of the predicted and observed secondary structure of T4 phage lysozyme. Biochim. Biophys. Acta Protein Struct. 405(2), 442–451. https://doi.org/10.1016/0005-2795(75)90109-9 (1975).
Article CAS Google Scholar
Kabsch, W. & Sander, C. Dictionary of protein secondary structure: Pattern recognition of hydrogen-bonded and geometrical features. Biopolymers 22(12), 2577–2637. https://doi.org/10.1002/bip.360221211 (1983).
Article CAS PubMed Google Scholar
Touw, W. G. et al. A series of PDB-related databanks for everyday needs. Nucleic Acids Res. 43(D1), D364–D368. https://doi.org/10.1093/nar/gku1028 (2015).
Article CAS PubMed Google Scholar
Schwartz, J. C., Wang, X., Podell, E. R. & Cech, T. R. RNA seeds higher-order assembly of FUS protein. Cell Rep. 5(4), 918–925. https://doi.org/10.1016/j.celrep.2013.11.017 (2013).
Article CAS PubMed PubMed Central Google Scholar
Pedersen, K. B., Flores-Canales, J. C. & Schiøtt, B. Predicting molecular properties of α-synuclein using force fields for intrinsically disordered proteins. Proteins https://doi.org/10.1002/prot.26409 (2022).
Article PubMed PubMed Central Google Scholar
Samantray, S., Yin, F., Kav, B. & Strodel, B. Different force fields give rise to different amyloid aggregation pathways in molecular dynamics simulations. J. Chem. Inf. Model. 60(12), 6462–6475. https://doi.org/10.1021/acs.jcim.0c01063 (2020).
Article CAS PubMed Google Scholar
Hughes, M. P. et al. Atomic structures of low-complexity protein segments reveal kinked β sheets that assemble networks. Science 359(6376), 698–701. https://doi.org/10.1126/science.aan6398 (2018).
Article ADS CAS PubMed PubMed Central Google Scholar
Zhou, H. et al. Programming conventional electron microscopes for solving ultrahigh-resolution structures of small and macro-molecules. Anal. Chem. 91(17), 10996–11003. https://doi.org/10.1021/acs.analchem.9b01162 (2019).
Article CAS PubMed Google Scholar
Daura, X. et al. Peptide folding: When simulation meets experiment. Angew. Chem. Int. Ed. 38(1–2), 236–240. https://doi.org/10.1002/(SICI)1521-3773(19990115)38:1/2%3c236::AID-ANIE236%3e3.0.CO;2-M (1999).
3.0.CO;2-M" data-track-action="article reference" href="https://doi.org/10.1002%2F%28SICI%291521-3773%2819990115%2938%3A1%2F2%3C236%3A%3AAID-ANIE236%3E3.0.CO%3B2-M" aria-label="Article reference 53" data-doi="10.1002/(SICI)1521-3773(19990115)38:1/23.0.CO;2-M">Article CAS Google Scholar
Abraham, M. J. et al. GROMACS: High performance molecular simulations through multi-level parallelism from laptops to supercomputers. SoftwareX 1–2, 19–25. https://doi.org/10.1016/j.softx.2015.06.001 (2015).
Article ADS Google Scholar
Lindahl, E., Abraham, M. J., Hess, B. & van der Spoel, D. GROMACS 2020.4 Manual. (2020). https://doi.org/10.5281/zenodo.4054996.
Essmann, U. et al. A smooth particle Mesh Ewald method. J. Chem. Phys. 103(19), 8577–8593. https://doi.org/10.1063/1.470117 (1995).
Article ADS CAS Google Scholar
Hess, B. P-LINCS: A parallel linear constraint solver for molecular simulation. J. Chem. Theory Comput. 4(1), 116–122. https://doi.org/10.1021/ct700200b (2008).
Article MathSciNet CAS PubMed Google Scholar
Miyamoto, S. & Kollman, P. A. Settle: An analytical version of the SHAKE and RATTLE algorithm for rigid water models. J. Comput. Chem. 13(8), 952–962. https://doi.org/10.1002/jcc.540130805 (1992).
Article CAS Google Scholar
Berendsen, H. J. C. et al. Interaction Models for Water in Relation to Protein Hydration (Springer, 1981). https://doi.org/10.1007/978-94-015-7658-1_21.
Book Google Scholar
Berendsen, H. J. C., van der Spoel, D. & van Drunen, R. GROMACS: A message-passing parallel molecular dynamics implementation. Comput. Phys. Commun. 91(1), 43–56. https://doi.org/10.1016/0010-4655(95)00042-E (1995).
Article ADS CAS Google Scholar
Nosé, S. A unified formulation of the constant temperature molecular dynamics methods. J. Chem. Phys. 81(1), 511–519. https://doi.org/10.1063/1.447334 (1984).
Article ADS Google Scholar
Hoover, W. G. Canonical dynamics: equilibrium phase-space distributions. Phys. Rev. A 31(3), 1695–1697. https://doi.org/10.1103/PhysRevA.31.1695 (1985).
Article ADS CAS Google Scholar
Parrinello, M. & Rahman, A. Polymorphic transitions in single crystals: A new molecular dynamics method. J. Appl. Phys. 52(12), 7182–7190. https://doi.org/10.1063/1.328693 (1981).
Article ADS CAS Google Scholar
Gowers, R. et al. MDAnalysis: A Python package for the rapid analysis of molecular dynamics simulations. In Proceedings of the 15th Python in Science Conference (eds Benthall, S. & Rostrup, S.) 98–105 (Springer, 2016). https://doi.org/10.25080/Majora-629e541a-00e.
Chapter Google Scholar
Michaud-Agrawal, N., Denning, E. J., Woolf, T. B. & Beckstein, O. MDAnalysis: A toolkit for the analysis of molecular dynamics simulations. J. Comput. Chem. 32(10), 2319–2327. https://doi.org/10.1002/jcc.21787 (2011).
Article CAS PubMed PubMed Central Google Scholar
McGibbon, R. T. et al. MDTraj: A modern open library for the analysis of molecular dynamics trajectories. Biophys. J. 109(8), 1528–1532. https://doi.org/10.1016/j.bpj.2015.08.015 (2015).
Article ADS CAS PubMed PubMed Central Google Scholar
Download references
MD simulations were performed using JAEA resources from QST institute and local resources. This work was supported by the JSPS fellowship 2020 (FY2020 JSPS Postdoctoral Fellowship for Research in Japan (Short-term) to MC). This work was also supported in part by KAKENHI (JP18H05534), MEXT as “Program for Promoting Researches on the Supercomputer Fugaku” (Biomolecular dynamics in a living cell, JPMXP1020200101, hp220177) and Japan Agency for Medical Research and Development (AMED) (23ama121024j0002).
Maud Chan-Yao-Chong
Present address: Toulouse Biotechnology Institute, TBI, Université de Toulouse, CNRS, INRAE, INSA, 135, Avenue de Rangueil, 31077, Toulouse Cedex 04, France
Molecular Modeling and Simulation (MMS) Team, Institute for Quantum Life Science, National Institutes for Quantum Science and Technology (QST), 4-9-1, Anagawa, Inage Ward, Chiba City, Chiba, 263-8555, Japan
Maud Chan-Yao-Chong, Justin Chan & Hidetoshi Kono
You can also search for this author in PubMed Google Scholar
You can also search for this author in PubMed Google Scholar
You can also search for this author in PubMed Google Scholar
M.C., J.C. and H.K. planed the research project. M.C. carried out all the simulations. M.C. and J.C. analyzed the data. M.C., J.C. and H.K. wrote and reviewed the manuscript text. The manuscript was written through contributions of all authors. All authors have given approval to the final version of the manuscript. These authors contributed equally.
Correspondence to Hidetoshi Kono.
The authors declare no competing interests.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
Reprints and Permissions
Chan-Yao-Chong, M., Chan, J. & Kono, H. Benchmarking of force fields to characterize the intrinsically disordered R2-FUS-LC region. Sci Rep 13, 14226 (2023). https://doi.org/10.1038/s41598-023-40801-6
Download citation
Received: 26 December 2022
Accepted: 16 August 2023
Published: 30 August 2023
DOI: https://doi.org/10.1038/s41598-023-40801-6
Anyone you share the following link with will be able to read this content:
Sorry, a shareable link is not currently available for this article.
Provided by the Springer Nature SharedIt content-sharing initiative
By submitting a comment you agree to abide by our Terms and Community Guidelines. If you find something abusive or that does not comply with our terms or guidelines please flag it as inappropriate.