Research data supporting "CONFPASS: fast DFT re-optimizations of structures from conformation searches"


Description

This is the supporting data for the paper: “CONFPASS: fast DFT re-optimisations of force field conformation searches”.

CONFPASS (Conformer Prioritizations & Analysis for DFT re-optimisations) has been developed to extract dihedral angle descriptors from conformational searching outputs, perform clustering and return a priority list for DFT re-optimisations. Evaluations were conducted with DFT data of the conformers for 150 structurally diverse molecules, most of which are flexible. CONFPASS gives a confidence estimate that the global minimum structure has been found; based on our dataset, 90% confidence is reached after re-optimising half of the force field (FF) structures. Re-optimising conformers in order of FF energy often generates duplicate results; using CONFPASS, the duplication rate is reduced by a factor of two for the first 30% of the re-optimisations, which include the global minimum structure about 80% of the time.
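
As an illustration of what a dihedral angle descriptor is, the following minimal sketch computes the signed dihedral angle defined by four atomic positions using the standard atan2 formula. It is not taken from CONFPASS; the function name `dihedral_deg` and the plain-tuple coordinate format are assumptions made for this example only.

```python
import math


def dihedral_deg(p0, p1, p2, p3):
    """Signed dihedral angle in degrees for four 3D points (hypothetical helper)."""

    def sub(a, b):
        return tuple(x - y for x, y in zip(a, b))

    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    def cross(a, b):
        return (
            a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0],
        )

    # Bond vectors along the chain p0 -> p1 -> p2 -> p3.
    b1 = sub(p1, p0)
    b2 = sub(p2, p1)
    b3 = sub(p3, p2)

    # Normals of the planes (p0, p1, p2) and (p1, p2, p3).
    n1 = cross(b1, b2)
    n2 = cross(b2, b3)

    # atan2 form: numerically stable for angles close to 0 and 180 degrees.
    y = math.sqrt(dot(b2, b2)) * dot(b1, n2)
    x = dot(n1, n2)
    return math.degrees(math.atan2(y, x))
```

A set of such angles over the rotatable bonds of each conformer could then serve as input to a clustering step; which bonds are selected and how the angles are clustered is left to the CONFPASS paper and code.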

Software / Usage instructions

1. Conformational Searching

Conformational searching calculations were conducted in MacroModel (v11.7) with MacroModel (release 2019-01). The Merck molecular force field (MMFF) was used with the mixed torsional / low-mode sampling method and a maximum of 2000 steps. Conformers within an energy window of 41.6 kJ/mol (i.e. an equivalent of 10 kcal/mol) were saved for further analyses.

- CONFPASS_FF_data.zip: 822 molecules

File organization: After unzipping the file, you should obtain a single folder that contains the outcomes of the conformational searches in SDF and log format (i.e. molecule_name.sdf and molecule_name.log).

2. Density Functional Theory (DFT)

DFT calculations were performed with Gaussian 16 (Revision B.01). The levels of theory for the structural optimizations and single-point energy calculations are specified below. DFT-optimized structures were verified through frequency analyses. All the geometries were confirmed to correspond to a minimum on the potential energy surface (PES).

File organization: Unless otherwise specified, the same format is followed for all the zip files mentioned below. After unzipping the files, you should obtain folders with the following format:

molecule_name/
-- molecule_name.sdf
-- molecule_name_1.out
-- molecule_name_2.out
-- molecule_name_3.out
-- molecule_name_4.out
-- ...
-- spe/
----- molecule_name_1_spe.out
----- molecule_name_2_spe.out
----- molecule_name_3_spe.out
----- molecule_name_4_spe.out
----- ...

.out files provide opt+freq output. _spe.out files are from single-point energy calculations. The molecule_name.sdf (and molecule_name.xyz in a number folder) gives the structures of the conformers at the force field level from conformational searches. The numbering refers to the order of the conformers by force field energy from conformational searches.
a) The DFT dataset (10 zip files in total)

MMFF --> ωb97xd/6-311g(d,p)//B3LYP-D3/6-31g(d): 150 molecules

SMILES strings of the 150 molecules are given in DFT_dataset_smi.csv.

- DFT_data_partx.zip; x = 1-10

b) The benchmarking study at a higher level of theory

MMFF --> ωb97xd/6-311++g(d,p)//ωb97xd/6-31g(d): 20 molecules from Grayson et al. (T. Lewis-Atwell, P. A. Townsend and M. N. Grayson, J. Org. Chem., 2022, 87, 5703–5712.)

- wb_data.zip

c) Comparisons with CREST

Conformational searching calculations were performed with CREST using the default settings (method: GFN2-xTB; conformational searching algorithm: iMTD-GC; energy threshold: 6 kcal/mol). Redundant structures were eliminated using RMSD at a cut-off of 0.125 Å. The conformational searching output files and the .gjf input files for the DFT calculations are retained in the corresponding folder for each molecule.

Re-optimizations at DFT level: ωb97xd/6-311g(d,p)//B3LYP-D3/6-31g(d)

- CREST_comparison_CREST_20.zip: data for 20 molecules (conformational searches with CREST)
- CREST_comparison_maestro_11.zip: data for 11 molecules (conformational searches with Maestro)
-- The energy cut-off for the conformational searching is 6 kcal/mol. Otherwise, the same procedures were followed as for the DFT data set.

Files can be opened and viewed using any text editor.
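
For readers who prefer to process the files programmatically, the sketch below walks one molecule_name/ folder laid out as described above, reads the last "SCF Done" energy from each molecule_name_N_spe.out file in spe/, and ranks the conformers by relative single-point energy. It is only an illustration under stated assumptions: the names `last_scf_energy`, `rank_conformers` and `HARTREE_TO_KCAL` are hypothetical, the regular expression assumes the usual Gaussian "SCF Done:  E(...) = ... A.U." line, and the use of the raw electronic energy (without thermal corrections) is a simplification that may differ from the analysis in the paper.

```python
import re
from pathlib import Path

# Approximate Hartree -> kcal/mol conversion factor (assumption for this sketch).
HARTREE_TO_KCAL = 627.5095

# Matches lines such as: " SCF Done:  E(RwB97XD) =  -1234.567890123     A.U. after ..."
SCF_RE = re.compile(r"SCF Done:\s+E\(\S+\)\s+=\s+(-?\d+\.\d+)")


def last_scf_energy(path):
    """Return the last 'SCF Done' energy (Hartree) in a Gaussian output, or None."""
    energy = None
    with open(path, errors="replace") as handle:
        for line in handle:
            match = SCF_RE.search(line)
            if match:
                energy = float(match.group(1))
    return energy


def rank_conformers(molecule_dir):
    """Return [(conformer_index, relative_energy_kcal_per_mol), ...], lowest first.

    The conformer index is the number in the file name, i.e. the rank of the
    conformer by force field energy in the conformational search.
    """
    molecule_dir = Path(molecule_dir)
    name = molecule_dir.name
    index_re = re.compile(rf"^{re.escape(name)}_(\d+)_spe\.out$")

    energies = {}
    for spe_file in (molecule_dir / "spe").glob("*_spe.out"):
        match = index_re.match(spe_file.name)
        if match is None:
            continue  # skip files that do not follow the naming scheme
        energy = last_scf_energy(spe_file)
        if energy is not None:
            energies[int(match.group(1))] = energy

    if not energies:
        return []

    e_min = min(energies.values())
    relative = [(i, (e - e_min) * HARTREE_TO_KCAL) for i, e in energies.items()]
    return sorted(relative, key=lambda item: item[1])
```

Calling `rank_conformers` on an unzipped molecule folder returns the conformer indices ordered by DFT single-point energy, which can be compared directly with the force field ordering encoded in the file names.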

Rights and licensing

Except where otherwise noted, this item's license is described as Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0)

Sponsorship

Trinity College Cambridge and Krishnan-Ang Studentships Programme