Input types for enzymes/proteins and metabolites
The input for an enzyme must be a string containing the enzyme's amino acid sequence. The model does not currently handle characters that stand for missing information, such as '*'; a sequence that contains one will prevent the model from proceeding.
There are three valid input types for the metabolite: SMILES string, KEGG Compound ID, and InChI string.
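A quick pre-check can catch a problematic sequence before it is submitted. The sketch below is only an illustration: the function name `check_sequence` is hypothetical, and the decision to report letters outside the 20 standard amino acids is an assumption, since the text above only names '*' as unsupported.

```python
# The 20 standard amino acid one-letter codes.
STANDARD_AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def check_sequence(sequence):
    """Return a list of problems found in an amino acid sequence (empty if none)."""
    problems = []
    cleaned = sequence.strip().upper()
    if not cleaned:
        problems.append("sequence is empty")
    if "*" in cleaned:
        problems.append("sequence contains '*', which the model does not handle")
    # Assumption: other non-standard letters are only reported here; whether
    # the model accepts them is not stated in the instructions.
    other = sorted(set(cleaned) - STANDARD_AMINO_ACIDS - {"*"})
    if other:
        problems.append("non-standard characters: " + "".join(other))
    return problems


# Placeholder sequences, not real enzymes.
print(check_sequence("MKTAYIAKQR"))
print(check_sequence("MKTAY*"))
```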
InChI strings are textual representations of chemical structures. Every InChI string is a unique identifier and contains detailed information about the structure of a small molecule. For more details on InChI, see this page from IUPAC.
The KEGG Compound database contains identifiers for many small molecules and drugs. A KEGG Compound ID starts with a "C" or "D" followed by a five-digit number. For more information see the KEGG homepage.
The Simplified Molecular Input Line Entry Specification (SMILES) represents the structure of a molecule as an ASCII string. You can get the SMILES for a molecule by, for example, searching for the molecule's name in PubChem. Since SMILES representations are not unique for all molecules, we recommend using InChI strings or KEGG Compound IDs instead, if possible.
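The three metabolite formats can usually be told apart by their shape: InChI strings begin with "InChI=", and KEGG Compound IDs are a "C" or "D" followed by five digits. The following is a minimal sketch of such a check; the function name `guess_metabolite_format` is hypothetical, and anything that matches neither pattern is merely assumed to be SMILES, without validating it.

```python
import re

# "C" or "D" followed by exactly five digits, as described above.
KEGG_ID = re.compile(r"[CD]\d{5}")


def guess_metabolite_format(identifier):
    """Guess whether a metabolite string is an InChI, a KEGG ID, or a SMILES."""
    text = identifier.strip()
    if text.startswith("InChI="):
        return "InChI"
    if KEGG_ID.fullmatch(text):
        return "KEGG"
    # Fallback: assumed to be SMILES; the string is not parsed or validated.
    return "SMILES"


for example in ["InChI=1S/H2O/h1H2", "C00001", "CCO"]:
    print(example, guess_metabolite_format(example))
```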
Single Input File
Multiple Input File
As of December 9, 2024, we have switched from CSV files to XLSX files specifically for kcat predictions. XLSX files can be created with spreadsheet programs such as Microsoft Excel or Google Sheets, or with the pandas library in Python. For more details on how to create an XLSX file, refer to the documentation for Microsoft Excel or Google Sheets.
Your file format depends on the model you are using. Attention: InChI strings can contain commas (","), so be sure to properly structure your data in the required format.
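For the pandas route, a minimal sketch might look like the one below. The helper names `build_pair_table` and `save_table` are hypothetical, the default column names are those of the KM prediction format and must be changed to match your model, and the sequence is a placeholder. Because each value is stored in its own cell, commas inside an InChI string need no special handling.

```python
import pandas as pd


def build_pair_table(pairs, enzyme_column="Enzyme", metabolite_column="Substrate"):
    """Build a two-column table from (sequence, metabolite) tuples."""
    return pd.DataFrame(pairs, columns=[enzyme_column, metabolite_column])


def save_table(table, path):
    """Write the table to an XLSX file."""
    # index=False keeps the row index out of the file, so only the named
    # columns are written. pandas needs an Excel engine such as openpyxl
    # installed to write .xlsx files.
    table.to_excel(path, index=False)


# Placeholder sequence paired with the InChI of ethanol (note the commas).
pairs = [("MKTAYIAKQR", "InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3")]
table = build_pair_table(pairs)
print(table)
```

Calling `save_table(table, "input.xlsx")` would then write the file; the file name is arbitrary.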
You can download a sample file for each model below.
Example of multiple inputs with a file. The enzyme-substrate pairs and metabolites displayed here are not real.
Your file must be in XLSX format and contain exactly two columns, one called "Protein" and one called "Metabolite". Each row should contain one enzyme and one metabolite in the format described above. The upper limit of accepted enzyme-metabolite pairs is 500. You can download a sample file here. We have shown that the prediction performance of our model is low when it is applied to metabolites that were not present in our training set. Therefore, we check whether each uploaded metabolite was part of our training set. We return this information in the column "metabolite".
Your file must be in XLSX format and contain three columns, titled "Enzyme", "Substrates", and "Products". Each row should contain one enzyme-reaction pair in the format described above. In both the "Substrates" and "Products" columns, separate metabolites with a semicolon (";"). The upper limit of accepted enzyme-reaction pairs is 500. You can download a sample XLSX file here.
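As an illustration of the three-column layout, the sketch below assembles one row with semicolon-separated metabolites. The helper names are hypothetical and the sequence is a placeholder. Rejecting identifiers that already contain a semicolon is an added precaution, not a documented rule: InChI strings of multi-component structures can contain ";", which would make the cell ambiguous.

```python
def join_metabolites(metabolites):
    """Join metabolite identifiers into one semicolon-separated cell value."""
    cleaned = [m.strip() for m in metabolites]
    for m in cleaned:
        # Precaution (assumption): a ';' inside an identifier would be
        # indistinguishable from the separator.
        if ";" in m:
            raise ValueError("identifier contains ';': " + m)
    return ";".join(cleaned)


def reaction_row(enzyme, substrates, products):
    """Build one row for the Enzyme / Substrates / Products layout."""
    return {
        "Enzyme": enzyme.strip(),
        "Substrates": join_metabolites(substrates),
        "Products": join_metabolites(products),
    }


# Placeholder sequence; KEGG IDs for water + ATP -> ADP + orthophosphate.
row = reaction_row("MKTAYIAKQR", ["C00001", "C00002"], ["C00008", "C00009"])
print(row)
```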
KM prediction: Your file must be in XLSX format and contain exactly two columns, one called "Enzyme" and one called "Substrate". Each row should contain one enzyme and one metabolite in the described format. The upper limit of accepted enzyme-metabolite pairs is 500. You can download a sample file here.
SPOT: Your file must be in XLSX format and contain exactly two columns, one called "Protein" and one called "Metabolite". Each row should contain one transporter and one molecule in the format described above. The upper limit of accepted transporter-molecule pairs is 1000. You can download a sample file here.
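The column names and row limits for KM prediction and SPOT can be checked before uploading. This is only a sketch under stated assumptions: `FILE_RULES` and `check_table` are hypothetical names, column order is assumed not to matter, and the actual server-side validation may differ.

```python
# Column names and row limits as stated above for these two models.
FILE_RULES = {
    "KM prediction": {"columns": ("Enzyme", "Substrate"), "max_rows": 500},
    "SPOT": {"columns": ("Protein", "Metabolite"), "max_rows": 1000},
}


def check_table(model, header, row_count):
    """Return a list of problems with a file's header and number of data rows."""
    rules = FILE_RULES[model]
    problems = []
    # Assumption: the two columns may appear in either order.
    if sorted(header) != sorted(rules["columns"]):
        problems.append("expected columns " + ", ".join(rules["columns"]))
    if row_count > rules["max_rows"]:
        problems.append("more than %d rows" % rules["max_rows"])
    return problems


print(check_table("SPOT", ["Protein", "Metabolite"], 1000))
print(check_table("KM prediction", ["Enzyme", "Substrate"], 501))
```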