1. How to use Graph-pKₐ for pKₐ prediction

A typical in silico pKₐ prediction workflow in Graph-pKₐ is as follows:

(1) Molecule submission

A molecule is submitted from the user side (browser) by drawing it or by uploading a mol/sdf/txt file (Molecule Submission).

(2) Structure standardization

Input molecule structures are standardized in the same way as the training data. The standardization procedure removes all salts from the molecules, neutralizes charged molecules, and standardizes the SMILES strings. Because charged molecules are neutralized in this step, Graph-pKₐ does not support pKₐ prediction for ionized molecules.

(3) Result calculation

Based on the standardized molecules, the micro and macro pKₐ values are computed on the server side with a pre-trained model.

(4) Result output

The results can be downloaded and/or displayed online. They include a molecule image labeled with atom indices and two tables listing the predicted acidic and basic pKₐ values.


2. How are similar molecules recommended

The process of searching for similar molecules is as follows:

(1) Prediction of the most acidic/basic atoms

Graph-pKₐ predicts the most acidic/basic atoms of the molecules in our dataset and of the molecule input by the user.

(2) Extraction of atom embeddings

For each of these predicted most acidic/basic atoms, the embedding from the last hidden layer is extracted.

(3) Euclidean distance calculation

The Euclidean distances between the atom embedding of the input molecule and those of the molecules in our dataset are calculated. If the Euclidean distance between the embeddings of two atoms is less than 0.05, the two molecules they belong to are considered similar. For an input molecule, up to four similar molecules can be output. The predicted most acidic/basic atoms of the molecules in our dataset are marked in red/blue.
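
The distance filter described in step (3) can be sketched in a few lines of Python. This is only an illustration, not the server's implementation: the 0.05 cutoff and the limit of four molecules come from the text above, while the function names, the toy three-dimensional embeddings, and the choice to return the closest matches first are assumptions made for the example.

```python
import math

# Cutoff and result cap are taken from the text; everything else is illustrative.
DISTANCE_CUTOFF = 0.05
MAX_HITS = 4


def euclidean(a, b):
    """Euclidean distance between two equal-length embedding vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def find_similar(query_embedding, dataset, cutoff=DISTANCE_CUTOFF, max_hits=MAX_HITS):
    """Return up to max_hits (molecule_id, distance) pairs, closest first.

    dataset maps a molecule identifier to the embedding of its predicted
    most acidic (or most basic) atom. Sorting by distance is an assumption;
    the text only states that up to four molecules are returned.
    """
    hits = []
    for mol_id, embedding in dataset.items():
        d = euclidean(query_embedding, embedding)
        if d < cutoff:
            hits.append((mol_id, d))
    hits.sort(key=lambda pair: pair[1])
    return hits[:max_hits]


# Toy 3-dimensional embeddings (hypothetical values).
toy_dataset = {
    "mol_A": [0.10, 0.20, 0.30],
    "mol_B": [0.11, 0.21, 0.31],
    "mol_C": [0.90, 0.10, 0.40],
}
similar = find_similar([0.10, 0.20, 0.31], toy_dataset)
```

Here mol_A and mol_B fall within the cutoff and mol_C does not, so only the first two would be recommended. The acidic and basic searches would each run the same filter on their own set of atom embeddings.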


3. Submission List, Job, and Record

Submission list, job, and record are the terms Graph-pKₐ uses for data processing. They are explained as follows:

(1) Submission List

A submission list is a temporary holding area for jobs waiting to be executed, each of which has valid inputs for the next steps in the workflow. Users can delete any improper submissions as needed and then submit the remaining jobs to the server for processing.

(2) Job

A job is the standard data processing unit in Graph-pKₐ. It has valid inputs (molecules) and outputs (computed results and figures) and passes through the four typical steps of Graph-pKₐ during its life cycle. Depending on the situation, a job may have a status such as “queue”, “running”, “done”, or “error”. A “done” sign under the computing status means the computation is finished, and a “done” sign under the presenting status means the figures and tables have been generated.

(3) Record

A record consists of the jobs submitted at the same time from the same submission list. For convenience, users can submit up to 5 jobs (molecules) simultaneously.
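
To make the relationship between the three terms concrete, the following Python sketch models them as simple data structures. It is a hypothetical layout, not the server's actual code: the class and method names are invented for illustration, only the four statuses named above are included, and it assumes that both status fields start at "queue". The limit of 5 jobs per record comes from the text.

```python
from dataclasses import dataclass, field
from enum import Enum

MAX_JOBS_PER_RECORD = 5  # limit stated in the text


class Status(Enum):
    QUEUE = "queue"
    RUNNING = "running"
    DONE = "done"
    ERROR = "error"


@dataclass
class Job:
    """One molecule with its two status fields (hypothetical layout)."""
    molecule: str  # e.g. a SMILES string
    computing: Status = Status.QUEUE   # assumed initial value
    presenting: Status = Status.QUEUE  # assumed initial value

    def is_finished(self):
        # Finished only when both the computation and the figure/table
        # generation report "done".
        return self.computing is Status.DONE and self.presenting is Status.DONE


@dataclass
class SubmissionList:
    """Temporary holding area for jobs before they are sent to the server."""
    jobs: list = field(default_factory=list)

    def add(self, molecule):
        if len(self.jobs) >= MAX_JOBS_PER_RECORD:
            raise ValueError("at most 5 jobs can be submitted together")
        self.jobs.append(Job(molecule))

    def remove(self, molecule):
        # Drop an improper submission before sending the list.
        self.jobs = [j for j in self.jobs if j.molecule != molecule]

    def submit(self):
        """Turn the current list into a record and empty the list."""
        record = tuple(self.jobs)
        self.jobs = []
        return record


pending = SubmissionList()
for smiles in ["CC(=O)O", "c1ccccc1O", "not-a-molecule"]:
    pending.add(smiles)
pending.remove("not-a-molecule")
record = pending.submit()  # a record holding two jobs
```

In this sketch a record is just the tuple of jobs that left the submission list together, which mirrors the definition above: jobs are grouped by the moment of submission, and each job carries its own computing and presenting status.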