A tutorial on how to attach R-groups to molecules. Along the way we will learning how to edit molecules in RDKit, how to utilize atom map numbers and wildcard atoms.
In this example, we will be using Chem.RWMol to stitch compounds together to recreate Table 2 in “Structure-Aided Design, Synthesis, and Biological Evaluation of Potent and Selective Non-Nucleoside Inhibitors Targeting Protein Arginine Methyltransferase 5”.
Creating the base molecule
First, we’ll draw the base molecules using the wildcard symbol * for the R and X groups, with atom map numbers such that R and X are *:1 and *:2 respectively.
Let’s write a function for using a dictionary, mapping, which maps atom map number to the group to be added, to add groups onto a Chem.Mol, m. Code explanation is in comments as well as below the code block.
fromrdkitimportChem,DataStructsdefadd_groups(m,mapping):# Written by David Huang, Oloren AI
# Loop over atoms until there are no wildcard atoms
whileTrue:# Find wildcard atom if available, otherwise exit
a=Nonefora_inm.GetAtoms():ifa_.GetAtomicNum()==0:a=a_breakifaisNone:break# Get appropriate group to substitute in from mapping
group=mapping[a.GetAtomMapNum()]ifgroup.GetNumAtoms()==1:# Directly substitute atom in, if single atom group
a.SetAtomicNum(group.GetAtomWithIdx(0).GetAtomicNum())a.SetAtomMapNum(0)else:# Set wildcard atoms to having AtomMapNum 1000 for tracking
a.SetAtomMapNum(1000)fora_ingroup.GetAtoms():ifa_.GetAtomicNum()==0:a_.SetAtomMapNum(1000)# Put group and base molecule together and make it editable
m=Chem.CombineMols(m,group)m=Chem.RWMol(m)# Find using tracking number the atoms to merge in new molecule
a1=Nonea2=Nonebonds=forainm.GetAtoms():ifa.GetAtomMapNum()==1000:ifa1isNone:a1=aelse:a2=a# Find atoms to bind together based on atoms to merge
b1=a1.GetBonds()start=(b1.GetBeginAtomIdx()ifb1.GetEndAtomIdx()==a1.GetIdx()elseb1.GetEndAtomIdx())b2=a2.GetBonds()end=(b2.GetBeginAtomIdx()ifb2.GetEndAtomIdx()==a2.GetIdx()elseb2.GetEndAtomIdx())# Add the connection and remove original wildcard atoms
Our key strategy in this function is to define attachment points with the Atom Map Number in both the molecule and in the mapping dictionary. To make the additions, we loop over all the atoms in the molecule and check if they are a wildcard atom (atomic number 0) and if they are we make the appropriate additions.
Note, we do this in a while loop where we loop until we can’t find a wildcard atom, because we cannot edit a molecule within a for a in m.GetAtoms() loop without throwing a RuntimeError: Sequence modified during iteration. Exception.
To make these additions, we first give the wildcard atoms in the molecule and the group to be added an Atom Map Number of 1000` to keep track of it. Then we use Chem.CombineMols to combine the molecule and group together in one object and then we use Chem.RWMol to make it edittable. We then re-find those tracked atoms and then find the atoms which they are bonded to. We join those atoms together and remove the tracked wildcard atoms.
Recreating the entire table
Now, to recreate the table we will write a CSV file with the R and X groups as SMILES and load it in as a pd.DataFrame.
We can then input the rows of this table into our previous function. We can also convert these to SMILES.