Developing the ‘best’ molecular property predictors
In this post we explore our new benchmark site, which provides directly reproducible benchmarks with OCE model parameter strings.
AI Drug Discovery Conferences
Where to meet people working on adding AI to the drug discovery process.
Practically Beyond ‘Novel’ Methods
Using Oloren ChemEngine (OCE, also written Oloren Chem Engine), Oloren AI demonstrates that models constructed simply with OCE outperform graph neural networks (GNNs), including GEM, D-MPNN, AttentiveFP, GROVER, and PretrainGNN among others, on the Tox21 toxicity tasks from the MoleculeNet suite of benchmarks. Molecular property predictors powered by ensembles of traditional chemoinformatics methods and modern neural networks are extremely powerful.
CDD Vault to DataFrame: Python API and tutorial for querying and downloading data
Exporting data from CDD Vault into Python, or otherwise onto a local machine, is often a necessary step. It can be an annoying task, so we want to make the process as simple as possible so you can get to your analysis.
Evaluating Model Uncertainty with Oloren ChemEngine
A summary of the BaseErrorModel class in Oloren ChemEngine that constructs error models for predicting error bars.
Modeling imbalanced datasets: how much data do we need?
This post investigates how well models classify molecules as the minority class makes up a varying share of the training set. We find that model performance increases sharply with increasing minority representation and then levels off before the data is fully balanced. The behavior also varies significantly from dataset to dataset.
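One way to set up this kind of experiment is to keep every majority-class example and subsample the minority class until it reaches a chosen share of the training set. The sketch below is only an illustration under that assumption, not the protocol used in the post; the function name `subsample_minority` and the toy labels are hypothetical.

```python
import random


def subsample_minority(labels, minority_fraction, minority_label=1, seed=0):
    """Return sorted indices of a training subset in which the minority
    class makes up roughly `minority_fraction` of the examples.

    Assumption: all majority-class examples are kept and only the
    minority class is subsampled.
    """
    if not 0 < minority_fraction < 1:
        raise ValueError("minority_fraction must be strictly between 0 and 1")

    minority = [i for i, y in enumerate(labels) if y == minority_label]
    majority = [i for i, y in enumerate(labels) if y != minority_label]

    # Solve k / (k + len(majority)) = minority_fraction for k.
    wanted = round(minority_fraction * len(majority) / (1 - minority_fraction))
    # We cannot draw more minority examples than the dataset contains.
    wanted = min(wanted, len(minority))

    rng = random.Random(seed)  # seeded so the subset is reproducible
    chosen = rng.sample(minority, wanted)
    return sorted(majority + chosen)


# Toy data: 90 majority (0) and 90 minority (1) labels.
labels = [0] * 90 + [1] * 90
subset = subsample_minority(labels, minority_fraction=0.1)
minority_count = sum(labels[i] for i in subset)
print(len(subset), minority_count)
```

Repeating this for several values of `minority_fraction`, and training a model on each subset, gives one point per fraction on a performance-versus-representation curve.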
ACS Fall 2022
We were so excited to attend ACS Fall 2022 and meet with the Chemical Information (CINF) and Computers in Chemistry (COMP) divisions. We presented our work on OlorenVec, our supervised contrastive learned molecular representation, and released Oloren ChemEngine, a Python library for molecular property predictors and associated tools, including uncertainty quantification and interpretability tools.
We compare several molecular representations, including Oloren's OlorenVec representation, to determine how representations can be optimally selected based on the task.
Introduction to Scaffold Splitting
Discrepancies between the data a model encounters in production and in development mean we need a solid understanding of model generalizability to make sure we are building models that are truly useful in production. This post explains and visualizes scaffold splitting, a method for splitting datasets to test model generalizability.
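The core idea can be sketched in a few lines: group molecules by scaffold, then assign whole groups to either the training or the test set so that no scaffold appears in both. The example below is a minimal sketch, not the implementation from the post. It assumes the scaffold of each molecule has already been computed (for instance as a Bemis-Murcko scaffold SMILES), and the function name `scaffold_split` and the toy scaffold list are hypothetical.

```python
from collections import defaultdict


def scaffold_split(scaffolds, train_fraction=0.8):
    """Split molecule indices so that no scaffold is shared between sets.

    `scaffolds` holds one precomputed scaffold identifier per molecule.
    """
    groups = defaultdict(list)
    for index, scaffold in enumerate(scaffolds):
        groups[scaffold].append(index)

    # Largest scaffold groups first; ties broken by name for determinism.
    ordered = sorted(groups.items(), key=lambda item: (-len(item[1]), item[0]))

    train_target = train_fraction * len(scaffolds)
    train, test = [], []
    for _, members in ordered:
        # A group goes to train only if it fits entirely within the budget.
        if len(train) + len(members) <= train_target:
            train.extend(members)
        else:
            test.extend(members)
    return train, test


# Toy input: scaffold SMILES assumed to be precomputed elsewhere.
scaffolds = [
    "c1ccccc1", "c1ccccc1", "c1ccccc1", "c1ccncc1", "c1ccncc1",
    "C1CCCCC1", "c1ccc2ccccc2c1", "c1ccccc1", "c1ccncc1", "C1CCCCC1",
]
train_idx, test_idx = scaffold_split(scaffolds, train_fraction=0.8)
print(sorted(train_idx), sorted(test_idx))
```

Because groups are never divided, the realized split ratio only approximates `train_fraction`. That is the trade-off for guaranteeing the test set contains scaffolds the model has not seen during training.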
Visualizing the chemical space
After creating models for drug classification, we need a practical way of visualizing our chemical space and verifying our results. This guide walks you through generating an interactive plotly graph of chemicals that renders 2D images of molecules on hover. A good library for this is molplotly, and the tutorial teaches you how to write and customize your own code for the same purpose.
Adding R-groups to molecules in RDKit
A tutorial on how to attach R-groups to molecules. Along the way we will learn how to edit molecules in RDKit and how to use atom map numbers and wildcard atoms.