NR-ToxPred Models
Pre-trained machine learning models for predicting the binding activity of small molecules against nine human nuclear receptors (NRs).
These models are used by the NR-ToxPred GUI application, a desktop app that requires no coding experience.
What this repository contains
| Folder | Contents |
|---|---|
| MODELS/morgan/ | SVM classifiers trained on Morgan (ECFP6) fingerprints, one per receptor |
| MODELS/MACCS/ | SVM classifiers trained on MACCS Keys, one per receptor |
| MODELS/ARclasses.npy | Label encoder (Active / Inactive) |
| X_train/ | Training set SMILES used for Applicability Domain assessment |
SuperLearner ensemble models are not included here due to their size (1 to 1.5 GB each).
Receptors covered
| Receptor | Full Name |
|---|---|
| AR | Androgen Receptor |
| ERA | Estrogen Receptor Alpha |
| ERB | Estrogen Receptor Beta |
| FXR | Farnesoid X Receptor |
| GR | Glucocorticoid Receptor |
| PPARD | Peroxisome Proliferator-Activated Receptor Delta |
| PPARG | Peroxisome Proliferator-Activated Receptor Gamma |
| PR | Progesterone Receptor |
| RXR | Retinoid X Receptor |
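For programmatic use, the receptor codes above can be kept in a small lookup table. The sketch below also builds a candidate model path per receptor. It assumes every Morgan model follows the `<code>svm_best.model` naming of the one file this README documents (`MODELS/morgan/ARsvm_best.model`); that pattern has not been confirmed for the other eight receptors, and the helper name `morgan_model_filename` is hypothetical.

```python
# Receptor codes and full names, copied from the table above.
RECEPTORS = {
    "AR": "Androgen Receptor",
    "ERA": "Estrogen Receptor Alpha",
    "ERB": "Estrogen Receptor Beta",
    "FXR": "Farnesoid X Receptor",
    "GR": "Glucocorticoid Receptor",
    "PPARD": "Peroxisome Proliferator-Activated Receptor Delta",
    "PPARG": "Peroxisome Proliferator-Activated Receptor Gamma",
    "PR": "Progesterone Receptor",
    "RXR": "Retinoid X Receptor",
}


def morgan_model_filename(code: str) -> str:
    """Return the assumed repository path of the Morgan SVM for a receptor.

    ASSUMPTION: the pattern is extrapolated from the single documented file,
    MODELS/morgan/ARsvm_best.model. Check the repository listing before use.
    """
    if code not in RECEPTORS:
        raise KeyError(f"Unknown receptor code: {code!r}")
    return f"MODELS/morgan/{code}svm_best.model"


for code, name in RECEPTORS.items():
    print(f"{code:6s} {name:50s} {morgan_model_filename(code)}")
```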
How to use
Option A: Desktop GUI (recommended, no coding needed)
Download the NR-ToxPred GUI from GitHub and run the installer. The app will download these models automatically on first launch.
Option B: Python (programmatic use)
```python
import pickle

import numpy as np
from huggingface_hub import hf_hub_download
from rdkit import Chem
from rdkit.Chem import AllChem

# Download a model
model_path = hf_hub_download(
    repo_id="gokulalgates/nrtoxpred-models",
    filename="MODELS/morgan/ARsvm_best.model",
    repo_type="model",
)

# Load model (the context manager closes the file handle afterwards)
with open(model_path, "rb") as f:
    model = pickle.load(f)

# Generate Morgan fingerprint (ECFP6: radius 3, 1024 bits)
mol = Chem.MolFromSmiles("CC(C)(c1ccc(O)cc1)c1ccc(O)cc1")  # bisphenol A
if mol is None:
    raise ValueError("Could not parse the SMILES string")
fp = AllChem.GetMorganFingerprintAsBitVect(mol, radius=3, nBits=1024)
X = np.array(fp).reshape(1, -1)

# Predict, then map the numeric class to its label
# (falls back to the raw value if the model already returns a label)
label_enc = {0: "Inactive", 1: "Active"}
pred = model.predict(X)[0]
print(f"AR prediction: {label_enc.get(pred, pred)}")
```
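The MACCS-based models in `MODELS/MACCS/` expect a different input vector. The following is a minimal sketch of the featurization step only, assuming those models were trained on RDKit's standard 167-bit MACCS output (the bit count matches the Model details table). The file names inside `MODELS/MACCS/` are not documented in this README, so the download and prediction steps are left out, and the helper name `maccs_features` is hypothetical.

```python
import numpy as np
from rdkit import Chem
from rdkit.Chem import MACCSkeys


def maccs_features(smiles: str) -> np.ndarray:
    """Convert a SMILES string into a (1, n_bits) MACCS feature row."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"Could not parse SMILES: {smiles!r}")
    # RDKit returns a 167-bit vector; bit 0 is a placeholder and never set.
    fp = MACCSkeys.GenMACCSKeys(mol)
    return np.array(list(fp), dtype=np.uint8).reshape(1, -1)


X_maccs = maccs_features("CC(C)(c1ccc(O)cc1)c1ccc(O)cc1")  # bisphenol A
print(X_maccs.shape)
```

The resulting row can be passed to a MACCS model's `predict` in the same way as the Morgan row in the example above.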
Model details
| Property | Value |
|---|---|
| Algorithm | Support Vector Machine (SVM) |
| Fingerprints | Morgan ECFP6 (radius=3, 1024 bits) and MACCS Keys (167 bits) |
| Framework | scikit-learn 0.23.2 |
| Task | Binary classification (Active / Inactive) |
| Applicability Domain | Tanimoto fingerprint similarity to training set |
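Pickled scikit-learn estimators are not guaranteed to load under a library version different from the one that saved them, so it can help to compare the installed version with the 0.23.2 listed above before unpickling. A small sketch using only the standard library; the helper name `sklearn_version_status` is hypothetical and not part of NR-ToxPred.

```python
from importlib.metadata import PackageNotFoundError, version

TRAINED_WITH = "0.23.2"  # scikit-learn version from the Model details table


def sklearn_version_status(expected: str = TRAINED_WITH) -> str:
    """Describe how the installed scikit-learn compares with `expected`."""
    try:
        installed = version("scikit-learn")
    except PackageNotFoundError:
        return "scikit-learn is not installed"
    if installed == expected:
        return f"scikit-learn {installed} matches the training version"
    return (
        f"scikit-learn {installed} differs from {expected}; "
        "unpickling may warn or fail"
    )


print(sklearn_version_status())
```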
Applicability Domain
Each prediction comes with a reliability label:
- Reliable: the compound is similar (Tanimoto ≥ 0.25) to at least one training set compound
- Unreliable: the compound lies outside the training chemical space; interpret with caution
The X_train/ folder contains the training set SMILES used to compute these assessments.
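The reliability rule above can be sketched in a few lines. This illustrates the stated criterion (a Tanimoto similarity of at least 0.25 to at least one training compound) and is not the GUI's own code. Fingerprints are represented here as plain Python sets of on-bit indices, the function names are hypothetical, and this README does not state which fingerprint type the app uses for the check.

```python
from typing import Iterable, Set

THRESHOLD = 0.25  # similarity cutoff stated in this section


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto coefficient of two fingerprints given as sets of on-bits."""
    union = len(a | b)
    if union == 0:
        return 0.0  # two empty fingerprints: treat as dissimilar
    return len(a & b) / union


def reliability(query: Set[int], training: Iterable[Set[int]],
                threshold: float = THRESHOLD) -> str:
    """Label a query by its best similarity to any training fingerprint."""
    best = max((tanimoto(query, t) for t in training), default=0.0)
    return "Reliable" if best >= threshold else "Unreliable"


# Toy fingerprints, invented for illustration only.
training_set = [{1, 5, 9, 12}, {2, 3, 40}]
print(reliability({1, 5, 7, 30}, training_set))  # shares 2 of 6 bits with the first
print(reliability({100, 101}, training_set))     # shares no bits with either
```

With RDKit bit vectors, the on-bit set can be obtained with `set(fp.GetOnBits())`, and `DataStructs.BulkTanimotoSimilarity` computes the same similarities directly on the fingerprint objects.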
Citation
If you use these models in your research, please cite:
Predicting the binding of small molecules to nuclear receptors using machine learning. Brief Bioinform. 2022 May 13;23(3):bbac114. doi: 10.1093/bib/bbac114
License
MIT License