NR-ToxPred Models

Pre-trained machine learning models for predicting the binding activity of small molecules against nine human nuclear receptors (NRs).

These models are used by the NR-ToxPred GUI application, a desktop app that requires no coding experience.


What this repository contains

Folder                Contents
MODELS/morgan/        SVM classifiers trained on Morgan (ECFP6) fingerprints, one per receptor
MODELS/MACCS/         SVM classifiers trained on MACCS Keys, one per receptor
MODELS/ARclasses.npy  Label encoder (Active / Inactive)
X_train/              Training set SMILES used for Applicability Domain assessment

SuperLearner ensemble models are not included here due to their size (1–1.5 GB each).


Receptors covered

Receptor  Full Name
AR        Androgen Receptor
ERA       Estrogen Receptor Alpha
ERB       Estrogen Receptor Beta
FXR       Farnesoid X Receptor
GR        Glucocorticoid Receptor
PPARD     Peroxisome Proliferator-Activated Receptor Delta
PPARG     Peroxisome Proliferator-Activated Receptor Gamma
PR        Progesterone Receptor
RXR       Retinoid X Receptor

How to use

Option A: Desktop GUI (recommended, no coding needed)

Download the NR-ToxPred GUI from GitHub and run the installer. The app will download these models automatically on first launch.

👉 NR-ToxPred GUI on GitHub

Option B: Python (programmatic use)

import pickle

import numpy as np
from huggingface_hub import hf_hub_download
from rdkit import Chem
from rdkit.Chem import AllChem

# Download the AR model trained on Morgan fingerprints
model_path = hf_hub_download(
    repo_id="gokulalgates/nrtoxpred-models",
    filename="MODELS/morgan/ARsvm_best.model",
    repo_type="model",
)

# Load the model (pickled with scikit-learn 0.23.2, so a matching version is safest)
with open(model_path, "rb") as f:
    model = pickle.load(f)

# Generate Morgan fingerprint (ECFP6, 1024 bits)
mol = Chem.MolFromSmiles("CC(C)(c1ccc(O)cc1)c1ccc(O)cc1")  # bisphenol A
if mol is None:
    raise ValueError("RDKit could not parse the SMILES string")
fp = AllChem.GetMorganFingerprintAsBitVect(mol, radius=3, nBits=1024)
X = np.array(fp).reshape(1, -1)

# Predict; map integer classes to readable labels, pass any other output through unchanged
label_enc = {0: "Inactive", 1: "Active"}
pred = model.predict(X)[0]
print(f"AR prediction: {label_enc.get(pred, pred)}")
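
The snippet above loads only the AR model. To loop over all nine receptors, a small helper can build the file name for each one. This is a sketch that assumes the other Morgan models follow the same naming pattern as `ARsvm_best.model`; that pattern is inferred from the single AR example and is not confirmed by this card, and `morgan_model_filename` is a hypothetical helper name. Listing the repository files (for example with `huggingface_hub.list_repo_files`) is a safer way to confirm the actual names before downloading.

```python
# Receptor codes as listed in the "Receptors covered" table
RECEPTORS = ["AR", "ERA", "ERB", "FXR", "GR", "PPARD", "PPARG", "PR", "RXR"]


def morgan_model_filename(receptor: str) -> str:
    """Build the repo path of a Morgan SVM model.

    ASSUMPTION: every receptor follows the AR naming pattern
    'MODELS/morgan/<RECEPTOR>svm_best.model'. Verify against the repo listing.
    """
    if receptor not in RECEPTORS:
        raise ValueError(f"Unknown receptor code: {receptor!r}")
    return f"MODELS/morgan/{receptor}svm_best.model"


for code in RECEPTORS:
    print(code, "->", morgan_model_filename(code))
```

Each returned path can then be passed as the `filename` argument of `hf_hub_download`, exactly as in the AR example.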

Model details

Property              Value
Algorithm             Support Vector Machine (SVM)
Fingerprints          Morgan ECFP6 (radius=3, 1024 bits) and MACCS Keys (167 bits)
Framework             scikit-learn 0.23.2
Task                  Binary classification (Active / Inactive)
Applicability Domain  Tanimoto fingerprint similarity to training set
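
Because the models were saved with scikit-learn 0.23.2, and scikit-learn does not guarantee that pickled estimators load correctly under a different release, it can help to check the installed version before unpickling. The following is a minimal sketch using only the standard library; the helper names `installed_sklearn_version` and `same_release` are illustrative and not part of this repository.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional

TRAINED_WITH = "0.23.2"  # scikit-learn version stated in the model details


def installed_sklearn_version() -> Optional[str]:
    """Return the installed scikit-learn version, or None if it is not installed."""
    try:
        return version("scikit-learn")
    except PackageNotFoundError:
        return None


def same_release(installed: Optional[str], expected: str = TRAINED_WITH) -> bool:
    """True only when the installed version string matches the training version exactly."""
    return installed == expected


found = installed_sklearn_version()
if not same_release(found):
    print(
        f"Warning: models were saved with scikit-learn {TRAINED_WITH}, "
        f"found {found}; unpickling may fail or behave differently."
    )
```

An exact match is the strictest choice; a nearby release may load the models too, but that is not something this card confirms.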

Applicability Domain

Each prediction comes with a reliability label:

  • Reliable: the compound is similar (Tanimoto ≥ 0.25) to at least one training set compound
  • Unreliable: the compound lies outside the training chemical space; interpret with caution

The X_train/ folder contains the training set SMILES used to compute these assessments.
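
The reliability rule described above can be sketched in plain Python. This is an illustration under stated assumptions, not the application's own implementation: fingerprints are represented as sets of on-bit indices, the toy fingerprints are made up, and the card does not say which fingerprint type the GUI uses for this check. The function names `tanimoto` and `reliability` are hypothetical.

```python
from typing import Iterable, Set

TANIMOTO_THRESHOLD = 0.25  # cutoff stated in this model card


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto similarity: shared on-bits divided by total distinct on-bits."""
    if not a and not b:
        return 0.0  # convention chosen here for two empty fingerprints
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def reliability(
    query: Set[int],
    training: Iterable[Set[int]],
    threshold: float = TANIMOTO_THRESHOLD,
) -> str:
    """Label a query 'Reliable' if any training fingerprint reaches the threshold."""
    best = max((tanimoto(query, fp) for fp in training), default=0.0)
    return "Reliable" if best >= threshold else "Unreliable"


# Toy fingerprints for illustration only, not real training data
training_fps = [{1, 5, 9, 12}, {2, 3, 40}]
print(reliability({1, 5, 7, 8}, training_fps))  # shares 2 of 6 distinct bits with the first
print(reliability({100, 101}, training_fps))    # shares no bits with any training compound
```

With RDKit bit vectors, the set of on-bit indices can be obtained with `set(fp.GetOnBits())`, so fingerprints computed from the SMILES in X_train/ could be fed to the same functions.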


Citation

If you use these models in your research, please cite:

Predicting the binding of small molecules to nuclear receptors using machine learning. Brief Bioinform. 2022 May 13;23(3):bbac114. doi: 10.1093/bib/bbac114


License

MIT License
