About this resource

TeachOpenCADD is a resource for teaching computer-aided drug design (cheminformatics and structural bioinformatics). It is organized into modules called talktorials, each of which is a Jupyter notebook focusing on a single task.

Tutorial Type : code-based (Python)

Authors & Contributors : TeachOpenCADD was initiated by members of the Volkamer Lab, Charité - Universitätsmedizin Berlin. Many contributors have participated in its development.

Prerequisite: Python knowledge (if you have any programming background, you should be able to understand the Python code)
TeachOpenCADD Website: Link
GitHub Link: Link

Howto : The easiest way to start learning without installing anything is to open a Jupyter notebook on GitHub (click the link in the table below); GitHub renders the notebook and displays its saved results inline. If you want to run the code locally, you will need to follow these instructions.

Platform : Jupyter Notebook (a shareable document that contains live Python code, visualizations, and narrative text), RDKit (an open-source software toolkit for cheminformatics and computational chemistry), Conda & mamba (mamba is a CLI tool for managing conda environments)

Module | Description | Python package
T001 · Compound data acquisition (ChEMBL) | Extract data (compounds and activities) related to the EGFR kinase from the ChEMBL database and display the compounds' 2D structures |
T002 · Molecular filtering: ADME and lead-likeness criteria | Remove compounds with low oral bioavailability from the result of the previous task |
T003 · Molecular filtering: unwanted substructures | Remove toxic, reactive, and false-positive compounds from the result of the previous task |
T004 · Ligand-based screening: compound similarity | Draw 2D molecules, generate molecular descriptors, compare molecules based on these descriptors, then search a library to identify similar compounds (virtual screening) |
T005 · Compound clustering | From the virtual screening result (T004), use a clustering algorithm to select 1000 diverse compounds in order to maximize the chances of finding a hit |
T006 · Maximum common substructure | Visualize the common scaffolds (MCS) of a set of molecules (T005) |
T007 · Ligand-based screening: machine learning | Use supervised ML algorithms to predict the activity of compounds against the EGFR kinase |
T008 · Protein data acquisition: Protein Data Bank (PDB) | Superimpose ligands from many high-resolution EGFR PDB complexes | biotite & pypdb
T009 · Ligand-based pharmacophores | Identify pharmacophore features from the ligand set generated in the previous tasks |
T010 · Binding site similarity and off-target prediction | Binding site similarity of PDB complexes with Imatinib as ligand | biotite & pypdb
T011 · Querying online API webservices | Query remote bioinformatics API services using Python requests |
T012 · Data acquisition from KLIFS | |
T013 · Data acquisition from PubChem | Cheminformatics |
T014 · Binding site detection | Structural-Bioinformatics |
T015 · Protein ligand docking | Structural-Bioinformatics |
T016 · Protein-ligand interactions | Structural-Bioinformatics |
T017 · Advanced NGLview usage | Structural-Bioinformatics |
T018 · Automated pipeline for lead optimization | Structural-Bioinformatics |
T019 · Molecular dynamics simulation | Structural-Bioinformatics |
T020 · Analyzing molecular dynamics simulations | Structural-Bioinformatics |
T021 · One-Hot Encoding | Cheminformatics |
T022 · Ligand-based screening: neural networks | Cheminformatics |
T023 · What is a kinase? | Kinase Similarity |
T024 · Kinase similarity: Sequence | Kinase Similarity |
T025 · Kinase similarity: Kinase pocket (KiSSim fingerprint) | Kinase Similarity |
T026 · Kinase similarity: Interaction fingerprints | Kinase Similarity |
T027 · Kinase similarity: Ligand profile | Kinase Similarity |
T028 · Kinase similarity: Compare different perspectives | Kinase Similarity |
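
To give a feel for the compound-similarity search described for T004, here is a minimal sketch in plain Python. It is not the talktorial's code: it assumes each molecule has already been reduced to a fingerprint, represented here as the set of its "on" bit positions, and it scores pairs with the Tanimoto coefficient (shared bits divided by all bits set in either fingerprint). The function names, the molecule names, and the bit sets are hypothetical and chosen only for illustration; in practice the fingerprints would be generated with a toolkit such as RDKit.

```python
from typing import Dict, List, Set, Tuple


def tanimoto(bits_a: Set[int], bits_b: Set[int]) -> float:
    """Tanimoto coefficient of two fingerprints given as sets of on-bit indices."""
    union = bits_a | bits_b
    if not union:
        # Convention chosen for this sketch: two empty fingerprints score 0.
        return 0.0
    return len(bits_a & bits_b) / len(union)


def rank_by_similarity(
    query: Set[int],
    library: Dict[str, Set[int]],
    threshold: float = 0.0,
) -> List[Tuple[str, float]]:
    """Score every library entry against the query, keep those at or above
    the threshold, and return them sorted from most to least similar."""
    scored = [(name, tanimoto(query, bits)) for name, bits in library.items()]
    hits = [hit for hit in scored if hit[1] >= threshold]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Hypothetical fingerprints (not derived from real molecules).
query_bits = {1, 4, 7, 9}
library = {
    "mol_a": {1, 4, 7, 9},  # identical to the query
    "mol_b": {1, 4, 8},     # shares 2 of 5 distinct bits with the query
    "mol_c": {2, 3, 5},     # shares no bits with the query
}

ranked = rank_by_similarity(query_bits, library, threshold=0.3)
for name, score in ranked:
    print(f"{name}: {score:.2f}")
```

The threshold acts as the virtual-screening cut-off: raising it returns fewer, more similar compounds, while lowering it widens the search.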