MolProphet: A one-stop, general-purpose, free AI-based drug discovery platform

Introduction to the MolProphet™ Platform

MolProphet™ is an online co-creation platform. It aims to make AI technology more accessible to every investigator who works in new drug discovery. It provides a one-stop, general-purpose drug discovery platform, available online or as a private deployment, to help them accelerate their drug development process. The platform mainly focuses on the early stages of drug discovery, especially hit discovery and hit-to-lead. The basic functions of the platform are free. The platform consists of a Management System and the following 6 main modules.

  1. Target Pocket Prediction and Definition Module
  2. Structure-based Drug Discovery Module
  3. Ligand-based Drug Discovery Module
  4. AI Molecular Design Module
  5. AI Docking Module
  6. Data Analysis Module

1. Target Pocket Prediction and Definition Module

In new drug development, finding the correct target pocket structure can greatly improve efficiency. This module is designed to help practitioners quickly define pocket structures as the basis for subsequent molecular screening, design, optimization, and evaluation. The platform uses deep learning algorithms trained on known protein pocket structures to propose the one to five most competitive pocket structures for each protein, presented in a 3D visual selection interface. It also offers a set of self-developed algorithms with which users define key amino acid residues manually and the AI automatically completes the pocket structure. In addition, the module creates an associated profile for each pocket, which can be viewed and recalled at any time. More functions (such as defining the pocket structure precisely down to each amino acid site) will be added to the module in the future to suit different application scenarios.

Target Pocket Prediction

2. Structure-based Drug Discovery Module

To reduce R&D costs, the industry usually screens known commercial libraries first, but traditional CADD virtual screening methods are slow. This module therefore aims to complete virtual screening of commercial libraries quickly using AI. Its kernel is a deep neural network trained on 18 million real bioassay data points to predict the affinity between previously unseen targets and ligands, with overall positive rates and enrichment factors exceeding those of traditional techniques. In terms of speed, screening on the order of a billion molecules has been completed within 7 days (the online version of the platform currently opens tens of millions of molecules for screening). In the future, the platform will also offer an iterative screening service, in which activity data from cell experiments on the previous round's molecules are analyzed to drive a second round of screening, with the goal of discovering hit compounds with better than 1 μmol activity in cell experiments. We have already carried out this procedure in our offline service.
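For readers unfamiliar with the enrichment factor mentioned above, the following is a minimal sketch of how it is commonly computed for a ranked screening result. It is not the platform's code; the function name, the `top_fraction` parameter, and the toy data are assumptions made for illustration.

```python
def enrichment_factor(scores, labels, top_fraction=0.01):
    """Enrichment factor of a ranked virtual screen.

    EF = (fraction of actives in the top-ranked subset)
         / (fraction of actives in the whole library)

    scores: predicted affinity per molecule (higher = better), hypothetical.
    labels: 1 for a known active, 0 for an inactive.
    """
    if len(scores) != len(labels) or not scores:
        raise ValueError("scores and labels must be non-empty and equal in length")
    total_actives = sum(labels)
    if total_actives == 0:
        raise ValueError("at least one active is required")

    # Size of the top-ranked subset (at least one molecule).
    n_top = max(1, int(len(scores) * top_fraction))

    # Rank molecules by predicted score, best first.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    top_actives = sum(label for _, label in ranked[:n_top])

    return (top_actives / n_top) / (total_actives / len(scores))


# Toy example: 10 molecules, 2 actives, both ranked in the top 20%.
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
toy_labels = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
print(enrichment_factor(toy_scores, toy_labels, top_fraction=0.2))
```

An enrichment factor of 1 means the ranking is no better than random selection; larger values mean actives are concentrated near the top of the list.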

Structure-based Drug Discovery

3. Ligand-based Drug Discovery Module

This module aims to quickly retrieve molecules similar to known molecules from known commercial libraries through AI algorithms. It includes two algorithms. The 2D structure similarity algorithm (whose results are highly similar in molecular structure) represents 2D structures with a graph neural network and searches the resulting vectors with Milvus, which enables retrieval over a billion molecules in 1 minute. The 3D pharmacophore similarity algorithm (whose results show high structural variation) encodes 3D features, and encoding 1 billion molecules can be completed in 1 day.
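The retrieval step described above can be sketched as a nearest-neighbor search over molecule embedding vectors. The example below is a minimal illustration only: it uses an exhaustive cosine-similarity scan in pure Python as a stand-in for a vector search engine such as Milvus, and the molecule IDs and 3-dimensional embeddings are made up (a real graph neural network embedding would have many more dimensions).

```python
import heapq
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_similar(query, library, k=3):
    """Return the k library molecules most similar to the query embedding.

    library: dict mapping a molecule ID to its embedding vector (hypothetical).
    This is a brute-force scan; a vector database replaces it with an
    approximate index to reach billion-scale libraries.
    """
    scored = ((cosine_similarity(query, vec), mol_id) for mol_id, vec in library.items())
    return [(mol_id, score) for score, mol_id in heapq.nlargest(k, scored)]


# Hypothetical embeddings for three library molecules.
library = {
    "mol_a": [1.0, 0.0, 0.0],
    "mol_b": [0.9, 0.1, 0.0],
    "mol_c": [0.0, 1.0, 0.0],
}
for mol_id, score in top_k_similar([1.0, 0.0, 0.0], library, k=2):
    print(mol_id, round(score, 3))
```

The brute-force scan is linear in library size, which is why billion-molecule retrieval in practice relies on an indexed vector search engine rather than a loop like this one.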

Ligand-based Drug Discovery

4. AI Molecular Design Module

This module aims to accomplish de novo design of molecules through AI technology. In contrast to mainstream molecule generation algorithms, which focus only on activity and ignore druggability and synthesizability, the module treats the synthesis pathway as one of its most important criteria from the start and provides a synthesis route for each output molecule. The synthesis routes are built from 1000+ known reaction templates (derived from 3 million+ explicit reaction examples and hand-selected by experts for practical feasibility) and 500,000+ purchasable building blocks matched to them.
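To make the idea of pairing reaction templates with matching building blocks concrete, here is a deliberately simplified sketch. It is not the platform's algorithm: the `ReactionTemplate` and `BuildingBlock` classes, the functional-class labels, and the block IDs are all hypothetical, and real systems match templates on chemical substructures rather than on string labels.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class ReactionTemplate:
    name: str
    reactant_classes: tuple  # functional class required for each reactant


@dataclass(frozen=True)
class BuildingBlock:
    block_id: str
    functional_class: str


def enumerate_routes(templates, blocks):
    """List every (template, building-block combination) that fits together."""
    # Index building blocks by functional class for quick lookup.
    by_class = {}
    for block in blocks:
        by_class.setdefault(block.functional_class, []).append(block)

    routes = []
    for template in templates:
        # One pool of candidate blocks per reactant slot of the template.
        pools = [by_class.get(cls, []) for cls in template.reactant_classes]
        for combo in product(*pools):
            routes.append((template.name, tuple(b.block_id for b in combo)))
    return routes


templates = [ReactionTemplate("amide_coupling", ("carboxylic_acid", "amine"))]
blocks = [
    BuildingBlock("BB1", "carboxylic_acid"),
    BuildingBlock("BB2", "amine"),
    BuildingBlock("BB3", "amine"),
    BuildingBlock("BB4", "boronic_acid"),
]
for route in enumerate_routes(templates, blocks):
    print(route)
```

Because every enumerated molecule is produced by a known template applied to purchasable blocks, each one comes with a synthesis route by construction, which is the property the module emphasizes.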

AI Molecular Design

5. AI Docking Module

This module aims to help practitioners rapidly evaluate a molecular dataset (usually ligand virtual screening results or a user-owned dataset) through AI technology. It uses geometric deep learning to learn target pocket information and small-molecule structure information, and reinforcement learning to sample flexible receptor conformations while optimizing the binding conformation of the ligand, in order to predict the minimum free energy of the molecule binding to the target pocket.

AI docking

6. Data Analysis Module

This module aims to rapidly predict the interaction characteristics between ligand molecules and their target through AI technology, with multi-dimensional analysis. Currently, 3 dimensions serve as the main perspectives:

  1. Binding site analysis (left panel): a statistical analysis of binding sites over the possible conformations of the ligand. The horizontal axis is the number of AI-predicted ligand conformations and the vertical axis is the amino acid site. Colors correspond to different types of binding force, and each statistic counts the conformations in which the ligand has that type of binding force with that amino acid site.
  2. Binding force analysis (middle panel): the types of force generated between the ligand and the receptor, together with the associated amino acid sites. The type of force is distinguished by color.
  3. Binding conformation analysis (right panel): the 3-dimensional pattern of ligand-receptor binding.

More analysis tools, such as toxic substructure alerts and generation of proprietary molecular reports, will be added to this module in the future.
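The binding site statistics described above amount to counting, for each amino acid site and each type of binding force, the number of predicted conformations in which that contact occurs. The sketch below illustrates this counting under assumed inputs: the residue names, interaction labels, and the list-of-contacts data layout are hypothetical and are not the platform's data format.

```python
from collections import Counter, defaultdict


def binding_site_statistics(conformations):
    """Count conformations per (amino acid site, interaction type).

    conformations: one list of (residue, interaction_type) contacts per
    predicted ligand conformation (hypothetical input layout).
    """
    stats = defaultdict(Counter)
    for contacts in conformations:
        # set() ensures a conformation is counted once per (site, type),
        # even if the same contact is listed more than once.
        for residue, interaction in set(contacts):
            stats[residue][interaction] += 1
    return {residue: dict(counts) for residue, counts in stats.items()}


# Three hypothetical predicted conformations of one ligand.
poses = [
    [("ASP86", "hydrogen bond"), ("PHE80", "hydrophobic")],
    [("ASP86", "hydrogen bond"), ("ASP86", "hydrogen bond")],
    [("PHE80", "hydrophobic"), ("ASP86", "salt bridge")],
]
for residue, counts in sorted(binding_site_statistics(poses).items()):
    print(residue, counts)
```

Each inner dictionary corresponds to one row of the chart (one amino acid site), with one count per interaction type.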

Data Analysis

User Interface

Not all practitioners have experience with professional software, so the MolProphet™ platform was built to be as easy to use as possible and to minimize the barriers to use. Each functional module integrated in the system is reduced to a few simple clicks, and each step is supplemented with the necessary hints. Figure 8 below shows the AI Molecular Design module in the platform. After about 2 minutes spent reviewing the operation tips in the interface, ordinary users can start running projects on the platform.


Moreover, the MolProphet™ platform offers multiple input options in every module that requires data from users (such as defining target pockets or submitting reference molecules) to meet different usage scenarios. For example, when defining target pockets, users can upload local files for immediate processing by the system, or search for standard PDB files by PDB ID (sourced from the official PDB website). When setting reference molecules, users can choose among options such as drawing the molecular structure directly, uploading local files, entering SMILES, or loading saved records.


For result management, the MolProphet™ platform provides task objective and conclusion evaluation fields for each task, which makes later review easier and facilitates communication among users. Users can also freely save, download, and delete results to meet the needs of different scenarios (Figure 9). For each result molecule, a result analysis interface appropriate to the task is provided, which can be opened by clicking the corresponding molecule tab.

In the future, the MolProphet™ platform will add more features related to results and records, including but not limited to automatically generated task reports, molecular reports, and experimental record management.

Management System

For project management, the MolProphet™ platform provides a project-based data management solution. Users can create multiple projects whose data are isolated from each other. Each project can invite different users to collaborate and share information within the project. The platform also establishes an independent three-level management system (PI, administrator, and general member) for each project, which makes it convenient for users to manage their own research teams.

For task management, users can check the progress of a task at any time while it is running, cancel tasks that are queued or in progress, and retry a task at any time if it ends abnormally (in some scenarios, retrying resolves the problem). However, to avoid management confusion, the platform restricts users to managing only the tasks they created.
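The task rules above (cancel while queued or running, retry after an abnormal end, owner-only management) can be summarized as a small state machine. The following is a minimal sketch under assumed names: the `TaskState` values, the `Task` class, and the `cancel` and `retry` functions are hypothetical and do not reflect the platform's actual implementation.

```python
from dataclasses import dataclass
from enum import Enum


class TaskState(Enum):
    QUEUED = "queued"
    RUNNING = "running"
    FINISHED = "finished"
    FAILED = "failed"        # the task ended abnormally
    CANCELLED = "cancelled"


@dataclass
class Task:
    task_id: str
    owner: str
    state: TaskState = TaskState.QUEUED


def _require_owner(task, user):
    # Users may manage only the tasks they created.
    if task.owner != user:
        raise PermissionError(f"{user} did not create task {task.task_id}")


def cancel(task, user):
    """Cancel a task that is queued or in progress."""
    _require_owner(task, user)
    if task.state not in (TaskState.QUEUED, TaskState.RUNNING):
        raise ValueError(f"cannot cancel a task in state {task.state.value}")
    task.state = TaskState.CANCELLED


def retry(task, user):
    """Put an abnormally ended task back in the queue."""
    _require_owner(task, user)
    if task.state is not TaskState.FAILED:
        raise ValueError(f"cannot retry a task in state {task.state.value}")
    task.state = TaskState.QUEUED


job = Task("screen-001", owner="alice", state=TaskState.FAILED)
retry(job, "alice")
print(job.state.value)
```

Keeping the ownership check in one helper means every management action applies the same restriction.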

In addition, the platform provides a separate hybrid cloud system for teams or companies that require a high degree of privacy and confidentiality, such as pharmaceutical companies and research teams with similar needs (Figure 10). It enables instant distribution of the latest models from the public cloud while keeping private cloud data secure.

Management System


In short, the MolProphet™ platform provides users with a broad range of AIDD services in as simple a format as possible, and it is currently open to the public. More services, such as Lead Optimization, will come online in the future.