STREAMLINING EARLY DRUG DISCOVERY

Our end-to-end platform

01

Scientific data mining
800-1,200 pages analyzed in 5 minutes

Input required: user question + 1 click

A revolutionary way of retrieving and screening scientific information. We leverage recent breakthroughs in Large Language Models and parallel computing to automate data mining of the scientific literature at high throughput.

Our proprietary software automatically screens, summarizes, and categorizes knowledge about targets, ligands, positive controls, and reaction mechanisms, whether it is held in a private archive or reached through major search engines such as Google Patents, PubMed, and Web of Science.
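To make the idea of screening and categorizing documents in parallel concrete, here is a toy sketch. It is not our production software: a simple keyword matcher stands in for the language-model call, and the category keywords, function names, and sample sentences are all hypothetical. It only illustrates the fan-out pattern of handing many documents to a pool of workers.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical categories and trigger words. A real system would call a
# language model here instead of matching keywords.
CATEGORY_KEYWORDS = {
    "target": ("receptor", "kinase", "enzyme"),
    "ligand": ("ligand", "inhibitor", "agonist"),
    "positive control": ("positive control", "reference compound"),
    "reaction mechanism": ("mechanism", "catalysis", "pathway"),
}


def categorize(text: str) -> list[str]:
    """Return every category whose keywords appear in the text."""
    lowered = text.lower()
    return [
        category
        for category, words in CATEGORY_KEYWORDS.items()
        if any(word in lowered for word in words)
    ]


def screen(documents: list[str], workers: int = 4) -> list[list[str]]:
    """Categorize documents concurrently, preserving input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(categorize, documents))


# Made-up sentences, used only to exercise the sketch.
docs = [
    "Compound X is a potent kinase inhibitor.",
    "The catalysis proceeds through a covalent intermediate.",
    "Weather was fine on the day of sampling.",
]
results = screen(docs)
for doc, labels in zip(docs, results):
    print(labels, "<-", doc)
```

Because each document is processed independently, the same pattern scales from a handful of abstracts to thousands of pages by adding workers.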

A quote from one of our collaborators: "In 3 hours we have done the job that would have taken one of my postdocs 2 months."

02

Target modeling
Up to millisecond-scale Molecular Dynamics simulations

Input required: FASTA or PDB files + 1 click
Optional inputs: positive control(s), simulation parameters (force field, solvent model, membrane environment)
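For readers unfamiliar with the FASTA input mentioned above: it is a plain-text format in which each record starts with a ">" header line followed by one or more lines of sequence. The sketch below shows a minimal parser for that layout. The function name and the toy sequence are hypothetical, and this is not a description of how our platform reads its inputs.

```python
def parse_fasta(text: str) -> dict[str, str]:
    """Parse FASTA text into a {header: sequence} dictionary."""
    records: dict[str, str] = {}
    header = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new record begins; drop the leading '>' from the header.
            header = line[1:].strip()
            records[header] = ""
        elif header is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            # Sequence lines of one record are concatenated.
            records[header] += line.upper()
    return records


# Made-up 20-residue sequence, split over two lines as FASTA allows.
example = """>toy_target hypothetical example
MKTAYIAKQR
QISFVKSHFS
"""
records = parse_fasta(example)
for name, sequence in records.items():
    print(name, len(sequence), sequence)
```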

Target structures modeled from amino acid or DNA sequences using AlphaFold2 do an excellent job of providing general information about proteins whose experimental structures are unknown. However, AI-based models often fail (approximately 85% failure rate [1]) when deeper structural knowledge is required, such as that needed to perform accurate docking.

Unlike the pipelines of most of our competitors, ours subjects the predicted structures to Molecular Dynamics simulations on our cluster in order to predict meaningful docking poses. Ultrafast docking is then performed with our parallel algorithm, drastically improving the ability of state-of-the-art algorithms to predict both the correct pockets and docking poses on AlphaFold2 structures. The sevenTM algorithm achieves a <2% failure rate on these tasks.

1. doi: 10.7554/eLife.89386.1

03

High-throughput generation and screening
1 billion compounds screened per hour
1,000-10,000 compounds designed and ranked per hour

Input required: SMILES string(s) + 1 click
Optional input: library of compounds of interest. Default: ZINC20, comprising 800 million small molecules encoded by us in a format optimized for parallel computing
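As a quick back-of-the-envelope check of what these figures imply, the sketch below divides the quoted library size by the quoted screening rate. The constants come from the numbers above; the function name and the 20,000-compound design batch are hypothetical and chosen only for illustration.

```python
SCREEN_RATE_PER_HOUR = 1_000_000_000    # quoted screening throughput
ZINC20_SIZE = 800_000_000               # quoted size of the default library
DESIGN_RATE_PER_HOUR = (1_000, 10_000)  # quoted design-and-rank range


def minutes_needed(n_compounds: int, rate_per_hour: int) -> float:
    """Time in minutes to process n_compounds at a constant hourly rate."""
    return 60 * n_compounds / rate_per_hour


# Full pass over the default library at the quoted screening rate.
library_minutes = minutes_needed(ZINC20_SIZE, SCREEN_RATE_PER_HOUR)

# Hypothetical batch of 20,000 designed compounds, at both ends of the range.
batch = 20_000
slowest_hours = minutes_needed(batch, DESIGN_RATE_PER_HOUR[0]) / 60
fastest_hours = minutes_needed(batch, DESIGN_RATE_PER_HOUR[1]) / 60

print(f"Default library pass: {library_minutes:.0f} minutes")
print(f"Designing {batch} compounds: {fastest_hours:.0f} to {slowest_hours:.0f} hours")
```

At the quoted rate, one pass over the default library takes about 48 minutes, assuming throughput stays constant across the run.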

Our approaches include repurposing small molecules by screening a library of commercially available compounds, and generating new compounds through our proprietary algorithm, which focuses on increasing specificity for a target pocket.

Our algorithm relies entirely on first principles. Unlike the AI-based solutions often employed for this task, it does not output chemically unreasonable structures, and it requires neither millions or even billions of proprietary activity data points nor experimentally resolved ligand-target structures.

The druglikeness of the candidates is pre-screened by proprietary QSAR and mechanistic PK models tailored to reduce later attrition.
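To give a sense of what a druglikeness pre-screen looks like in its simplest form, the sketch below applies Lipinski's rule of five to precomputed descriptors. This is a deliberately simple, public stand-in and not our proprietary QSAR or PK models. The class, function names, and candidate values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Descriptors:
    """Precomputed molecular descriptors for one candidate."""
    name: str
    mol_weight: float       # molecular weight in daltons
    logp: float             # octanol-water partition coefficient
    h_bond_donors: int
    h_bond_acceptors: int


def rule_of_five_violations(d: Descriptors) -> int:
    """Count how many of Lipinski's four thresholds a candidate exceeds."""
    return sum([
        d.mol_weight > 500,
        d.logp > 5,
        d.h_bond_donors > 5,
        d.h_bond_acceptors > 10,
    ])


def passes_prescreen(d: Descriptors, max_violations: int = 1) -> bool:
    """Keep candidates with at most max_violations rule-of-five violations."""
    return rule_of_five_violations(d) <= max_violations


# Made-up descriptor values, used only to exercise the filter.
candidates = [
    Descriptors("cand_A", 350.4, 2.1, 2, 5),
    Descriptors("cand_B", 720.9, 6.3, 6, 12),
    Descriptors("cand_C", 512.0, 3.0, 1, 7),
]
kept = [c.name for c in candidates if passes_prescreen(c)]
print(kept)
```

The tolerance of one violation follows the usual reading of the rule and is exposed as a parameter so that a stricter or looser pre-screen can be tried.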

04

Toxicity prediction and screening
1,000-5,000 candidates analyzed per hour

Input required: toxicity threshold tolerated + 1 click
Optional input: set/library of compounds of interest

Candidates input by the user, or from the previous step, are screened through a first-of-its-kind set of toxicity filters, which are entirely based on first principles and provide mechanistic interpretation and feedback to the design phase.

The pipeline identifies the inhibition of key metabolic enzymes in multiple organs, such as the liver, heart, kidney, and brain. Additionally, it predicts biodegradation into toxic metabolites, based on our curated database of key enzymatic reactions and known toxic substrates.

Our approach achieves unparalleled accuracy compared to existing Deep Learning toxicity models. Those models are doomed to remain theoretical experiments because they are not trained on the billions of proprietary in vivo and clinical data points they would need, which most organizations do not have access to.

05

Drug scalability assessment
1,000-5,000 compounds analyzed per hour

Input required: desired range from our scalability-stability tradeoff map + 1 click
Optional input: set/library of compounds of interest

Retrosynthesis, batch stability, and supply chain models are joined together and harmonized. This is the first AI model capable of predicting all the critical features involved in an up-to-date, real-world synthesis of a novel compound. It quickly identifies and ranks candidates with high manufacturability and ability to scale, and generates a report for each compound analyzed.

This model is currently under construction.

06

In vitro validation
Virtually identified hits and designed lead compounds undergo a systematic evaluation employing a combination of assays to characterize both their biophysics (target affinity) and cellular effects.

Cell viability and toxicity assessments are conducted using carefully selected and characterized cell lines. The entire process is highly automated to optimize testing throughput, with assays executed in 96- and 384-well plates.
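Automated plate handling usually starts from a mapping between a running well index and the familiar well label. The sketch below shows that mapping for the standard 96-well (8 rows by 12 columns) and 384-well (16 rows by 24 columns) layouts. The function name and the row-major, zero-based indexing are assumptions made for illustration and do not describe our partners' laboratory software.

```python
import string

# Standard plate geometries as (rows, columns).
PLATE_SHAPES = {96: (8, 12), 384: (16, 24)}


def well_name(index: int, plate_size: int) -> str:
    """Map a 0-based, row-major well index to a label such as 'A1'."""
    rows, cols = PLATE_SHAPES[plate_size]
    if not 0 <= index < rows * cols:
        raise IndexError(f"well index {index} is outside a {plate_size}-well plate")
    row, col = divmod(index, cols)
    # Rows are lettered from A, columns are numbered from 1.
    return f"{string.ascii_uppercase[row]}{col + 1}"


first_96 = well_name(0, 96)
last_96 = well_name(95, 96)
last_384 = well_name(383, 384)
print(first_96, last_96, last_384)
```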

Tests are carried out by academic partners and a CRO, and the results are used to validate and improve our models, ensuring continuous improvement of the pipeline.