Hu Li Lab | Mayo Clinic

Data and Software

LIFE
Learning-Based Invariant Feature Engineering (LIFE) is a novel feature engineering platform. Symmetry refers to properties that remain invariant upon mathematical transformations, yet it remains unexplored in biology and medicine. We set to explore symmetry relationships in gene expression to distinguish between healthy and disease states. We hypothesize that there are relationships between gene expressions that remain invariant across individuals displaying the same biological phenotype. Our Gene Expression Symmetry Hypothesis (GESH) posits that the invariant nature of phenotypic traits in cells is defined by a set of genes exhibiting specific symmetric expression relationships. We deployed a hybrid machine learning approach implemented with two symmetric invariant feature functions (IFFs) to identify Invariant Feature Genes (IFGs), which are gene pairs whose IFF single-value outputs remain invariant across individual samples in each phenotype. Our multiclass classification results across the transcriptomes of 25 normal organs, 25 cancer types, and blood samples from 4 different types of neurodegenerative diseases identified unique fingerprints. We constructed networks from IFGs and found that cancer IF-Nets hubs were enriched with approved and clinical trial drugs, highlighting symmetry breaking as a novel treatment approach.
Reference: Learning-Based Invariant Feature Engineering (LIFE)
SPIN-AI
Spatially resolved sequencing technologies help us dissect how cells are organized in space. Several available computational approaches focus on the identification of spatially variable genes (SVGs), genes whose expression patterns vary in space. The detection of SVGs is analogous to the identification of differentially expressed genes and permits us to understand how genes and as-sociated molecular processes are spatially distributed within cellular niches. However, the ex-pression activities of SVGs fail to encode all information inherent to the spatial distribution of cells. Here, we devised a deep learning model - Spatially Informed Artificial Intelligence (SPIN-AI) to identify spatially predictive genes (SPGs), genes whose expression can predict how cells are organized in space, without any prior assumptions of spatial distribution. We used SPIN-AI on spatial transcriptomic data from squamous cell carcinoma (SCC) as a proof-of-concept. Our results demonstrated that SPGs recapitulate the biology of SCC but also identify genes distinct from SVGs. Moreover, we found a substantial number of ribosomal genes that are SPGs but not SVGs. Since SPGs possess the capability to predict spatial cellular organization, we reason SPGs capture more biologically relevant information for a given cellular niche. Hence, SPIN-AI has broad applications to detect SPGs and uncover which biological pro-cesses play important roles in governing cellular organization.
Reference: Biomolecules 2023.
SPIN-AI source code are availalbe for public accessing Li Lab.
ANNE
Artificial neural network (ANN) was initially created to model how human brain works. Over past few decades, ANN has evolved into numerous sophisticated algorithms with proven outstanding performance in various recognition tasks. Artificial Neural Network Encoder (ANNE) is a novel weight engineering deep machine learning method that harness the power of autoencoder and demonstrated that it is possible to decode meaningful information encoded in ANN models trained for specific tasks. We applied ANNE on breast cancer gene expression data with known clinical properties as case studies. Our work illustrates the trained autoencoder models are indeed information encoders that meaningful gene-gene associations with numerous supported evidences can be retrieved. ANNE opens a new avenue in machine intelligence that ANN models will no longer perceived as tools to perform recognition tasks but as powerful tools to extract meaningful information embedded within the sea of high dimensional data.
Reference: Front Immunol. 2022.
ANNE source code are availalbe for public accessing Li lab GitHub.
MALANI
MALANI (Machine Learning-Assisted Network Inference) is a hybrid computational platform that harnesses the power of both machine learning and network biology methodologies to provide new insights and improve understanding of complex biological systems. MALANI assesses all genes regardless of expression or mutational status in the context of disease etiology by building more than 2 millions machine learning models for reconstructing gene regulatory networks. MALANI has the power to uncover "dark" disease genes that are neither mutated nor differentially expressed but play important pathological roles in disease development.
Reference: Sci Rep. 2017 Aug 01.
MALANI source code can be downloaded at Li lab GitHub.
Computational Drug Discoveries Platform, Machine Learning, Feature Selection, AI Drug Discoveries
AI and Machine learning methods and feature selection approaches for predicting specific Pharmacodynamic, Pharmacokinetic or Toxicological properties of pharmaceutical agents are useful for facilitating new drug discovery and development. Pharmaceutical agents have been developed and tested for possessing desirable pharmacodynamic, pharmacokinetic, and minimal level of toxicological properties. Computational methods have been explored for predicting these properties aimed at the discovery of promising leads and the elimination of unsuitable ones in early stages of drug development. AI and Machine learning methods have shown their huge potential for predicting these properties for structurally diverse sets of agents by using recently explored AI, mahcine learning and deep machine learning models. These methods have been used for predicting agents of a variety of pharmacodynamic, pharmacokinetic and toxicological properties
Reference: J Pharm Sci. 2007.; Drug Development Research. 2006.; J Mol Graph Model. 2006.; J Chem Inf Model. 2005.
PERMUTOR
PERsonalized MUtation evaluaTOR (PERMUTOR) is a novel computational pipeline which collects potent disease gene cooperative pathways to envision individualized disease etiology and therapies. Our algorithm constructs individualized disease networks and modules de novo which enable us to elucidate the importance of mutated genes in specific patients and to understand the synthetic penetrance of these genes across patients. Individualized module disruption enables us to devise customized singular and combinatorial target therapies which were highly varied across patients demonstrating the need for precision therapeutics pipelines. As the first analysis of de novo individualized disease networks and modules, we illustrate the power of individualized disease modules for precision medicine by providing deep novel insights on the activity of diseased genes in individuals.
Reference: Genome Res. 2021.
PERMUTOR source code are availalbe for public accessing Li lab GitHub.
GUM
Gene Utility Model (GUM) is a novel computational pipeline to understand the importance of genes under specific cellular contexts. GUM states that it is the utility of genes that provides selective pressure for the survival and fitness of aberrant cells. Using GUM, it is possible to construct an "utility karyotype" by mapping differentially utilized genes to their respective chromosomal loci. Further, GUM predicts that the resulting utility karyotype can recapitulate, to a certain extent, the chromosomal aberrancies observed in diseases.
Reference: Comput Struct Biotechnol J. 2022.
RSI
RSI (Regulostat Inferelator ) is a novel computational algorithm to decipher intrinsic molecular devices called regulostats that predetermine cellular phenotypic responses.
Reference: Nucleic Acids Res. 2019 May
RSI web interface and source code are availalbe at the RSI website portal Li lab GitHub.
Single-nucleus m6A-CUT&Tag (sn-m6A-CT) data analysis
sn-m6A-CT is for simultaneous profiling of m6A methylomes and transcriptomes within a single nucleus. sn-m6A-CT is capable of enriching m6A-marked RNA molecules in situ, without isolating RNAs from cells. sn-m6A-CT profiling is sufficient to determine cell identity and allows the generation of cell-type-specific m6A methylome landscapes from heterogeneous populations.
Reference: Mol Cell. 2023 Aug 25:S1097-2765(23)00649-4.
Source code are availalbe for public accessing sn-m6A-CT.
ASTAR-seq
ASTAR-Seq is an automated method with high sensitivity, assay for single-cell transcriptome and accessibility regions for simultaneous measurement of whole-cell transcriptome and chromatin accessibility within the same single cell.
Reference: Genome Research. 2020 July; Science Advances 2020 September.
Source code are availalbe for public accessing ASTAR-seq.
Multi-Regional-GBM-Imaging-and-Genetics
.
Reference: Nat Commun. 2023 Sep 28; 14 (1):6066.
Source code are availalbe for public accessing Li Lab.
EDDI
EDDI (Expression Dosage Dependent Inferelator) is a machine learning and systems biology approach to characterize dosage-based gene dependencies.
Reference: J Bioinform Syst Biol. 2021.
EDDI source code are availalbe for public accessing Li lab GitHub.
DPYD-Varifier
DPYD-Varifier (DPYD Gene-specific variant classifier) is a highly accurate in silico classifier to predict the functional impact of DPYD variants on DPD activity. DPYD-Varifier have great potential to systems pharmacology and individualize medicine and improve the clinical decision-making process.
Reference: Clin Pharmacol Ther. 2018 Jan 12.
P-Map
P-Map (Phenotype mapping) is a network-based phenotype mapping approach to identify genes and regularory networks that modulate drug response phenotypes.
Reference: Sci Rep. 2016 Nov 14.
P-Map source code can be downloaded at Li lab GitHub..
NetDecoder
NetDecoder is a network biology computational platform to dissect context-specific biological networks and gene activities. NetDecoder provides freely available source code and web portal resource for researchers to explore genome-wide context-dependent information flow profiles and key genes using pairwise phenotypic comparative analyses. NetDecoder also allows researchers to prioritize drug targets for genes that affect pathological contexts.
Reference: Nucleic Acids Res. 2016 Mar 14.
NetDecoder web interface and other materials are available at the website portal.
NetDecoder source code can be downloaded at Li lab GitHub.
For support of NetDecoder, please subscribe to our web forum.
CellNet
CellNet is a network biology-based computational platform that more accurately assesses the fidelity of cellular engineering than existing methodologies and generates hypotheses for improving cell derivations.
Reference: Cell. 2014 Aug 14;158(4):903-15.; Cell. 2014 Aug 14;158(4):889-902.
CellNet web interface and other materials are available at the website portal.
Modified RNA
Highly efficient reprogramming to pluripotency and directed differentiation of human cells with synthetic modified mRNA.
Reference: Cell Stem Cell. 2010.
StemSite
StemSite is a database of regulators network of the developmental origin of mouse hematopoietic stem cells.
Reference: Cell Stem Cell. 2012 Nov 2; 11(5):701-14.
StemSite Database is available here.
MNI
MNI (Mode-of-action by Network Inference) is a reverse engineering network biology algorithm to identify the gene targets and key mediators of a biomedical phenotype based on transcriptome data.
Reference: Nat Biotechnol. 2005 Mar;23(3):377-83.
Reference: Sci Transl Med. 2014 Jan 1;6(217):217ra2.
MNI source code can be downloaded here.
CLR
CLR (Context Likelihood of Relatedness) is an network biology algorithm to reverse-engineer and infer regulatory interactions between master regulators and their targets using a compendium of transcriptome profiles.
Reference: PLoS Biol 5(1): e8.
CLR source code can be downloaded here.
GEDI
GEDI (Gene Expression Dynamics Inspector), developed by Dr. Ingber's Lab, is a computational program that opens a new perspective to the analysis of transcriptome data. By treating each high-dimensional sample, such as one transcriptome experiment, as an object, it accentuates and visualize the genome-wide response of a tissue or a patient and treats it as an integrated biological entity. GEDI honors the new spirit of a system-level approach in biology and unites a novel holistic perspective with the traditional gene-centered approach in molecular biology.
Reference: Bioinformatics. 2003 Nov 22;19(17):2321-2.
GEDI source code can be downloaded here.
For general questions on GEDI source code, please contact Dr. Donald Ingber or Hu Li.
Pathway Modelling and Simulation
One of the most commonly used approaches to model biological systems is that of ODEs. In general, a differential equation can be used to describe the chemical reaction rate that depends on the change of participating species over time. The temporal dynamic behavior of molecular species in the biological signaling pathway network can be captured by a set of coupled ODEs.
Reference: Bioinformatics. 2009.; Cancer. 2009.; FEBS Lett. 2008.

Data and Software

LIFE

SPIN-AI

ANNE

MALANI

Computational Drug Discoveries Platform, Machine Learning, Feature Selection, AI Drug Discoveries

PERMUTOR

GUM

RSI

Single-nucleus m6A-CUT&Tag (sn-m6A-CT) data analysis

ASTAR-seq

Multi-Regional-GBM-Imaging-and-Genetics

EDDI

DPYD-Varifier

P-Map

NetDecoder

CellNet

Modified RNA

StemSite

MNI

CLR

GEDI

Pathway Modelling and Simulation