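Cancer genomic, transcriptomic, and proteomic analyses have generated large amounts of data, and tools are needed to analyze and disseminate them. We developed UALCAN as a portal for exploring, analyzing, and visualizing these data. It lets users integrate data to better understand the genes, proteins, and pathways perturbed in cancer, and to make discoveries.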

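The UALCAN portal analyzes cancer transcriptome, proteomics, and patient survival data and delivers the results to the cancer research community. Using data from The Cancer Genome Atlas (TCGA) project, UALCAN lets users evaluate the expression of protein-coding genes and its impact on patient survival across 33 types of cancer. The portal has been widely used and well received since its release, with cancer researchers in more than 100 countries using it. This manuscript highlights the tasks we have undertaken and the updates we have made to UALCAN since its release in 2017. Extensive user feedback prompted us to expand the resource by including the following data: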

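a) microRNAs (miRNAs), long non-coding RNAs (lncRNAs), and promoter DNA methylation from TCGA;

b) mass-spectrometry-based proteomics from the Clinical Proteomic Tumor Analysis Consortium (CPTAC).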

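UALCAN provides pre-computed, tumor-subgroup-based gene/protein expression, promoter DNA methylation status, and Kaplan-Meier survival analyses. New visualization features are also provided to help users understand and integrate observations and to generate hypotheses for testing.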

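UALCAN offers protein expression analysis options using data sets from the Clinical Proteomic Tumor Analysis Consortium (CPTAC) and the International Cancer Proteogenome Consortium (ICPC). Protein expression data are available for colorectal cancer, breast cancer, ovarian cancer, clear cell renal cell carcinoma, endometrial cancer, gastric cancer, glioblastoma, pediatric brain tumors, head and neck squamous cell carcinoma, lung adenocarcinoma, lung squamous cell carcinoma, liver cancer, pancreatic cancer, prostate cancer, and others.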

1. How to obtain a list of top differentially expressed genes?

Step 1: Go to the analysis page of UALCAN and click the cancer of interest in the left panel.

Step 2: UALCAN lists the top 250 over-/under-expressed genes in the cancer of interest compared to normal samples.

Step 3: Users can analyze the gene expression and survival profiles of an individual gene by clicking it on the heat map.

Step 4: UALCAN also lets users view the top 250 over-/under-expressed genes in major cancer subtypes compared to normal samples.
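To make the idea of a "top over-/under-expressed" list concrete, here is a minimal sketch that ranks genes by log2 fold change between tumor and normal samples. This is only an illustration: the post does not say how UALCAN computes its lists, and the function names, the pseudocount, and the toy values below are all assumptions. A real analysis would also test for statistical significance.

```python
import math
from statistics import mean


def log2_fold_change(tumor, normal, pseudocount=1.0):
    """log2 ratio of mean tumor to mean normal expression (pseudocount avoids log of 0)."""
    return math.log2((mean(tumor) + pseudocount) / (mean(normal) + pseudocount))


def top_genes(expression, n=250):
    """Return (over_expressed, under_expressed) gene lists, strongest change first.

    `expression` maps a gene symbol to {"tumor": [...], "normal": [...]}.
    """
    scored = sorted(
        ((gene, log2_fold_change(v["tumor"], v["normal"])) for gene, v in expression.items()),
        key=lambda pair: pair[1],
        reverse=True,
    )
    over = [gene for gene, lfc in scored if lfc > 0][:n]
    under = [gene for gene, lfc in reversed(scored) if lfc < 0][:n]
    return over, under


# Toy expression values, invented for illustration only.
toy = {
    "GENE_A": {"tumor": [31.0, 29.0, 33.0], "normal": [7.0, 8.0, 6.0]},
    "GENE_B": {"tumor": [3.0, 2.0, 4.0], "normal": [15.0, 14.0, 16.0]},
    "GENE_C": {"tumor": [10.0, 11.0, 9.0], "normal": [10.0, 9.0, 11.0]},
}
over, under = top_genes(toy, n=2)
print("over-expressed:", over)
print("under-expressed:", under)
```

Genes with no change (log2 fold change of 0) land in neither list, which is why `GENE_C` is dropped in the toy run.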

2. How to query UALCAN for gene(s) of interest in a specific cancer?

Step 1: Go to the analysis page of UALCAN and enter the official symbol of the gene(s) in the text area.

Step 2: Choose the TCGA data set of interest from the drop-down menu and click the "Explore" button to submit.

Step 3: The output page provides links to the analysis results and to external databases.

a. How to explore the expression profile of a gene of interest based on clinico-pathologic factors?

Obtaining the EZH2 expression profile in the breast invasive carcinoma dataset based on patient race.

Downloading the box plot from UALCAN
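A box plot of this kind boils each patient group down to a handful of summary statistics. The sketch below shows one way to compute them from (group, expression) pairs using only the standard library. The group labels and values are invented, and the quartile method is an assumption; the portal's own plotting conventions may differ.

```python
from collections import defaultdict
from statistics import quantiles


def boxplot_summary(samples):
    """Group (label, value) pairs and return min/Q1/median/Q3/max per label.

    Each group needs at least two values for the quartiles to be defined.
    """
    groups = defaultdict(list)
    for label, value in samples:
        groups[label].append(value)

    summary = {}
    for label, values in groups.items():
        # "inclusive" treats the data as the full population of interest.
        q1, q2, q3 = quantiles(values, n=4, method="inclusive")
        summary[label] = {
            "n": len(values),
            "min": min(values),
            "q1": q1,
            "median": q2,
            "q3": q3,
            "max": max(values),
        }
    return summary


# Hypothetical expression values for two patient groups (illustration only).
samples = [
    ("group_1", 5.0), ("group_1", 1.0), ("group_1", 3.0), ("group_1", 2.0), ("group_1", 4.0),
    ("group_2", 10.0), ("group_2", 20.0), ("group_2", 30.0),
]
for label, stats in boxplot_summary(samples).items():
    print(label, stats)
```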

b. How to obtain a Kaplan-Meier plot from UALCAN?

Analyzing survival plots of EZH2 in the breast invasive carcinoma TCGA dataset
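For readers who want to see what a Kaplan-Meier curve actually computes, the following is a minimal sketch of the standard product-limit estimator in plain Python. It is not UALCAN's code; the function name and the toy follow-up times are assumptions made for illustration. To compare two expression groups, the function could be run once per group.

```python
def kaplan_meier(times, events):
    """Return [(time, survival_probability)] at each time with an observed event.

    times  -- follow-up time for each patient
    events -- 1 if the event (e.g. death) was observed, 0 if the patient was censored
    """
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        deaths = sum(1 for ti, e in zip(times, events) if ti == t and e == 1)
        leaving = sum(1 for ti in times if ti == t)  # deaths plus censored at time t
        if deaths:
            # Product-limit step: multiply by the fraction surviving this time point.
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


# Toy follow-up data (days), invented for illustration only.
times = [5, 8, 8, 12, 15]
events = [1, 1, 0, 1, 0]
for t, s in kaplan_meier(times, events):
    print(f"t={t}: S(t)={s:.2f}")
```

Censored patients never cause a drop in the curve; they only shrink the number at risk for later time points.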

c. How to obtain promoter DNA methylation level from UALCAN?

Analyzing promoter DNA methylation level of THSD1 in breast invasive carcinoma TCGA dataset

d. How to obtain a list of positively and negatively correlated genes from UALCAN?

Obtaining a list of genes positively/negatively correlated with EZH2 in the breast invasive carcinoma TCGA dataset
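A correlated-gene list can be pictured as ranking every other gene by its correlation with the query gene across the same samples. The sketch below uses the Pearson coefficient as an assumed measure, since the post does not state which one the portal uses. The gene names, values, and the 0.3 cutoff are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient; returns 0.0 if either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def correlated_genes(query, expression, threshold=0.3):
    """Split genes into positively / negatively correlated lists, strongest first."""
    scores = [
        (gene, pearson(expression[query], values))
        for gene, values in expression.items()
        if gene != query
    ]
    positive = sorted((s for s in scores if s[1] >= threshold), key=lambda s: -s[1])
    negative = sorted((s for s in scores if s[1] <= -threshold), key=lambda s: s[1])
    return positive, negative


# Toy per-sample expression values, invented for illustration only.
toy = {
    "QUERY": [1.0, 2.0, 3.0, 4.0, 5.0],
    "GENE_POS": [2.0, 4.0, 6.0, 8.0, 10.0],
    "GENE_NEG": [5.0, 4.0, 3.0, 2.0, 1.0],
    "GENE_FLAT": [2.0, 1.0, 3.0, 1.0, 2.0],
}
positive, negative = correlated_genes("QUERY", toy)
print("positive:", positive)
print("negative:", negative)
```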

3. How to obtain expression and survival profiles for pre-compiled gene classes?

Step 1: Go to the analysis page of UALCAN.

Step 2: Query UALCAN for pre-compiled gene classes

Step 3: Select the gene class of interest (e.g. kinase-coding genes) and click "Explore" to see the results.

Step 4: UALCAN lists all kinase-coding genes and indicates their expression status and survival effect in different cancers.

Cancers with a red border show up-regulation of the gene.

Step 5: Users can click on any of these links to visualize expression and survival profiles.

Introduction to Cancer terms

| Cancer | TCGA code |
| --- | --- |
| Adrenocortical carcinoma | ACC |
| Bladder urothelial carcinoma | BLCA |
| Brain lower grade glioma | LGG |
| Breast invasive carcinoma | BRCA |
| Cervical squamous cell carcinoma | CESC |
| Cholangiocarcinoma | CHOL |
| Colon adenocarcinoma | COAD |
| Esophageal carcinoma | ESCA |
| Glioblastoma multiforme | GBM |
| Head and Neck squamous cell carcinoma | HNSC |
| Kidney Chromophobe | KICH |
| Kidney renal clear cell carcinoma | KIRC |
| Kidney renal papillary cell carcinoma | KIRP |
| Liver hepatocellular carcinoma | LIHC |
| Lung adenocarcinoma | LUAD |
| Lung squamous cell carcinoma | LUSC |
| Lymphoid Neoplasm Diffuse Large B-cell Lymphoma | DLBC |
| Mesothelioma | MESO |
| Ovarian serous cystadenocarcinoma | OV |
| Pancreatic adenocarcinoma | PAAD |
| Pheochromocytoma and Paraganglioma | PCPG |
| Prostate adenocarcinoma | PRAD |
| Rectum adenocarcinoma | READ |
| Sarcoma | SARC |
| Skin Cutaneous Melanoma | SKCM |
| Testicular Germ Cell Tumors | TGCT |
| Thymoma | THYM |
| Thyroid carcinoma | THCA |
| Uterine Carcinosarcoma | UCS |
| Uterine Corpus Endometrial Carcinoma | UCEC |
| Uveal Melanoma | UVM |
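When scripting around downloaded results, it can help to keep these abbreviations in a small lookup table. The sketch below holds only an excerpt of the list above, and the helper names `cancer_name` and `find_codes` are hypothetical; they are not part of UALCAN or TCGA tooling.

```python
# Excerpt of the cancer-name list above; extend with the remaining rows as needed.
TCGA_CODES = {
    "ACC": "Adrenocortical carcinoma",
    "BLCA": "Bladder urothelial carcinoma",
    "BRCA": "Breast invasive carcinoma",
    "GBM": "Glioblastoma multiforme",
    "LUAD": "Lung adenocarcinoma",
    "LUSC": "Lung squamous cell carcinoma",
    "OV": "Ovarian serous cystadenocarcinoma",
    "PRAD": "Prostate adenocarcinoma",
    "UVM": "Uveal Melanoma",
}


def cancer_name(code):
    """Return the full cancer name for a TCGA code (case-insensitive)."""
    key = code.strip().upper()
    if key not in TCGA_CODES:
        raise KeyError(f"Unknown TCGA code: {code!r}")
    return TCGA_CODES[key]


def find_codes(keyword):
    """Return the TCGA codes whose cancer name contains the keyword."""
    kw = keyword.lower()
    return sorted(code for code, name in TCGA_CODES.items() if kw in name.lower())


print(cancer_name("brca"))
print(find_codes("lung"))
```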

References:

  1. Zhang Y, Chen F, Chandrashekar DS, Varambally S, Creighton CJ. Proteogenomic characterization of 2002 human cancers reveals pan-cancer molecular subtypes and associated pathways. Nat Commun. 2022;13:2669. doi: 10.1038/s41467-022-30342-3
  2. Chen F, Chandrashekar DS, Varambally S, Creighton CJ. Pan-cancer molecular subtypes revealed by mass-spectrometry-based proteomic characterization of more than 500 human cancers. Nat Commun. 2019;10:5679. doi: 10.1038/s41467-019-13528-0
  3. Chandrashekar DS, Karthikeyan SK, Korla PK, Patel H, Shovon AR, Athar M, Netto GJ, Qin ZS, Kumar S, Manne U, Creighton CJ, Varambally S. UALCAN: An update to the integrated cancer data analysis platform. Neoplasia. 2022 Mar;25:18-27. doi: 10.1016/j.neo.2022.01.001

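It is very convenient to use and saves you from writing your own code, hunting for data, and drawing plots. If you need this kind of analysis, give it a try!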