This article shows how to perform gene set enrichment analysis with the R package GSVA, using a practical example to walk through the procedure step by step.
Rscript ../scripts/ssgsea_enrich_diff.r -h
usage: ../scripts/ssgsea_enrich_diff.r [-h] -g GMTFILE -i EXPR -m META
                                       [-n GROUP_NAME] [--log2] [-t method]
                                       [-k kcdf] [--group1 GROUP1]
                                       [--group2 GROUP2] [-p PVALUECUTOFF]
                                       [--no_diff] [-o OUTDIR] [-f PREFIX]

Gene set variation analysis (GSVA):https://www.億速云.com/article/1586

optional arguments:
  -h, --help            show this help message and exit
  -g GMTFILE, --gmtfile GMTFILE
                        GSEA gmtfile function class file[required]
  -i EXPR, --expr EXPR  Input gene expression file path[required]
  -m META, --meta META  Input the clinical information file path that
                        contains the grouping[required]
  -n GROUP_NAME, --group_name GROUP_NAME
                        Specifies the column name that contains grouping
                        information[optional,default:m6acluster]
  --log2                Whether to perform log2
                        processing[optional,default:False]
  -t method, --method method
                        Method to employ in the estimation of gene-set
                        enrichment scores per sample. By default this is set
                        to gsva (Hänzelmann et al, 2013) and other options are
                        ssgsea (Barbie et al, 2009), zscore (Lee et al, 2008)
                        or plage (Tomfohr et al, 2005). The latter two
                        standardize first expression profiles into z-scores
                        over the samples and, in the case of zscore, it
                        combines them together as their sum divided by the
                        square-root of the size of the gene set, while in the
                        case of plage they are used to calculate the singular
                        value decomposition (SVD) over the genes in the gene
                        set and use the coefficients of the first
                        right-singular vector as pathway activity
                        profile[default gsva]
  -k kcdf, --kcdf kcdf  Character string denoting the kernel to use during
                        the non-parametric estimation of the cumulative
                        distribution function of expression levels across
                        samples when method="gsva". By default,
                        kcdf="Gaussian" which is suitable when input
                        expression values are continuous, such as microarray
                        fluorescent units in logarithmic scale, RNA-seq
                        log-CPMs, log-RPKMs or log-TPMs. When input expression
                        values are integer counts, such as those derived from
                        RNA-seq experiments, then this argument should be set
                        to kcdf="Poisson"[default Gaussian]
  --group1 GROUP1       Designate the first group[optional,default C1]
  --group2 GROUP2       Designate the second group[optional,default C2]
  -p PVALUECUTOFF, --pvalueCutoff PVALUECUTOFF
                        pvalue cutoff on enrichment tests to
                        report[optional,default:0.05]
  --no_diff             No screening was performed based on the difference
                        analysis results[optional,default:False]
  -o OUTDIR, --outdir OUTDIR
                        output file directory[optional,default cwd]
  -f PREFIX, --prefix PREFIX
                        out file name prefix[optional,default kegg]
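The zscore method described in the help text can be shown as plain arithmetic. Each gene is standardized across samples, z = (x - mean) / sd. A gene set's score in a given sample is then the sum of its member genes' z-scores divided by the square root of the number of member genes. The Python sketch below illustrates only that formula on a made-up matrix. It is not the GSVA package's implementation, which is in R. Two behaviours are assumptions of this sketch: it skips genes that are missing from the matrix, and it skips genes with zero variance.

```python
import math
from statistics import mean, stdev


def zscore_rows(expr):
    """Standardize each gene (row) across samples: (x - mean) / sd.

    Uses the sample standard deviation (n - 1 denominator). Genes with zero
    variance are skipped, since their z-scores are undefined.
    """
    out = {}
    for gene, values in expr.items():
        sd = stdev(values)
        if sd == 0:
            continue
        mu = mean(values)
        out[gene] = [(v - mu) / sd for v in values]
    return out


def zscore_gene_set_scores(expr, gene_sets):
    """Per-sample score = sum of member-gene z-scores / sqrt(number of members)."""
    z = zscore_rows(expr)
    n_samples = len(next(iter(expr.values())))
    scores = {}
    for name, genes in gene_sets.items():
        members = [g for g in genes if g in z]  # genes absent from the matrix are ignored
        if not members:
            continue
        scores[name] = [
            sum(z[g][j] for g in members) / math.sqrt(len(members))
            for j in range(n_samples)
        ]
    return scores


if __name__ == "__main__":
    # Hypothetical toy data: 3 genes x 3 samples and one gene set.
    expr = {
        "GENE_A": [1.0, 2.0, 3.0],
        "GENE_B": [2.0, 4.0, 6.0],
        "GENE_C": [5.0, 5.0, 6.0],
    }
    sets = {"TOY_SET": ["GENE_A", "GENE_B", "NOT_IN_MATRIX"]}
    print(zscore_gene_set_scores(expr, sets))
```

In this toy example, GENE_A and GENE_B both standardize to [-1, 0, 1]. Their sum divided by sqrt(2) gives TOY_SET the scores [-1.414..., 0, 1.414...].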
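-g: the reference gene set file in GMT format, for example the KEGG gene sets c2.cp.kegg.v6.2.symbols.gmt downloaded from MSigDB. Each line holds the gene set name, a description (here an MSigDB URL) and the member gene symbols, separated by tabs (excerpt):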
KEGG_GLYCOLYSIS_GLUCONEOGENESIS | http://www.broadinstitute.org/gsea/msigdb/cards/KEGG_GLYCOLYSIS_GLUCONEOGENESIS | ACSS2 | GCK |
KEGG_CITRATE_CYCLE_TCA_CYCLE | http://www.broadinstitute.org/gsea/msigdb/cards/KEGG_CITRATE_CYCLE_TCA_CYCLE | IDH3B | DLST |
KEGG_PENTOSE_PHOSPHATE_PATHWAY | http://www.broadinstitute.org/gsea/msigdb/cards/KEGG_PENTOSE_PHOSPHATE_PATHWAY | RPE | RPIA |
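Each GMT line is just a set of tab-separated fields: the set name, a description, and then the genes. Loading the file therefore takes only a few lines of code. The Python sketch below is illustrative. read_gmt is a hypothetical helper, not part of the script. The script is written in R, and the source does not show how it loads the GMT file.

```python
import io


def read_gmt(handle):
    """Parse a GMT stream into {gene_set_name: [gene, ...]}.

    Field 1 is the set name, field 2 a description or URL (ignored here),
    and every remaining tab-separated field is a gene symbol.
    """
    gene_sets = {}
    for line in handle:
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        name, genes = fields[0], fields[2:]
        gene_sets[name] = [g for g in genes if g]  # drop empty trailing fields
    return gene_sets


if __name__ == "__main__":
    # Two lines in the layout of the excerpt above (gene lists truncated).
    text = (
        "KEGG_GLYCOLYSIS_GLUCONEOGENESIS\thttp://www.broadinstitute.org/gsea/msigdb/cards/KEGG_GLYCOLYSIS_GLUCONEOGENESIS\tACSS2\tGCK\n"
        "KEGG_CITRATE_CYCLE_TCA_CYCLE\thttp://www.broadinstitute.org/gsea/msigdb/cards/KEGG_CITRATE_CYCLE_TCA_CYCLE\tIDH3B\tDLST\n"
    )
    sets = read_gmt(io.StringIO(text))
    print(sets["KEGG_CITRATE_CYCLE_TCA_CYCLE"])  # ['IDH3B', 'DLST']
```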
-i: the gene expression matrix, with gene symbols in the first (ID) column and one column per sample (excerpt):
ID | TCGA-A3-3319-01A-02R-1325-07 | TCGA-A3-3323-01A-02R-1325-07 |
YTHDC2 | 16.5128725081007 | 20.6535652352011 |
ELAVL1 | 44.3876796198438 | 31.8729000784291 |
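The Python sketch below reads a tab-separated matrix in this layout, with the ID column first and then one column per sample. It then applies an optional log2 step, corresponding to the --log2 flag. The source does not say how --log2 is implemented. This sketch assumes log2(x + 1), because TPM values can be zero. read_expression is a hypothetical helper, not the script's code.

```python
import csv
import io
import math


def read_expression(handle, log2=False):
    """Read a tab-separated gene x sample matrix.

    Returns (sample_ids, {gene: [values...]}). When log2 is True, each value
    becomes log2(x + 1); the +1 pseudocount is an assumption of this sketch.
    """
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    samples = header[1:]  # the first column holds gene IDs
    matrix = {}
    for row in reader:
        if not row:
            continue
        values = [float(v) for v in row[1:]]
        if log2:
            values = [math.log2(v + 1.0) for v in values]
        matrix[row[0]] = values
    return samples, matrix


if __name__ == "__main__":
    text = (
        "ID\tTCGA-A3-3319-01A-02R-1325-07\tTCGA-A3-3323-01A-02R-1325-07\n"
        "YTHDC2\t16.5128725081007\t20.6535652352011\n"
        "ELAVL1\t44.3876796198438\t31.8729000784291\n"
    )
    samples, expr = read_expression(io.StringIO(text), log2=True)
    print(samples)
    print(expr["YTHDC2"])
```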
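-m: the sample information file containing the group of each sample (excerpt; the grouping column named by -n is not shown):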
barcode | patient | sample |
TCGA-BP-4766-01A-01R-1289-07 | TCGA-BP-4766 | TCGA-BP-4766-01A |
TCGA-A3-3352-01A-01R-0864-07 | TCGA-A3-3352 | TCGA-A3-3352-01A |
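The -n, --group1 and --group2 options together determine which samples are compared. The excerpt above shows only the barcode, patient and sample columns, so the m6acluster values used below are hypothetical. The sketch assumes that the barcode column of the sample information file matches the sample column names of the expression matrix; both example files use full TCGA barcodes. Samples in any other group are dropped. split_groups is a hypothetical helper, not the script's code.

```python
import csv
import io


def split_groups(meta_handle, group_col, group1, group2, id_col="barcode"):
    """Return (group1_ids, group2_ids) from a tab-separated sample sheet.

    Samples whose value in group_col is neither group1 nor group2 are dropped.
    """
    reader = csv.DictReader(meta_handle, delimiter="\t")
    if group_col not in (reader.fieldnames or []):
        raise KeyError(f"grouping column {group_col!r} not found")
    g1, g2 = [], []
    for row in reader:
        label = row[group_col]
        if label == group1:
            g1.append(row[id_col])
        elif label == group2:
            g2.append(row[id_col])
    return g1, g2


if __name__ == "__main__":
    # Hypothetical sheet: the m6acluster values are invented for illustration.
    text = (
        "barcode\tpatient\tsample\tm6acluster\n"
        "TCGA-BP-4766-01A-01R-1289-07\tTCGA-BP-4766\tTCGA-BP-4766-01A\tC1\n"
        "TCGA-A3-3352-01A-01R-0864-07\tTCGA-A3-3352\tTCGA-A3-3352-01A\tC2\n"
    )
    print(split_groups(io.StringIO(text), "m6acluster", "C1", "C2"))
```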
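--log2: log2-transform the gene expression matrix before scoring.
-n: the name of the column in the sample information file that holds the grouping (default m6acluster).
--group1 --group2: the names of the two groups to compare (defaults C1 and C2).
-t: the method used to estimate gene set enrichment scores (gsva, ssgsea, zscore or plage); the default is gsva.
-k: the kcdf argument passed to the gsva function. For read-count data it is usually set to "Poisson"; for log-transformed TPM and similar continuous values, keep the default "Gaussian".
-p: the p-value cutoff.
--no_diff: the script first uses GSVA to convert the expression matrix into a matrix of per-sample enrichment scores, then screens the enrichment results by differential analysis between the two groups. Setting this flag skips that screening.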
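The source does not say which test the script uses to screen enrichment scores between the two groups. The GSVA documentation commonly pairs GSVA scores with limma. As a dependency-free stand-in, the Python sketch below applies an exact two-sided permutation test on the difference in group means to each gene set. It keeps sets with p below the cutoff, as -p does, and keeps every set when no_diff is true, as --no_diff does. The function names and the choice of test are assumptions of this sketch, not the script's method. Exhaustive enumeration is practical only for small groups.

```python
from itertools import combinations
from statistics import mean


def permutation_pvalue(x, y):
    """Exact two-sided permutation p-value for a difference in group means.

    Every way of splitting the pooled scores into groups of len(x) and len(y)
    is enumerated, so this is only feasible for small groups.
    """
    pooled = list(x) + list(y)
    n = len(pooled)
    observed = abs(mean(x) - mean(y))
    extreme = total = 0
    for idx in combinations(range(n), len(x)):
        chosen = set(idx)
        a = [pooled[i] for i in idx]
        b = [pooled[i] for i in range(n) if i not in chosen]
        total += 1
        # Small tolerance so floating-point ties with the observed split count.
        if abs(mean(a) - mean(b)) >= observed - 1e-12:
            extreme += 1
    return extreme / total


def filter_gene_sets(scores, samples, group1_ids, group2_ids,
                     cutoff=0.05, no_diff=False):
    """Return {gene_set: p_value} for sets with p < cutoff (all sets if no_diff).

    scores maps each gene set to its per-sample scores, ordered like samples.
    """
    col = {s: i for i, s in enumerate(samples)}
    kept = {}
    for name, vals in scores.items():
        x = [vals[col[s]] for s in group1_ids]
        y = [vals[col[s]] for s in group2_ids]
        p = permutation_pvalue(x, y)
        if no_diff or p < cutoff:
            kept[name] = p
    return kept


if __name__ == "__main__":
    # Made-up scores for 8 samples: the first 4 in C1, the last 4 in C2.
    samples = [f"s{i}" for i in range(1, 9)]
    scores = {
        "SET_SHIFTED": [1, 2, 3, 4, 10, 11, 12, 13],  # clearly higher in C2
        "SET_FLAT": [1, 5, 2, 6, 2, 5, 1, 6],         # same mean in both groups
    }
    print(filter_gene_sets(scores, samples, samples[:4], samples[4:]))
```

With 4 samples per group there are C(8,4) = 70 possible relabelings, so the smallest attainable p-value is 2/70 ≈ 0.029. SET_SHIFTED reaches this value and passes the 0.05 cutoff. SET_FLAT has p = 1 and is dropped. For realistic cohort sizes, a sampled permutation test or a moderated test such as limma would be more appropriate.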
Example run:
Rscript ../scripts/ssgsea_enrich_diff.r -g enrich/c2.cp.kegg.v6.2.symbols.gmt \
    -i ../02.sample_select/TCGA-KIRC_gene_expression_TPM_immu.tsv -m metadata_group.tsv \
    -n m6acluster --log2 --group1 C1 --group2 C2 -p 0.05 -o enrich/C1_vs_C2