1 Project Overview

Project ID: GDD20090230-3_sup_2
Samples: F-3, F-4, F-5, F-6, G-2, G-4, G-5, G-6
Grouping: F = F-3, F-4, F-5, F-6; G = G-2, G-4, G-5, G-6
Welch's t test: F-vs-G
ANOVA: F-vs-G
Metastats: F-vs-G
LEfSe: F-vs-G
Sequence assembly: per group, k-mer 21-141
Taxonomic annotation: read-based

广州基迪奥生物科技有限公司




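2 Technical Overview

Metagenomic sequencing uses high-throughput sequencing to detect the genomes of all species in a microbial community and to analyse their functions. Because it does not require isolation, purification, or cultivation of the microorganisms, it can quickly and effectively recover the genetic information of an entire community, and it supports deeper study of community structure, taxonomic classification, phylogeny, gene function, and metabolic networks. It is particularly well suited to environmental and gut microbiome samples.

2.1 Experimental Workflow
Fig 2-1-1 Experimental workflow

2.2 Analysis Workflow
Fig 2-2-1 Bioinformatics analysis workflow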





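3 Data Processing

3.1 Data Filtering

After sequencing, the raw data usually contain a fraction of low-quality reads, which would reduce the accuracy of downstream analyses. We therefore filter out low-quality data according to defined criteria to obtain high-quality clean data for subsequent analysis.
The filtering criteria are as follows:

Tab 3-1-1 Data filtering statistics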
Sample | Raw reads | Clean reads (%) | Adapter (%) | Low quality (%) | polyA (%) | N (%)
F-3 | 68663090 | 68526930 (99.80%) | 30706 (0.04%) | 207064 (0.15%) | 0 (0.0%) | 3844 (0.0%)
F-4 | 72155062 | 72003126 (99.79%) | 31220 (0.04%) | 231280 (0.16%) | 0 (0.0%) | 10152 (0.01%)
F-5 | 68876222 | 68756650 (99.83%) | 22020 (0.03%) | 191168 (0.14%) | 0 (0.0%) | 3936 (0.0%)
F-6 | 70430554 | 70294040 (99.81%) | 35404 (0.05%) | 192124 (0.14%) | 0 (0.0%) | 10096 (0.01%)
G-2 | 70665654 | 70529454 (99.81%) | 29390 (0.04%) | 200644 (0.14%) | 0 (0.0%) | 12976 (0.01%)
G-4 | 65631282 | 65497078 (99.80%) | 25712 (0.04%) | 213268 (0.16%) | 0 (0.0%) | 3716 (0.0%)
G-5 | 66654320 | 66466480 (99.72%) | 32438 (0.05%) | 307528 (0.23%) | 0 (0.0%) | 3276 (0.0%)
G-6 | 70276890 | 70103678 (99.75%) | 39056 (0.06%) | 257332 (0.18%) | 0 (0.0%) | 10980 (0.01%)

Fig 3-1-1 Data preprocessing distribution (percentage)
Fig 3-1-2 Data preprocessing distribution (counts)
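
The clean-read percentages in Tab 3-1-1 can be reproduced from the raw and clean read counts. The short sketch below does this for two samples; the function name `clean_fraction` is illustrative and not part of the report's pipeline.

```python
def clean_fraction(raw_reads: int, clean_reads: int) -> float:
    """Return the share of raw reads that survive filtering, as a percentage."""
    if raw_reads <= 0:
        raise ValueError("raw_reads must be positive")
    return 100.0 * clean_reads / raw_reads


# Raw and clean read counts for two samples, copied from Tab 3-1-1.
counts = {"F-3": (68663090, 68526930), "G-5": (66654320, 66466480)}

for sample, (raw, clean) in counts.items():
    print(f"{sample}: {clean_fraction(raw, clean):.2f}% clean")
```

Rounded to two decimals, the results agree with the Clean reads column for these samples.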

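3.2 Base Quality Analysis

After filtering, we analyse base composition and quality distribution to give a direct view of data quality. The more balanced the base composition and the higher the quality, the more accurate the downstream analysis.

Tab 3-2-1 Base statistics before and after filtering (BF: before filtering; AF: after filtering)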
Sample | RawData(bp) | BF_Q20(%) | BF_Q30(%) | BF_N(%) | BF_GC(%) | CleanData(bp) | AF_Q20(%) | AF_Q30(%) | AF_N(%) | AF_GC(%)
F-3 | 10299463500 | 9993944910 (97.03%) | 9465326967 (91.9%) | 193188 (0.0%) | 5216314828 (50.64%) | 10261977261 | 9964697107 (97.1%) | 9439740968 (91.99%) | 164746 (0.0%) | 5195899500 (50.64%)
F-4 | 10823259300 | 10468860149 (96.73%) | 9878770979 (91.27%) | 323304 (0.0%) | 5410541082 (49.99%) | 10791132428 | 10445617444 (96.8%) | 9859139775 (91.36%) | 172924 (0.0%) | 5393133246 (49.98%)
F-5 | 10331433300 | 9995149257 (96.75%) | 9429336703 (91.27%) | 195341 (0.0%) | 5320552875 (51.5%) | 10301616881 | 9972739741 (96.81%) | 9410164154 (91.35%) | 166477 (0.0%) | 5304212894 (51.49%)
F-6 | 10564583100 | 10230573448 (96.84%) | 9666715357 (91.5%) | 323923 (0.0%) | 5151257722 (48.76%) | 10532195019 | 10207147983 (96.91%) | 9646949221 (91.59%) | 172239 (0.0%) | 5134097258 (48.74%)
G-2 | 10599848100 | 10302104514 (97.19%) | 9776550923 (92.23%) | 380456 (0.0%) | 5446545493 (51.38%) | 10566414210 | 10276029120 (97.25%) | 9753679888 (92.31%) | 172419 (0.0%) | 5428627622 (51.38%)
G-4 | 9844692300 | 9505992364 (96.56%) | 8953934569 (90.95%) | 186818 (0.0%) | 5066197778 (51.46%) | 9812233057 | 9481993083 (96.63%) | 8933606112 (91.05%) | 159546 (0.0%) | 5048577787 (51.45%)
G-5 | 9998148000 | 9654188365 (96.56%) | 9095462182 (90.97%) | 180539 (0.0%) | 5127872541 (51.29%) | 9964204920 | 9630488454 (96.65%) | 9075754142 (91.08%) | 155924 (0.0%) | 5109396306 (51.28%)
G-6 | 10541533500 | 10241841278 (97.16%) | 9718340579 (92.19%) | 338064 (0.0%) | 5416351405 (51.38%) | 10492366443 | 10202518463 (97.24%) | 9683481247 (92.29%) | 170359 (0.0%) | 5389651322 (51.37%)

Fig 3-2-1 Base composition before and after filtering for each sample (panels: F-3, F-4, F-5, F-6, G-2, G-4, G-5, G-6)
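
As an illustration of what the Q20, Q30, N and GC columns measure, the sketch below computes the same four fractions for a single read. It assumes Phred+33 quality encoding, which the report does not state, and the function name `base_stats` and the example read are hypothetical.

```python
def base_stats(seq: str, qual: str, offset: int = 33) -> dict:
    """Summarise one read: Q20, Q30, N and GC as fractions of its bases.

    `offset` is the ASCII offset of the quality encoding (assumed Phred+33).
    """
    if len(seq) != len(qual):
        raise ValueError("sequence and quality strings differ in length")
    n = len(seq)
    if n == 0:
        raise ValueError("empty read")
    scores = [ord(c) - offset for c in qual]  # decode Phred scores
    upper = seq.upper()
    return {
        "Q20": sum(q >= 20 for q in scores) / n,  # bases with Phred score >= 20
        "Q30": sum(q >= 30 for q in scores) / n,  # bases with Phred score >= 30
        "N": upper.count("N") / n,                # undetermined bases
        "GC": (upper.count("G") + upper.count("C")) / n,
    }


# Hypothetical 8-base read: 'I' decodes to Q40, '5' to Q20, '#' to Q2.
stats = base_stats("ACGTNGCA", "IIII#5II")
print(stats)
```

Summing these per-read counts over all reads in a sample, before and after filtering, would give totals of the kind reported in Tab 3-2-1.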


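3.3 Host Sequence Filtering

Samples taken from host-associated environments such as the gut inevitably contain host sequence contamination. If a host reference genome has been published, we align the filtered data to it with Bowtie2 and remove host-derived reads, leaving the effective reads used for downstream analysis.
Note: this analysis is performed only when a host reference is available for the samples.

Tab 3-3-1 Host sequence filtering statistics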
Samples | F-3 | F-4 | F-5 | F-6 | G-2 | G-4 | G-5 | G-6
Sum Clean Reads | 68526930 | 72003126 | 68756650 | 70294040 | 70529454 | 65497078 | 66466480 | 70103678
Sum Host mapped Reads | 89355 | 110297 | 97751 | 111396 | 92941 | 78777 | 74682 | 104773

Fig 3-3-1 Host sequence filtering statistics





4 Sequence Assembly

Effective reads were assembled with MEGAHIT.
Results are in folder: 02.Assemble

Tab 4-0-1 Assembly statistics for each sample
Sample | Contigs Num | Total length | Average length | Max length | N50 | N90
F | 1361185 | 2184597799 | 1604.92 | 619133 | 2326 | 645
G | 1576316 | 2206043563 | 1399.49 | 331676 | 1748 | 616

Fig 4-0-1 Contig length distribution for each sample (panels: sample F, sample G)
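
For readers unfamiliar with the N50 and N90 columns in Tab 4-0-1, the sketch below shows the usual definition: sort contigs from longest to shortest and report the length of the contig at which the running total first reaches 50% (or 90%) of the assembly length. The function name `nx` and the toy contig lengths are illustrative; the report does not show how its own statistics were computed.

```python
def nx(lengths, x=50):
    """Return Nx for a list of contig lengths.

    Nx is the contig length at which the running total of lengths,
    sorted longest first, first reaches x% of the total assembly length.
    """
    if not lengths:
        raise ValueError("no contigs given")
    ordered = sorted(lengths, reverse=True)
    threshold = sum(ordered) * x / 100
    running = 0
    for length in ordered:
        running += length
        if running >= threshold:
            return length


# Toy assembly of five contigs (hypothetical lengths, 300 bp in total).
contigs = [100, 80, 60, 40, 20]
print("N50 =", nx(contigs, 50))
print("N90 =", nx(contigs, 90))
```

By this definition N90 can never exceed N50, which is consistent with both rows of the table.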






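5 Gene Prediction

5.1 Gene Prediction

Genes were predicted on contigs >500 bp with MetaGeneMark. The predicted genes were then clustered with CD-HIT (95% identity, 90% coverage), and the longest gene in each cluster was taken as its representative sequence to build the initial non-redundant gene set.

Results are in folder: 03.Genes

Tab 5-1-1 Gene statistics per sample for the non-redundant gene set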
Sample | GeneNumber | TotalLength | AverageLength | GC%
F-3 | 461162 | 515203104 | 1117 | 53.79%
F-4 | 537898 | 580964919 | 1080 | 53.63%
F-5 | 400314 | 460464543 | 1150 | 53.70%
F-6 | 395609 | 452996859 | 1145 | 53.30%
G-2 | 388715 | 427116321 | 1098 | 54.20%
G-4 | 293558 | 353490087 | 1204 | 54.12%
G-5 | 415056 | 460258407 | 1108 | 54.58%
G-6 | 567078 | 576597696 | 1016 | 54.10%

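A bar chart shows the number of genes in each sample, making differences in gene count between samples easy to compare.

Fig 5-1-1 Number of genes per sample

A violin plot shows the distribution of gene counts within each group and the differences between groups.

Fig 5-1-2 Violin plot of gene counts by group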

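5.2 Gene Abundance Statistics

Clean reads were re-aligned to the initial non-redundant gene set with Bowtie2, and PathoScope was then used to reassign reads to their best-matching gene based on the alignments. (Note: a single read may align to several genes in the Bowtie2 output; PathoScope resolves these multi-mapped reads algorithmically and assigns each to its best gene, which improves gene quantification and is a method recommended in the literature.) Genes supported by ≤2 reads in each sample were then filtered out, giving the final gene set used for downstream analysis.
The relative abundance of each gene in each sample was calculated from the number of reads assigned to the gene, the gene length, and the sequencing depth. The first 10 rows are shown as an example:


Tab 5-2-1 Gene abundance per sample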
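GeneID | F-3_count | F-4_count | F-5_count | F-6_count | G-2_count | G-4_count | G-5_count | G-6_count | F-3_RelativeAbundance | F-4_RelativeAbundance | F-5_RelativeAbundance | F-6_RelativeAbundance | G-2_RelativeAbundance | G-4_RelativeAbundance | G-5_RelativeAbundance | G-6_RelativeAbundance
Unigene1 | 13.0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 9.038851272516904e-07 | 0 | 0 | 0 | 0 | 0 | 0 | 0
Unigene4 | 0 | 0 | 0 | 3.0 | 0 | 0 | 0 | 16.0 | 0 | 0 | 0 | 2.527775465885085e-07 | 0 | 0 | 0 | 1.5335500503915718e-06
Unigene13 | 0 | 0 | 0 | 283.0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1.4873831280945607e-05 | 0 | 0 | 0 | 0
Unigene14 | 18.0 | 0 | 7.0 | 0 | 0 | 0 | 0 | 0 | 8.871162176510935e-07 | 0 | 3.363301974977882e-07 | 0 | 0 | 0 | 0 | 0
Unigene16 | 0 | 0 | 3.0 | 4.0 | 0 | 30.0 | 0 | 0 | 0 | 0 | 1.8493628110390513e-07 | 2.4037713864265717e-07 | 0 | 2.967263527620739e-06 | 0 | 0
Unigene17 | 11.0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 8.155886563458923e-07 | 0 | 0 | 0 | 0 | 0 | 0 | 0
Unigene18 | 0 | 15.0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1.5778178350403477e-06 | 0 | 0 | 0 | 0 | 0 | 0
Unigene19 | 0 | 0 | 0 | 28.0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 3.1181789663085946e-06 | 0 | 0 | 0 | 0
Unigene20 | 0 | 0 | 0 | 16.0 | 0 | 0 | 0 | 4.0 | 0 | 0 | 0 | 1.6545439413066012e-06 | 0 | 0 | 0 | 4.705210381883232e-07
Unigene22 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0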
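
The report does not give the exact formula behind the RelativeAbundance columns. A minimal sketch of one commonly used definition is shown below, under that assumption: each gene's read count is divided by its length, and the result is normalised by the sum over all retained genes in the sample. The function name, the gene names and the gene lengths are hypothetical, and applying the read-support cutoff within a single sample is a simplification.

```python
def relative_abundance(counts, lengths, min_reads=3):
    """Length-normalised relative abundance for one sample.

    counts  : {gene: reads assigned to the gene}
    lengths : {gene: gene length in bp}
    Genes with fewer than `min_reads` reads (i.e. <= 2) are dropped first.
    NOTE: this formula is an assumption, not taken from the report.
    """
    kept = {g: c for g, c in counts.items() if c >= min_reads}
    # Reads per base for every retained gene.
    density = {g: c / lengths[g] for g, c in kept.items()}
    total = sum(density.values())
    if total == 0:
        return {}
    return {g: d / total for g, d in density.items()}


# Hypothetical sample with three genes; geneC fails the read-support filter.
counts = {"geneA": 10, "geneB": 20, "geneC": 2}
lengths = {"geneA": 1000, "geneB": 4000, "geneC": 500}
print(relative_abundance(counts, lengths))
```

In this toy sample geneA ends up more abundant than geneB despite having fewer reads, because it is four times shorter. The same effect would explain why equal counts in Tab 5-2-1 do not always yield equal abundances.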


Results are in folder: 03.Genes





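6 Gene Functional Annotation

After obtaining the non-redundant gene set, we predict the functional characteristics of the microbial community from alignments against several annotation databases. Unigenes are aligned with DIAMOND (threshold: e-value <= 1e-5) to KEGG, eggNOG, CAZy, CARD, VFDB, PHI and other databases. The alignment results are combined with the gene abundance table to compute abundances for each database's annotations, supporting a systematic comparison of functional differences between groups.
Results are in folder: 04.Annotation

Tab 6-0-1 Database annotation statistics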
Database | GeneCount | GenePercent(%)
KEGG | 1633079 | 71.92%
eggNOG | 1676538 | 73.83%
CAZy | 271550 | 11.96%
VFDB | 203451 | 8.96%
PHI | 231736 | 10.20%
CARD | 63610 | 2.80%
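
The e-value threshold mentioned above can be illustrated on DIAMOND's 12-column BLAST-style tabular output, where the e-value is column 11 and the bit score is column 12. The sketch below keeps hits with e-value <= 1e-5 and, as an assumption not stated in the report, retains only the highest-scoring hit per query. The function name `best_hits` and the subject IDs are hypothetical.

```python
def best_hits(tabular_lines, max_evalue=1e-5):
    """Pick one hit per query from BLAST-style tabular lines.

    Expects 12 tab-separated columns per line (qseqid, sseqid, ...,
    evalue, bitscore). Keeping the top bit score is an assumption.
    """
    best = {}
    for line in tabular_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split("\t")
        query, subject = fields[0], fields[1]
        evalue, bitscore = float(fields[10]), float(fields[11])
        if evalue > max_evalue:
            continue  # fails the e-value threshold
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {query: subject for query, (subject, _) in best.items()}


# Three hypothetical hits; the last one fails the e-value cutoff.
hits = [
    "Unigene1\tsubjA\t90.0\t100\t10\t0\t1\t100\t1\t100\t1e-30\t200",
    "Unigene1\tsubjB\t95.0\t100\t5\t0\t1\t100\t1\t100\t1e-40\t250",
    "Unigene4\tsubjC\t40.0\t50\t30\t0\t1\t50\t1\t50\t0.01\t30",
]
print(best_hits(hits))
```

Genes with no hit passing the threshold simply stay unannotated, which is why the percentages in Tab 6-0-1 are below 100%.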

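6.1 KEGG Functional Annotation

KEGG (Kyoto Encyclopedia of Genes and Genomes) is a comprehensive database for gene functional annotation, covering gene function, classification, metabolic pathways (the KEGG Pathway database, which is the core functional annotation database of KEGG), and more.
The KEGG Pathway database divides biological pathways into 7 top-level categories (class A, level 1): Metabolism, Genetic Information Processing, Environmental Information Processing, Cellular Processes, Organismal Systems, Human Diseases, and Drug Development. Each category is subdivided into three further levels: B, C and D. Class B (level 2) currently has 59 subcategories. Class C (level 3) corresponds to the pathway maps, and class D lists the specific enzymes, orthologous genes, compounds and other entries within each pathway map.
Based on KEGG's hierarchical annotation, we can characterise the functions of samples/groups at different depths and use the pathway annotations to systematically explore potential associations among genes in the community, metabolic networks, and so on.
Pathway annotations are shown below:

Tab 6-1-1 Pathway annotation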
KEGG_A_class | KEGG_B_class | Pathway | Count (453974) | Pathway ID | ...
Metabolism | Nucleotide metabolism | Purine metabolism | 37930 | ko00230 | ...
Environmental Information Processing | Membrane transport | ABC transporters | 33855 | ko02010 | ...
Metabolism | Nucleotide metabolism | Pyrimidine metabolism | 33616 | ko00240 | ...
Environmental Information Processing | Signal transduction | Two-component system | 27270 | ko02020 | ...
Cellular Processes | Cellular community - prokaryotes | Quorum sensing | 24892 | ko02024 | ...
Metabolism | Carbohydrate metabolism | Amino sugar and nucleotide sugar metabolism | 24328 | ko00520 | ...
Metabolism | Carbohydrate metabolism | Starch and sucrose metabolism | 23047 | ko00500 | ...
Genetic Information Processing | Replication and repair | Homologous recombination | 20545 | ko03440 | ...
Genetic Information Processing | Translation | Aminoacyl-tRNA biosynthesis | 20316 | ko00970 | ...
Genetic Information Processing | Replication and repair | Mismatch repair | 17826 | ko03430 | ...

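We counted the total number of genes annotated to the pathway database across all samples and plotted them as a bar chart, showing how gene numbers are distributed across the classification levels.

Fig 6-1-1 Number of genes annotated to pathways across all samples

Heatmaps show the functional abundance of each sample at the different KEGG levels, giving a first view of how functions are distributed across samples/groups. The pathways with the highest summed abundance across all samples (top 25) are shown below.

Tab 6-1-2 Pathway heatmap data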
Pathway | F-3 | F-4 | F-5 | F-6 | G-2 | G-4 | G-5 | G-6
Purine metabolism | 0.0181247013946175 | 0.0173439634866131 | 0.0155784236035765 | 0.0168113749268161 | 0.0170730271803261 | 0.0174283269256638 | 0.0178797398374333 | 0.0170129829617836
ABC transporters | 0.0216879186096565 | 0.017719030333573 | 0.0139750642108909 | 0.0152078289053535 | 0.0147521945807893 | 0.0164334131359709 | 0.0209603513471051 | 0.0128013425517915
Pyrimidine metabolism | 0.0156549971688603 | 0.0152405604441294 | 0.0142156337550766 | 0.0141317698204458 | 0.0147426504598997 | 0.0153385920687924 | 0.0160229713526845 | 0.0149757073078571
Two-component system | 0.0141834815288719 | 0.0135596410702269 | 0.0126581293817935 | 0.0139494568247399 | 0.0148469393661482 | 0.0145652379587045 | 0.0128025922261919 | 0.0139700568239606
Quorum sensing | 0.0148030563841526 | 0.0123941626535127 | 0.0108704365317802 | 0.0114904366219596 | 0.0109787689533888 | 0.012975702607063 | 0.0151468057773745 | 0.00964859975212261
Starch and sucrose metabolism | 0.0113180828067974 | 0.0109529337404188 | 0.00949091207660668 | 0.00899768096483018 | 0.0128198532111878 | 0.012592884517725 | 0.0114581918458801 | 0.00982237296609164
Amino sugar and nucleotide sugar metabolism | 0.0115809978830278 | 0.0104193172822981 | 0.00978633999817393 | 0.0109361213655275 | 0.0107510942910294 | 0.0109185187284085 | 0.0108092671117915 | 0.0098602318609377
Homologous recombination | 0.0100411245300785 | 0.0114945471402651 | 0.00898117641981112 | 0.00901871879937547 | 0.00869768808876349 | 0.00920689021594226 | 0.0102429320884549 | 0.00887958423605778
Aminoacyl-tRNA biosynthesis | 0.00950765076955259 | 0.00943522490667251 | 0.00809035132967498 | 0.00849156694553213 | 0.00805816756143655 | 0.00876407220305832 | 0.00904611762657461 | 0.00854373607515917
Ribosome | 0.0100839820365126 | 0.0106362782561463 | 0.0074462889747173 | 0.00834741570468766 | 0.00775814302749779 | 0.0057439416569913 | 0.00877884563849209 | 0.00743489280176606

Fig 6-1-2 Top 25 pathway Heatmap
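
The ranking behind the heatmap (pathways ordered by their abundance summed over all samples) can be sketched as follows. The report does not say how gene abundances are aggregated to pathways; this sketch assumes that a gene mapped to several pathways contributes its full abundance to each. The function names, gene names and pathway labels are hypothetical.

```python
def pathway_abundance(gene_abundance, gene_to_pathways):
    """Sum per-sample gene abundances into per-sample pathway abundances.

    gene_abundance   : {gene: [abundance in sample 1, sample 2, ...]}
    gene_to_pathways : {gene: [pathway, ...]}
    Assumption: a gene counts fully towards every pathway it maps to.
    """
    totals = {}
    for gene, values in gene_abundance.items():
        for pathway in gene_to_pathways.get(gene, []):
            acc = totals.setdefault(pathway, [0.0] * len(values))
            for i, value in enumerate(values):
                acc[i] += value
    return totals


def top_pathways(totals, n=25):
    """Return the n pathways with the largest abundance summed over samples."""
    return sorted(totals, key=lambda p: sum(totals[p]), reverse=True)[:n]


# Hypothetical abundances for three genes in two samples.
genes = {"g1": [0.25, 0.5], "g2": [0.5, 0.0], "g3": [0.125, 0.125]}
mapping = {"g1": ["P1"], "g2": ["P1", "P2"], "g3": ["P3"]}
totals = pathway_abundance(genes, mapping)
print(top_pathways(totals, n=2))
```

The same two steps apply unchanged to other annotation levels, for example eggNOG OGs, by swapping the gene-to-pathway mapping for a gene-to-OG mapping.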

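6.2 eggNOG Functional Annotation

eggNOG (evolutionary genealogy of genes: Non-supervised Orthologous Groups) is a database of orthologous groups (OGs) of genes built with the Smith-Waterman alignment algorithm. The current version, 5.0 (2019.01), covers 5090 organisms (4445 representative bacteria, 168 archaea, 477 eukaryotes) and 2502 viruses. Protein sequences from all genomes are first clustered into about 4.4M OGs (level C), which are then annotated using KEGG and other databases and finally organised into 25 functional categories (level A).
Functional prediction of gene and protein sequences based on orthology with eggNOG is considered more accurate than traditional homology searches and is commonly applied to gene sets from new genomes and metagenomes. We annotate against eggNOG with DIAMOND and show the number of genes in each of the 25 functional categories as a bar chart, making it easy to compare gene counts across functional categories in the community.

Tab 6-2-1 eggNOG annotation details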
GeneID | eggNOG_OG | Best_Description | Best_COG_Cat | F-3 | F-4 | F-5 | F-6 | G-2 | G-4 | G-5 | G-6
Unigene1 | COG0345;1TP1E;27JFY;247SR | Catalyzes the reduction of 1-pyrroline-5-carboxylate (PCA) to L-proline | E | 9.038851272516904e-07 | 0 | 0 | 0 | 0 | 0 | 0 | 0
Unigene13 | COG0577;2FPEN;4NDUK;22X01 | ABC transporter permease | V | 0 | 0 | 0 | 1.4873831280945607e-05 | 0 | 0 | 0 | 0
Unigene14 | 2J5C6;COG0527;COG0460 | Amino acid kinase family | E | 8.871162176510935e-07 | 0 | 3.363301974977882e-07 | 0 | 0 | 0 | 0 | 0
Unigene16 | COG2217 | Heavy metal translocating P-type atpase | P | 0 | 0 | 1.8493628110390513e-07 | 2.4037713864265717e-07 | 0 | 2.967263527620739e-06 | 0 | 0
Unigene17 | 4E7E2;2GQPQ;COG0681 | Peptidase S24-like | U | 8.155886563458923e-07 | 0 | 0 | 0 | 0 | 0 | 0 | 0
Unigene19 | 1TNYD;247QZ;COG1126;4BXFR | ATPases associated with a variety of cellular activities | E | 0 | 0 | 0 | 3.1181789663085946e-06 | 0 | 0 | 0 | 0
Unigene20 | COG3152 | Membrane | L | 0 | 0 | 0 | 1.6545439413066012e-06 | 0 | 0 | 0 | 4.705210381883232e-07
Unigene22 | 1ZBDP;COG0557;1TQ1G;4HBBH | 3'-5' exoribonuclease that releases 5'-nucleoside monophosphates and is involved in maturation of structured RNAs | K | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0
Unigene37 | 33MU2;24W74;1VQ3S;2DTVE | - | - | 0 | 8.560410042685029e-07 | 0 | 0 | 0 | 0 | 0 | 0
Unigene44 | 247XY;27J0H;COG0090;1TP9X | One of the primary rRNA binding proteins. Required for association of the 30S and 50S subunits to form the 70S ribosome, for tRNA binding and peptide bond formation. It has been suggested to have peptidyltransferase activity | J | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0

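Fig 6-2-1 eggNOG annotation statistics

A heatmap shows the abundance distribution across all samples of the 25 OGs with the highest total abundance, giving a first view of functional patterns across samples/groups.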

Fig 6-2-2 Top 25 eggNOG Family Heatmap

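6.3 CAZy Functional Annotation

CAZy (Carbohydrate-Active enZYmes Database) is a specialised database of carbohydrate-active enzymes, covering the enzyme families that catalyse the biosynthesis, degradation, and modification of carbohydrates. It is an important reference for industrial microbiology, such as biomass conversion, and for studies of carbon-cycle functions within a habitat.
It has 6 main classes (level A): five enzyme classes, namely Glycoside Hydrolases (GHs), Glycosyl Transferases (GTs), Polysaccharide Lyases (PLs), Carbohydrate Esterases (CEs), and Auxiliary Activities (AAs), plus the carbohydrate-associated modules (Carbohydrate-Binding Modules, CBMs). Each class is divided into many families (level B), such as CE1 and CE2; CE0 in the annotation results denotes hits with no family assignment.
Gene sequences are aligned to the database with DIAMOND for annotation.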
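
The relationship between level B family labels (such as CE1 or AA10) and the six level A classes can be sketched as a simple parse of the label's prefix. The helper `split_family` is illustrative and not part of the report's pipeline. It handles only plain family labels of the form prefix plus number, so subfamily suffixes would need an extended pattern.

```python
import re

# The six level A classes named in the text above.
CAZY_CLASSES = {
    "GH": "Glycoside Hydrolases",
    "GT": "Glycosyl Transferases",
    "PL": "Polysaccharide Lyases",
    "CE": "Carbohydrate Esterases",
    "AA": "Auxiliary Activities",
    "CBM": "Carbohydrate-Binding Modules",
}

_FAMILY = re.compile(r"^(GH|GT|PL|CE|AA|CBM)(\d+)$")


def split_family(level_b: str):
    """Split a level B label into (class prefix, class name, family number)."""
    match = _FAMILY.match(level_b.strip())
    if match is None:
        raise ValueError(f"unrecognised CAZy family label: {level_b!r}")
    prefix, number = match.groups()
    return prefix, CAZY_CLASSES[prefix], int(number)


print(split_family("AA10"))
print(split_family("CE0"))  # family number 0: no family assignment
```

Grouping annotated genes by the returned prefix would give per-class gene counts at level A.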

Tab 6-3-1 CAZy annotation details
LevelA | LevelA_full_name | LevelB | Activities | Count | Genes
AA | Auxiliary Activities | AA0 | - | 171 | Unigene2417013;Unigene4778820;Unigene246502;Unigene833634;Unigene142581;Unigene916912;Unigene3839417;Unigene317618;Unigene4380675;Unigene3067376;Unigene328007;Unigene368754;Unigene691405;Unigene3059373;Unigene4897004;Unigene4894757;Unigene606610;Unigene2468769;Unigene368637;Unigene115203;Unigene1828892;Unigene4209982;Unigene2247671;Unigene1564168;Unigene2820379;Unigene4403997;Unigene2087114;Unigene2034919;Unigene2490462;Unigene725027;Unigene1596940;Unigene3984534;Unigene3301920;Unigene1326868;Unigene3502815;Unigene3567017;Unigene2614586;Unigene1972655;Unigene2070955;Unigene2371071;Unigene4796683;Unigene2806240;Unigene3000077;Unigene134863;Unigene402242;Unigene1316062;Unigene2443483;Unigene843210;Unigene3958112;Unigene143576;Unigene745049;Unigene2650200;Unigene217947;Unigene3306214;Unigene3361296;Unigene2857693;Unigene3611334;Unigene3642107;Unigene1094225;Unigene2700547;Unigene306101;Unigene3019758;Unigene1393796;Unigene1458813;Unigene620060;Unigene1711777;Unigene3638039;Unigene1308691;Unigene1736090;Unigene3243624;Unigene1801610;Unigene4922574;Unigene1421439;Unigene568012;Unigene1168374;Unigene3589332;Unigene2272739;Unigene3389935;Unigene1467990;Unigene4646187;Unigene2206278;Unigene110666;Unigene4753258;Unigene1759591;Unigene682281;Unigene852983;Unigene4944682;Unigene4762264;Unigene1419572;Unigene3638242;Unigene478735;Unigene55173;Unigene4816488;Unigene4090779;Unigene1108793;Unigene210682;Unigene945538;Unigene2842192;Unigene502881;Unigene844310;Unigene2317398;Unigene2761303;Unigene1973477;Unigene67928;Unigene2047526;Unigene4333266;Unigene3034547;Unigene3844637;Unigene2613560;Unigene155801;Unigene1134610;Unigene211071;Unigene4402777;Unigene1715536;Unigene3957792;Unigene4904180;Unigene3424598;Unigene3854360;Unigene4069762;Unigene1235721;Unigene1084220;Unigene4146426;Unigene1579755;Unigene237597;Unigene1711780;Unigene805712;Unigene2975806;Unigene1273474;Unigene4398328;Unigene655064;Unigene3832025;Unigene4803528;Unigene160764;Unigene4288945;Unigene1719210;Unigene49538;Unigene1777645;Unigene972542;Unigene3842619;Unigene101965;Unigene354799;Unigene1737327;Unigene4646331;Unigene546618;Unigene1923986;Unigene3952161;Unigene4730928;Unigene4625935;Unigene3268823;Unigene1552035;Unigene4843696;Unigene4832689;Unigene2073225;Unigene2138771;Unigene3886949;Unigene1899510;Unigene1950996;Unigene1222079;Unigene633048;Unigene4506458;Unigene1905972;Unigene1003257;Unigene4099943;Unigene1480089;Unigene3592139;Unigene1629245;Unigene2529922;Unigene3511261;Unigene3256265;Unigene2244553;Unigene4009537
AAAuxiliary ActivitiesAA1Laccase / p-diphenol:oxygen oxidoreductase / ferroxidase (EC 1.10.3.2); ; ferroxidase (EC 1.10.3.-); Laccase-like multicopper oxidase (EC 1.10.3.-)1340Unigene4494904;Unigene2650398;Unigene109960;Unigene1595068;Unigene3387598;Unigene217586;Unigene2379411;Unigene1460983;Unigene2399477;Unigene1911754;Unigene2635994;Unigene387835;Unigene3477056;Unigene2844139;Unigene2832383;Unigene4304991;Unigene4501005;Unigene253985;Unigene1485511;Unigene3336199;Unigene4748107;Unigene1463220;Unigene3480967;Unigene2355287;Unigene3875581;Unigene2434751;Unigene4245199;Unigene4694710;Unigene3551672;Unigene537855;Unigene290631;Unigene4130369;Unigene4299887;Unigene1666324;Unigene362023;Unigene2878761;Unigene4021706;Unigene4329;Unigene2661212;Unigene4402177;Unigene3756402;Unigene3012241;Unigene2251939;Unigene4735655;Unigene665704;Unigene630508;Unigene3744199;Unigene1420447;Unigene1240468;Unigene3693717;Unigene1513208;Unigene3516185;Unigene946701;Unigene2717828;Unigene71417;Unigene3566059;Unigene1090300;Unigene1096005;Unigene3128241;Unigene424910;Unigene2919977;Unigene3095034;Unigene2088920;Unigene2941628;Unigene918055;Unigene28869;Unigene103815;Unigene2114584;Unigene1859574;Unigene820788;Unigene844105;Unigene3627710;Unigene3024214;Unigene4757114;Unigene1170715;Unigene3546555;Unigene206537;Unigene3607501;Unigene4440215;Unigene4339078;Unigene4753594;Unigene3328575;Unigene64371;Unigene1040502;Unigene3024784;Unigene2108827;Unigene4508782;Unigene3376344;Unigene17262;Unigene43688;Unigene1407532;Unigene3075855;Unigene605246;Unigene4167177;Unigene3177563;Unigene1185303;Unigene4356617;Unigene1640118;Unigene4583837;Unigene3739161;Unigene693562;Unigene1865518;Unigene2000996;Unigene4281161;Unigene1022087;Unigene345851;Unigene1319312;Unigene43964;Unigene1077501;Unigene2733004;Unigene3520293;Unigene1833160;Unigene818192;Unigene3602451;Unigene3204235;Unigene527734;Unigene2908343;Unigene4084609;Unigene1299267;Unigene420865;Unigene3771942;Unigene1719151;Unigene2997691;Unigene2644273;U
nigene3912460;Unigene2077689;Unigene4867915;Unigene1715527;Unigene2891130;Unigene165010;Unigene2244776;Unigene916966;Unigene365348;Unigene2169862;Unigene3450392;Unigene2374365;Unigene1688345;Unigene4187345;Unigene1074436;Unigene4325804;Unigene4386949;Unigene224901;Unigene2681722;Unigene2101431;Unigene1969778;Unigene1422248;Unigene1141211;Unigene2567;Unigene917747;Unigene2719602;Unigene3484943;Unigene3542843;Unigene2393396;Unigene4696534;Unigene3873544;Unigene1644328;Unigene1000235;Unigene4587059;Unigene144260;Unigene462427;Unigene4905580;Unigene2575532;Unigene4735510;Unigene951591;Unigene3158591;Unigene1417771;Unigene414140;Unigene718315;Unigene1681641;Unigene3857042;Unigene1438598;Unigene4203831;Unigene4078539;Unigene425913;Unigene1465848;Unigene2268451;Unigene2397087;Unigene611685;Unigene1801693;Unigene219604;Unigene4093384;Unigene98047;Unigene1716438;Unigene4581348;Unigene3865184;Unigene2357716;Unigene3938577;Unigene1882215;Unigene2126165;Unigene4298537;Unigene856284;Unigene36001;Unigene923552;Unigene3076166;Unigene2024005;Unigene1495644;Unigene1947136;Unigene1287387;Unigene3019898;Unigene1755033;Unigene614395;Unigene4713268;Unigene2820117;Unigene3636644;Unigene4524447;Unigene4361858;Unigene4696217;Unigene451891;Unigene4298660;Unigene4860877;Unigene651140;Unigene1255863;Unigene2712810;Unigene714021;Unigene3845994;Unigene3131592;Unigene2518195;Unigene2762224;Unigene1022367;Unigene1751303;Unigene3349790;Unigene162163;Unigene3772048;Unigene1251860;Unigene504918;Unigene401732;Unigene3844653;Unigene3698682;Unigene3823752;Unigene2133776;Unigene1504541;Unigene887552;Unigene4676684;Unigene3493152;Unigene2639631;Unigene804994;Unigene1094811;Unigene424265;Unigene2236702;Unigene750014;Unigene1032916;Unigene2681268;Unigene4320963;Unigene1610154;Unigene162102;Unigene251582;Unigene941689;Unigene1389636;Unigene1311935;Unigene2011318;Unigene3331311;Unigene362320;Unigene4368588;Unigene4861482;Unigene3131676;Unigene142038;Unigene1795414;Unigene493342;Unigene1861681;Unigene3312625;
Unigene2198750;Unigene3273130;Unigene1978770;Unigene924277;Unigene2746313;Unigene815311;Unigene830616;Unigene1383440;Unigene4481957;Unigene2893464;Unigene3938085;Unigene1269755;Unigene1640174;Unigene1348473;Unigene1447911;Unigene3340297;Unigene1641397;Unigene2617896;Unigene4689720;Unigene1684416;Unigene3885101;Unigene4823704;Unigene3375387;Unigene4328461;Unigene4169828;Unigene67021;Unigene4256362;Unigene3421763;Unigene223354;Unigene1830226;Unigene3533193;Unigene3301845;Unigene1211877;Unigene2548734;Unigene3303257;Unigene367537;Unigene1597642;Unigene950846;Unigene1676102;Unigene1513841;Unigene3708624;Unigene4378966;Unigene1059123;Unigene4356255;Unigene1586407;Unigene2359567;Unigene1951572;Unigene3044161;Unigene2250146;Unigene95412;Unigene3658059;Unigene1228748;Unigene4379429;Unigene1645738;Unigene1885497;Unigene1769428;Unigene714415;Unigene4480579;Unigene1468504;Unigene2830394;Unigene622191;Unigene1917940;Unigene888704;Unigene1438882;Unigene2014981;Unigene36176;Unigene236559;Unigene3671203;Unigene3950955;Unigene3664290;Unigene2059338;Unigene3470916;Unigene2549547;Unigene2330343;Unigene2855193;Unigene388881;Unigene854069;Unigene2307910;Unigene84699;Unigene4429499;Unigene2029958;Unigene579229;Unigene2684387;Unigene2178587;Unigene3643856;Unigene1668409;Unigene1337681;Unigene1143183;Unigene1766290;Unigene44986;Unigene4898273;Unigene1924218;Unigene4615570;Unigene2767628;Unigene1391220;Unigene1902462;Unigene4033752;Unigene3892259;Unigene2390741;Unigene406758;Unigene4366642;Unigene2848650;Unigene2552545;Unigene649892;Unigene4020926;Unigene3121405;Unigene2440875;Unigene3545278;Unigene1460915;Unigene220098;Unigene4627472;Unigene4623163;Unigene9928;Unigene4720271;Unigene468346;Unigene351273;Unigene4381241;Unigene3042600;Unigene1281628;Unigene257619;Unigene4655016;Unigene1249286;Unigene1400459;Unigene3472994;Unigene4861967;Unigene2470552;Unigene2336487;Unigene1743558;Unigene3485227;Unigene1162792;Unigene3713726;Unigene4184712;Unigene3807377;Unigene4250812;Unigene1032209;Unigene
2077515;Unigene1004853;Unigene409637;Unigene1291803;Unigene1814817;Unigene1417356;Unigene2577855;Unigene969632;Unigene2470553;Unigene3132922;Unigene78553;Unigene4648040;Unigene1611681;Unigene1803224;Unigene4521959;Unigene1499675;Unigene3281435;Unigene2364774;Unigene1927204;Unigene3271890;Unigene1821620;Unigene211492;Unigene557363;Unigene625100;Unigene492167;Unigene1216422;Unigene4609894;Unigene1636045;Unigene267294;Unigene893019;Unigene1398429;Unigene2287559;Unigene1036964;Unigene824976;Unigene1138136;Unigene1732625;Unigene3157136;Unigene839119;Unigene482126;Unigene1686774;Unigene1122936;Unigene1107978;Unigene1952314;Unigene2411400;Unigene4793831;Unigene493691;Unigene2115978;Unigene3430496;Unigene2032201;Unigene4851219;Unigene1567800;Unigene2864782;Unigene2946008;Unigene357362;Unigene3414649;Unigene3480995;Unigene4824505;Unigene4306201;Unigene4536023;Unigene4347616;Unigene4886766;Unigene443213;Unigene1076343;Unigene1132352;Unigene2733460;Unigene997751;Unigene1161707;Unigene430867;Unigene4701663;Unigene783745;Unigene3253741;Unigene133197;Unigene904536;Unigene3259877;Unigene242529;Unigene3922870;Unigene2392676;Unigene612457;Unigene1291802;Unigene2756384;Unigene1640623;Unigene2734935;Unigene1661982;Unigene626345;Unigene4745282;Unigene434563;Unigene4843078;Unigene3467891;Unigene4523805;Unigene1473355;Unigene3751955;Unigene159742;Unigene221011;Unigene3777474;Unigene2299460;Unigene2910618;Unigene2771419;Unigene742441;Unigene3423909;Unigene1624762;Unigene2670060;Unigene2181951;Unigene2053668;Unigene3883520;Unigene4533354;Unigene1552206;Unigene823040;Unigene3527005;Unigene3959054;Unigene1861876;Unigene792235;Unigene2667938;Unigene428944;Unigene954283;Unigene991585;Unigene2893556;Unigene532228;Unigene2634557;Unigene1200177;Unigene1481250;Unigene2610286;Unigene4144897;Unigene894786;Unigene1395655;Unigene72699;Unigene4580359;Unigene3640040;Unigene3462357;Unigene4134628;Unigene2572142;Unigene1774260;Unigene3415882;Unigene1303318;Unigene177756;Unigene2525807;Unigene1680243;Unige
ne2753570;Unigene1097573;Unigene432307;Unigene2183146;Unigene4484146;Unigene3524616;Unigene1133843;Unigene2505068;Unigene1995802;Unigene472007;Unigene2561519;Unigene814363;Unigene1220869;Unigene85279;Unigene694436;Unigene671037;Unigene4057058;Unigene2289363;Unigene619357;Unigene1602887;Unigene114478;Unigene2398323;Unigene1734389;Unigene2454448;Unigene1103047;Unigene243700;Unigene958863;Unigene1135706;Unigene4433188;Unigene1205610;Unigene4183763;Unigene4743316;Unigene3208088;Unigene1474856;Unigene1078041;Unigene252882;Unigene764206;Unigene3922528;Unigene2301184;Unigene980074;Unigene1000715;Unigene4137455;Unigene3998619;Unigene3150753;Unigene602755;Unigene2424176;Unigene173692;Unigene1516304;Unigene450647;Unigene3185093;Unigene3848465;Unigene505368;Unigene3879214;Unigene1625690;Unigene3461210;Unigene3615446;Unigene2996198;Unigene3978482;Unigene3231464;Unigene521236;Unigene4284940;Unigene4384366;Unigene3389736;Unigene1676089;Unigene2756655;Unigene659105;Unigene788130;Unigene1953827;Unigene3589907;Unigene4468484;Unigene103929;Unigene510551;Unigene3171832;Unigene1120605;Unigene932410;Unigene353062;Unigene742847;Unigene154442;Unigene4218582;Unigene2488976;Unigene81333;Unigene1081146;Unigene3669187;Unigene4818030;Unigene4844723;Unigene4837529;Unigene276586;Unigene299880;Unigene2898531;Unigene2652495;Unigene2752868;Unigene4506942;Unigene2360847;Unigene4192350;Unigene289661;Unigene2186638;Unigene2779521;Unigene3130928;Unigene28790;Unigene2826241;Unigene3505727;Unigene4700448;Unigene26062;Unigene4102988;Unigene4696535;Unigene315006;Unigene4263476;Unigene1454960;Unigene4490095;Unigene4158115;Unigene506021;Unigene376399;Unigene3848532;Unigene674661;Unigene2653475;Unigene394425;Unigene4011097;Unigene959319;Unigene1366988;Unigene4757884;Unigene3882932;Unigene117447;Unigene1111635;Unigene2009138;Unigene4348144;Unigene3803817;Unigene4342015;Unigene380430;Unigene1485837;Unigene2523260;Unigene1196218;Unigene2944468;Unigene4332572;Unigene4121824;Unigene3738883;Unigene4417577;Unigene31
43542;Unigene3997978;Unigene3795173;Unigene1506349;Unigene3623877;Unigene2698506;Unigene4643203;Unigene485323;Unigene3021038;Unigene796182;Unigene3500573;Unigene3900123;Unigene3233215;Unigene2908644;Unigene3006548;Unigene1531932;Unigene1644104;Unigene913197;Unigene4467786;Unigene4904683;Unigene858561;Unigene506665;Unigene2263568;Unigene4469343;Unigene993397;Unigene4758314;Unigene1943360;Unigene4347971;Unigene1297572;Unigene2684582;Unigene1706547;Unigene3008649;Unigene4403116;Unigene3272279;Unigene3900863;Unigene4590123;Unigene4537333;Unigene1794759;Unigene2069532;Unigene3794865;Unigene470486;Unigene2241363;Unigene1762211;Unigene3656808;Unigene237499;Unigene2533705;Unigene1964209;Unigene4635504;Unigene1838122;Unigene933479;Unigene4026824;Unigene1921717;Unigene4270788;Unigene3562912;Unigene3834456;Unigene4083881;Unigene1420788;Unigene2889103;Unigene3453685;Unigene4812904;Unigene2689995;Unigene2798430;Unigene855685;Unigene4410995;Unigene2306640;Unigene3202544;Unigene2949331;Unigene3679969;Unigene1196640;Unigene1951790;Unigene1369892;Unigene985776;Unigene1495374;Unigene1071446;Unigene3163993;Unigene2560605;Unigene1828430;Unigene3591832;Unigene4453492;Unigene4487105;Unigene1265163;Unigene1944269;Unigene1601226;Unigene118838;Unigene2870917;Unigene4306948;Unigene1872986;Unigene4643033;Unigene3138068;Unigene3391380;Unigene4416298;Unigene3831819;Unigene2077161;Unigene2867292;Unigene354329;Unigene1912371;Unigene1562784;Unigene957501;Unigene1406289;Unigene236532;Unigene1665446;Unigene3023616;Unigene4832474;Unigene3373534;Unigene4917815;Unigene2606464;Unigene1735465;Unigene755514;Unigene2730035;Unigene1997630;Unigene4164226;Unigene2159425;Unigene3142241;Unigene2204989;Unigene3762122;Unigene1933787;Unigene1760563;Unigene1241122;Unigene2756841;Unigene1932472;Unigene4370423;Unigene3956931;Unigene4193375;Unigene2558995;Unigene1295699;Unigene4823938;Unigene1988259;Unigene394704;Unigene245346;Unigene1397229;Unigene3290473;Unigene393341;Unigene3651108;Unigene4479950;Unigene3238796;Uni
gene3375853;Unigene4742540;Unigene4363995;Unigene3887256;Unigene2320063;Unigene1642051;Unigene3021783;Unigene1304130;Unigene1816097;Unigene3055609;Unigene965487;Unigene2124585;Unigene4781326;Unigene460704;Unigene1319446;Unigene1739774;Unigene3414743;Unigene4461156;Unigene3973590;Unigene4509001;Unigene1753881;Unigene1541041;Unigene1804432;Unigene3789661;Unigene2205130;Unigene3027129;Unigene2561644;Unigene326210;Unigene4159681;Unigene1325556;Unigene3426177;Unigene2799194;Unigene2605188;Unigene4706387;Unigene3126271;Unigene1240176;Unigene498721;Unigene1000037;Unigene1532983;Unigene2187991;Unigene3670099;Unigene585692;Unigene4019152;Unigene1792533;Unigene3139498;Unigene1196991;Unigene3697023;Unigene1243412;Unigene3611183;Unigene1324361;Unigene4468809;Unigene4943921;Unigene4806898;Unigene52813;Unigene2583282;Unigene789163;Unigene311705;Unigene373869;Unigene3318063;Unigene648605;Unigene2269582;Unigene376077;Unigene1799285;Unigene180810;Unigene1355424;Unigene865382;Unigene3869515;Unigene510043;Unigene4758833;Unigene4077249;Unigene1238547;Unigene3836083;Unigene3037590;Unigene1761782;Unigene246165;Unigene3268861;Unigene4775018;Unigene1273786;Unigene2818571;Unigene1587589;Unigene27750;Unigene2759046;Unigene18588;Unigene3813484;Unigene327065;Unigene1572419;Unigene784969;Unigene3913258;Unigene1575444;Unigene3792336;Unigene4270373;Unigene463139;Unigene1260077;Unigene4530146;Unigene4112165;Unigene1546747;Unigene86150;Unigene4415453;Unigene2949035;Unigene4893967;Unigene637492;Unigene659586;Unigene888790;Unigene3829296;Unigene3522895;Unigene4127492;Unigene579169;Unigene806259;Unigene1668878;Unigene4934563;Unigene3004474;Unigene4810389;Unigene2201795;Unigene3336406;Unigene1484402;Unigene4751701;Unigene1539622;Unigene1919262;Unigene4415374;Unigene2931575;Unigene1734356;Unigene2486371;Unigene2810931;Unigene107613;Unigene387076;Unigene412694;Unigene844318;Unigene655727;Unigene2996053;Unigene3821079;Unigene2833185;Unigene3369962;Unigene2604020;Unigene3916848;Unigene3367702;Unigene827457
;Unigene769598;Unigene948938;Unigene3003013;Unigene4533077;Unigene2220181;Unigene2095248;Unigene2076597;Unigene4135531;Unigene1662533;Unigene3568909;Unigene1553604;Unigene3267167;Unigene3976050;Unigene2912920;Unigene3650663;Unigene829191;Unigene3836272;Unigene1863966;Unigene688359;Unigene2905456;Unigene429963;Unigene1331114;Unigene1824218;Unigene1525803;Unigene2004769;Unigene2187662;Unigene884394;Unigene899820;Unigene4681465;Unigene2928774;Unigene2749112;Unigene2558932;Unigene2024628;Unigene3063997;Unigene1508534;Unigene2508769;Unigene4790984;Unigene790443;Unigene3021263;Unigene4440397;Unigene414112;Unigene737669;Unigene1625446;Unigene1125300;Unigene3501089;Unigene4657432;Unigene1876724;Unigene2944464;Unigene3050651;Unigene330711;Unigene204447;Unigene4130576;Unigene4012364;Unigene3212835;Unigene1132148;Unigene2261892;Unigene2956252;Unigene3151777;Unigene3898447;Unigene793388;Unigene120353;Unigene584508;Unigene1912322;Unigene2443422;Unigene1516135;Unigene2425724;Unigene1848939;Unigene1941900;Unigene167363;Unigene4354521;Unigene2080089;Unigene4909959;Unigene4173838;Unigene2547228;Unigene2594597;Unigene4128688;Unigene2287192;Unigene2364111;Unigene126415;Unigene485821;Unigene3823109;Unigene1633994;Unigene2760748;Unigene422493;Unigene504018;Unigene2150907;Unigene3833570;Unigene2940897;Unigene2212613;Unigene1618498;Unigene2864619;Unigene2081004;Unigene1796300;Unigene3879515;Unigene4586273;Unigene4405848;Unigene205129;Unigene447409;Unigene1981395;Unigene221522;Unigene3968386;Unigene2092197;Unigene1966356;Unigene1374533;Unigene3174798;Unigene2848100;Unigene428513;Unigene973633;Unigene4004811;Unigene1408670;Unigene2389797;Unigene788157;Unigene1793317;Unigene3689250;Unigene583071;Unigene250181;Unigene1404700;Unigene72121;Unigene2394133;Unigene336114;Unigene3367151;Unigene3150305;Unigene1413638;Unigene287166;Unigene2388991;Unigene997142;Unigene3535736;Unigene4013531;Unigene3800374;Unigene1494181;Unigene780787;Unigene1271708;Unigene4031264;Unigene2981866;Unigene910829;Unigene58
1746;Unigene2920935;Unigene2657944;Unigene254563;Unigene3407910;Unigene2528951;Unigene623282;Unigene4110642;Unigene4876763;Unigene2871106;Unigene1596799;Unigene4911220;Unigene1789444;Unigene4090377;Unigene3940573;Unigene839049;Unigene1494115;Unigene1861879;Unigene314108;Unigene1671599;Unigene345355;Unigene837317;Unigene2062465;Unigene2801521;Unigene4363218;Unigene3351866;Unigene2960553;Unigene253467;Unigene563904;Unigene2057916;Unigene3722730;Unigene1228356;Unigene669631;Unigene1136327;Unigene3018346;Unigene3382991;Unigene363806;Unigene3415331;Unigene454802;Unigene51950;Unigene1949115;Unigene2081633;Unigene1804221;Unigene3891922;Unigene429830;Unigene111165;Unigene1773932;Unigene711659;Unigene415847;Unigene3138044;Unigene69658;Unigene2023211;Unigene3835438;Unigene2975542;Unigene2067403;Unigene566867;Unigene1506273;Unigene4686153;Unigene247539;Unigene4242775;Unigene1285650;Unigene4432875;Unigene2184960;Unigene1178268;Unigene2174265;Unigene2845083;Unigene2819996;Unigene2473041;Unigene3716331;Unigene4118631;Unigene3664697;Unigene1034570;Unigene4348319;Unigene4312214;Unigene1383444;Unigene4190220;Unigene2845148;Unigene671929;Unigene4028516;Unigene1538378;Unigene3859139;Unigene555704;Unigene3678244;Unigene91882;Unigene1207560;Unigene1226936;Unigene1903992;Unigene1657715;Unigene4166393;Unigene3227396;Unigene3477931;Unigene619544;Unigene2178090;Unigene4312242;Unigene1129612;Unigene3721642;Unigene2544650;Unigene283282;Unigene915421;Unigene2007586;Unigene202684;Unigene3341580;Unigene4789770;Unigene3131865;Unigene1297666;Unigene2351173;Unigene853527;Unigene1366460;Unigene2069227;Unigene222196;Unigene4395525;Unigene4311697;Unigene62968;Unigene4231180;Unigene2981470;Unigene2170890;Unigene883917;Unigene887841;Unigene1463700;Unigene1791573;Unigene2099664;Unigene3701193;Unigene4636747;Unigene2142867;Unigene2013625;Unigene2874394;Unigene3364564;Unigene4738867;Unigene6885;Unigene3110948;Unigene3870923;Unigene1042836;Unigene3688351;Unigene3523975;Unigene3658858;Unigene1445454;Unigene2
79069;Unigene2339472;Unigene4429631;Unigene4293687;Unigene1903378;Unigene1228930;Unigene4261481;Unigene2901605;Unigene2298298;Unigene4785059;Unigene4198945;Unigene1869423;Unigene4112893;Unigene2609055;Unigene3793358;Unigene3628561;Unigene2203859;Unigene1938000;Unigene3360716;Unigene1842588;Unigene3457925;Unigene1849653;Unigene860726;Unigene876431;Unigene2540178;Unigene570460;Unigene3836337;Unigene3807730;Unigene3642119;Unigene1529760;Unigene3522997;Unigene4395311;Unigene198023;Unigene47001;Unigene342940;Unigene4944210;Unigene3319182;Unigene1643296;Unigene3143463;Unigene2738315;Unigene1906926;Unigene1439313;Unigene708919;Unigene2454376;Unigene4864156;Unigene2523372;Unigene4318508;Unigene2710439;Unigene242280;Unigene4315984;Unigene2346839;Unigene1464226;Unigene474414;Unigene4112712;Unigene1616068;Unigene556107;Unigene4208557;Unigene2866424;Unigene4058637;Unigene2897332;Unigene4013267;Unigene4951424;Unigene1003852;Unigene3506056;Unigene854188;Unigene3094156;Unigene950694;Unigene3270791;Unigene1845879;Unigene1573923;Unigene4416;Unigene3275922;Unigene281042;Unigene504626;Unigene2547916;Unigene4474511;Unigene29874;Unigene3085250;Unigene201211;Unigene4005307;Unigene4102962;Unigene384078;Unigene3517621;Unigene4928205;Unigene2793154;Unigene497624;Unigene1829840;Unigene577032;Unigene727327;Unigene3840135;Unigene2703506;Unigene1427567;Unigene958330;Unigene553988;Unigene4345302;Unigene512589;Unigene4830231;Unigene277008;Unigene2101138;Unigene3121571;Unigene4514466;Unigene4307560;Unigene865751;Unigene3985416;Unigene49029;Unigene426667;Unigene4354941;Unigene3005265;Unigene338312;Unigene4304153;Unigene23176;Unigene3914984;Unigene2735015;Unigene4948848;Unigene2111492;Unigene2802200;Unigene2170222;Unigene393943;Unigene558060;Unigene2070108;Unigene3450105;Unigene1454807;Unigene2042078;Unigene3742609;Unigene2643253;Unigene878239;Unigene4285699;Unigene267635;Unigene838685;Unigene401530;Unigene657391
AAAuxiliary ActivitiesAA10AA10 (formerly CBM33) proteins are copper-dependent lytic polysaccharide monooxygenases (LPMOs); some proteins have been shown to act on chitin, others on cellulose; lytic cellulose monooxygenase (C1-hydroxylating) (EC 1.14.99.54); lytic cellulose monooxygenase (C4-dehydrogenating)(EC 1.14.99.56); lytic chitin monooxygenase (EC 1.14.99.53)80Unigene1336444;Unigene2779114;Unigene615694;Unigene4789776;Unigene3971533;Unigene1428158;Unigene3127938;Unigene1619422;Unigene210738;Unigene4365491;Unigene114648;Unigene2538497;Unigene4202712;Unigene2183136;Unigene3789908;Unigene487979;Unigene547741;Unigene219971;Unigene1061865;Unigene4756387;Unigene470419;Unigene3702929;Unigene2253288;Unigene2859609;Unigene1244518;Unigene3453626;Unigene3521786;Unigene729878;Unigene3225703;Unigene4356596;Unigene4830395;Unigene4145028;Unigene2259250;Unigene973514;Unigene4710403;Unigene1431077;Unigene719247;Unigene283936;Unigene1412707;Unigene1531288;Unigene3996556;Unigene3218378;Unigene4083558;Unigene1342273;Unigene112211;Unigene1319088;Unigene3670255;Unigene1645165;Unigene3831428;Unigene3174080;Unigene3812608;Unigene2025798;Unigene1924441;Unigene3534355;Unigene4546245;Unigene2659199;Unigene3359250;Unigene2986791;Unigene276214;Unigene1301416;Unigene4830396;Unigene1882924;Unigene991308;Unigene4106892;Unigene56176;Unigene3446134;Unigene4555996;Unigene1556710;Unigene2785304;Unigene3759124;Unigene2986728;Unigene817611;Unigene1140905;Unigene4305080;Unigene3037987;Unigene714729;Unigene352088;Unigene3527243;Unigene3326466;Unigene411142
AAAuxiliary ActivitiesAA3cellobiose dehydrogenase (EC 1.1.99.18); glucose 1-oxidase (EC 1.1.3.4); aryl alcohol oxidase (EC 1.1.3.7); alcohol oxidase (EC 1.1.3.13); pyranose oxidase (EC 1.1.3.10)14Unigene3887080;Unigene4031013;Unigene614775;Unigene3367347;Unigene1545151;Unigene16187;Unigene113985;Unigene3295999;Unigene3150810;Unigene3364816;Unigene2140525;Unigene253144;Unigene4770530;Unigene3693298
AAAuxiliary ActivitiesAA4vanillyl-alcohol oxidase (EC 1.1.3.38)132Unigene1657095;Unigene1984421;Unigene1322280;Unigene657828;Unigene461217;Unigene2732914;Unigene2195235;Unigene1556358;Unigene79372;Unigene2410796;Unigene4278904;Unigene1481962;Unigene175410;Unigene4879639;Unigene110715;Unigene190021;Unigene2110170;Unigene726874;Unigene2020114;Unigene4052204;Unigene1000261;Unigene1240705;Unigene2732911;Unigene2319642;Unigene1526785;Unigene2000594;Unigene379656;Unigene857619;Unigene957920;Unigene4618211;Unigene499107;Unigene717069;Unigene225700;Unigene133697;Unigene1603421;Unigene1945118;Unigene399925;Unigene3694344;Unigene556231;Unigene1230816;Unigene1406920;Unigene450036;Unigene1406919;Unigene1913662;Unigene3874950;Unigene4385032;Unigene1756537;Unigene1084029;Unigene4438520;Unigene2206424;Unigene4625490;Unigene1943455;Unigene1934765;Unigene3004728;Unigene1871934;Unigene1793535;Unigene304691;Unigene145441;Unigene529818;Unigene1077196;Unigene3591195;Unigene4168305;Unigene555198;Unigene2023782;Unigene910630;Unigene3496129;Unigene2576373;Unigene152662;Unigene4627776;Unigene147356;Unigene430294;Unigene4879058;Unigene40465;Unigene4783104;Unigene1018801;Unigene3308616;Unigene918060;Unigene82972;Unigene2597394;Unigene4785801;Unigene712417;Unigene880324;Unigene2229020;Unigene163512;Unigene2166383;Unigene3143947;Unigene2536265;Unigene2322099;Unigene2714573;Unigene4346401;Unigene850791;Unigene3308618;Unigene4440799;Unigene4470421;Unigene258530;Unigene4219798;Unigene1735067;Unigene122884;Unigene2400788;Unigene4889315;Unigene2732913;Unigene2173423;Unigene2884478;Unigene866729;Unigene1140060;Unigene2624535;Unigene3496130;Unigene3963384;Unigene1210375;Unigene699691;Unigene1350254;Unigene190019;Unigene1894368;Unigene853204;Unigene1589628;Unigene1140439;Unigene632994;Unigene1459196;Unigene898298;Unigene331196;Unigene811611;Unigene4271816;Unigene943547;Unigene2273879;Unigene4107084;Unigene2195243;Unigene4440797;Unigene1630271;Unigene518231;Unigene290530;Unigene177274;Unigene2365540
AAAuxiliary ActivitiesAA5Oxidase with oxygen as acceptor (EC 1.1.3.-); galactose oxidase (EC 1.1.3.9); glyoxal oxidase (EC 1.2.3.15); alcohol oxidase (EC 1.1.3.13)118Unigene1319503;Unigene908048;Unigene144854;Unigene4452840;Unigene455511;Unigene1305525;Unigene2554893;Unigene1155881;Unigene722667;Unigene1657919;Unigene1377059;Unigene31302;Unigene2883062;Unigene4594998;Unigene2242292;Unigene164000;Unigene279800;Unigene3621983;Unigene4060887;Unigene4181118;Unigene1935169;Unigene4092916;Unigene638549;Unigene4924052;Unigene2583032;Unigene2728443;Unigene1833627;Unigene4384257;Unigene1468652;Unigene2568943;Unigene628989;Unigene3433055;Unigene3869597;Unigene1905175;Unigene1442848;Unigene836127;Unigene2897736;Unigene668966;Unigene51348;Unigene2804513;Unigene41203;Unigene67006;Unigene2309433;Unigene1772253;Unigene1695644;Unigene132987;Unigene1225370;Unigene3178383;Unigene1975840;Unigene809052;Unigene2476453;Unigene657085;Unigene576994;Unigene768466;Unigene1120178;Unigene2740603;Unigene1124894;Unigene2232523;Unigene799301;Unigene1581420;Unigene4533775;Unigene822461;Unigene3101130;Unigene1665450;Unigene2627624;Unigene1724450;Unigene263074;Unigene918901;Unigene4680561;Unigene8314;Unigene1129241;Unigene530183;Unigene339947;Unigene2848975;Unigene524103;Unigene4502841;Unigene3025468;Unigene256250;Unigene2404723;Unigene1121810;Unigene946136;Unigene1500512;Unigene1503664;Unigene4746890;Unigene1632698;Unigene1124343;Unigene2416854;Unigene540238;Unigene2756887;Unigene772242;Unigene3725579;Unigene954009;Unigene652287;Unigene2069311;Unigene3280879;Unigene616941;Unigene329471;Unigene1167525;Unigene1115709;Unigene290940;Unigene1621559;Unigene327457;Unigene1241855;Unigene92789;Unigene1685388;Unigene1292986;Unigene2914626;Unigene2465691;Unigene1467423;Unigene3495275;Unigene1110342;Unigene547451;Unigene1486382;Unigene2612236;Unigene1759878;Unigene2235522;Unigene189029;Unigene2253928
AAAuxiliary ActivitiesAA61,4-benzoquinone reductase (EC. 1.6.5.6)4Unigene3560931;Unigene304629;Unigene1271565;Unigene615123
AAAuxiliary ActivitiesAA7glucooligosaccharide oxidase (EC 1.1.3.-); chitooligosaccharide oxidase (EC 1.1.3.-)11Unigene2001300;Unigene3297112;Unigene12290;Unigene742898;Unigene2695095;Unigene2886105;Unigene110715;Unigene251846;Unigene3416721;Unigene285326;Unigene843749
AAAuxiliary ActivitiesAA8Iron reductase domain1Unigene253144
AAAuxiliary ActivitiesAA9AA9 (formerly GH61) proteins are copper-dependent lytic polysaccharide monooxygenases (LPMOs); cleavage of cellulose chains with oxidation of carbons C1 and/or C4 and C-6); lytic cellulose monooxygenase (C1-hydroxylating) (EC 1.14.99.54); lytic cellulose monooxygenase (C4-dehydrogenating) (EC 1.14.99.56)3Unigene74575;Unigene3577271;Unigene1041337

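A bar chart shows the number of genes in each functional category, making it easy to compare how gene counts are distributed across categories in the microbial community.

Fig 6-3-1 CAZy annotation statistics

We used Circos to plot the abundance distribution of each functional category, which helps reveal functional distribution patterns across samples.

Fig 6-3-2 Circos plot of CAZy functional distribution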

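6.4 CARD annotation

Antibiotic resistance and virulence genes are among the most practically valuable targets in microbial research. The usual approach is to annotate genes against dedicated resistance and virulence gene databases.
CARD (Comprehensive Antibiotic Resistance Database) is organized around Antibiotic Resistance Ontology (ARO) terms, which link antibiotics and their targets, resistance mechanisms, gene variants, and related information. Annotation against CARD identifies the names of resistance-related genes and the antibiotic classes they confer resistance to.
We used DIAMOND to align the predicted gene sequences against the database for annotation.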

Tab 6-4-1 CARD annotation table
GeneID | ARO_Name | ARO_Accession | AMR_Gene_Family | Drug_Class | Resistance_Mechanism | F-3 | F-4 | F-5 | F-6 | G-2 | G-4 | G-5 | G-6
Unigene60 | YojI | ARO:3003952 | ATP-binding cassette (ABC) antibiotic efflux pump | peptide antibiotic | antibiotic efflux | 3.556833891525611e-07 | 3.192848354869383e-07 | 3.85283918966469e-07 | 0 | 0 | 0 | 1.222359715024187e-07 | 0
Unigene148 | TaeA | ARO:3003986 | ATP-binding cassette (ABC) antibiotic efflux pump | pleuromutilin antibiotic | antibiotic efflux | 0 | 5.455598110104968e-07 | 0 | 0 | 0 | 0 | 0 | 0
Unigene219 | tetA(58) | ARO:3003980 | major facilitator superfamily (MFS) antibiotic efflux pump | tetracycline antibiotic | antibiotic efflux | 0 | 1.7589275704065382e-06 | 0 | 0 | 0 | 0 | 0 | 0
Unigene413 | vanSF | ARO:3002936 | glycopeptide resistance gene cluster;vanS | glycopeptide antibiotic | antibiotic target alteration | 7.479896509593455e-06 | 0 | 0 | 0 | 0 | 0 | 0 | 0
Unigene414 | vanRF | ARO:3002925 | glycopeptide resistance gene cluster;vanR | glycopeptide antibiotic | antibiotic target alteration | 6.132217069531436e-06 | 0 | 0 | 0 | 0 | 0 | 0 | 0
Unigene433 | Streptomyces rishiriensis parY mutant conferring resistance to aminocoumarin | ARO:3003318 | aminocoumarin resistant parY | aminocoumarin antibiotic | antibiotic target alteration | 1.7962707146585014e-07 | 8.522954737313404e-07 | 0 | 0 | 0 | 0 | 0 | 3.46766079062987e-06
Unigene483 | vanG | ARO:3002909 | glycopeptide resistance gene cluster;van ligase | glycopeptide antibiotic | antibiotic target alteration | 1.7835037269058013e-06 | 0 | 0 | 0 | 4.1285748068600216e-07 | 0 | 0 | 0
Unigene712 | vanSE | ARO:3002935 | glycopeptide resistance gene cluster;vanS | glycopeptide antibiotic | antibiotic target alteration | 6.488473254415475e-06 | 5.296696246067359e-06 | 6.960216761679971e-06 | 3.0931989253593805e-06 | 0 | 5.255303825025871e-07 | 3.247371673999093e-07 | 0
Unigene792 | bcrA | ARO:3002987 | ATP-binding cassette (ABC) antibiotic efflux pump | peptide antibiotic | antibiotic efflux | 1.7200524161226133e-06 | 0 | 0 | 0 | 0 | 0 | 0 | 0
Unigene793 | bcrA | ARO:3002987 | ATP-binding cassette (ABC) antibiotic efflux pump | peptide antibiotic | antibiotic efflux | 2.992257083346942e-06 | 0 | 0 | 0 | 0 | 0 | 0 | 0

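We used Circos to show the distribution of the 10 most abundant ARO terms, allowing a first comparison of resistance gene functional profiles between samples.

Fig 6-4-1 Circos plot of CARD annotation

6.5 VFDB annotation

VFDB (Virulence Factors of Pathogenic Bacteria) is a virulence factor database developed by the Chinese Academy of Medical Sciences. It curates the composition, structure, function, pathogenic mechanism, pathogenicity islands, sequences, and genomic information of 1080 known virulence factors from 951 strains in 74 genera of medically important pathogens. The database provides key evidence for studying how pathogens invade their hosts, supports analysis of the virulence factor make-up of different pathogens in a community, and is widely used to identify virulence factor genes.
We used DIAMOND to align genes against the database for annotation.

Tab 6-5-1 VFDB annotation table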
GeneID | Subject | Description | Organism | VFs | VF_FullName | Function | VFID | F-3 | F-4 | F-5 | F-6 | G-2 | G-4 | G-5 | G-6
Unigene1 | VFG022639(gi:333989135) | (proC) pyrroline-5-carboxylate reductase ProC [Proline synthesis (CVF307)] [Mycobacterium sp. JDM601] | - | - | - | - | - | 9.038851272516904e-07 | 0 | 0 | 0 | 0 | 0 | 0 | 0
Unigene16 | VFG031407(gi:433630073) | (ctpV) Putative metal cation transporter P-type ATPase CtpV [Copper exporter (CVF658)] [Mycobacterium canettii CIPT 140070010] | - | - | - | - | - | 0 | 0 | 1.8493628110390513e-07 | 2.4037713864265717e-07 | 0 | 2.967263527620739e-06 | 0 | 0
Unigene60 | VFG044172(gb|NP_756180) | (chuV) ATP-binding hydrophilic protein ChuV [Chu (VF0227)] [Escherichia coli CFT073] | Escherichia coli (UPEC) | Chu | E. coli hemin uptake | Iron uptake: the ability to use heme and/or hemoglobin might be especially advantageous to pathogenic bacteria. These pathogens often secrete cytotoxins, which gain access to the intracellular heme reservoir besides initiating tissue invasion. Cytotoxin production coupled with the capability to utilize heme and/or hemoglobin could serve as an effective iron acquisition strategy during the progression of infection | VF0227 | 3.556833891525611e-07 | 3.192848354869383e-07 | 3.85283918966469e-07 | 0 | 0 | 0 | 1.222359715024187e-07 | 0
Unigene144 | VFG031084(gi:433644988) | (zmp1) endothelin-converting enzyme [Zn++ metallophrotease (CVF655)] [Mycobacterium smegmatis JS623] | - | - | - | - | - | 2.4816795455809075e-05 | 2.1444175455707123e-05 | 1.8403891096352753e-05 | 1.0683850988088986e-05 | 3.628834343687883e-05 | 6.83472219294583e-05 | 9.362398618977449e-05 | 2.465023983293573e-06
Unigene148 | VFG007913(gi:145223975) | (ddrA) daunorubicin resistance ABC transporter ATPase subunit [PDIM (phthiocerol dimycocerosate) and PGL (phenolic glycolipid) biosynthesis and transport (CVF288)] [Mycobacterium gilvum PYR-GCK] | - | - | - | - | - | 0 | 5.455598110104968e-07 | 0 | 0 | 0 | 0 | 0 | 0
Unigene171 | VFG014970(gi:148545464) | (algR) two component transcriptional regulator, LytTR family [Alginate regulation (CVF523)] [Pseudomonas putida F1] | - | - | - | - | - | 0 | 2.6159762366466053e-07 | 0 | 3.461953355451312e-07 | 0 | 0 | 0 | 0
Unigene211 | VFG009927(gi:118617353) | (relA) GTP pyrophosphokinase RelA [(p)ppGpp synthesis and hydrolysis (CVF335)] [Mycobacterium ulcerans Agy99] | - | - | - | - | - | 0 | 1.8113238601911703e-06 | 0 | 0 | 0 | 0 | 0 | 0
Unigene219 | VFG044173(gi:28901507) | (pvuE) iron-dicitrate transporter ATP-binding subunit [vibrioferrin (IA038)] [Vibrio parahaemolyticus RIMD 2210633] | - | - | - | - | - | 0 | 1.7589275704065382e-06 | 0 | 0 | 0 | 0 | 0 | 0
Unigene236 | VFG037640(gi:169794956) | (bap) hypothetical protein [Biofilm-associated protein (CVF771)] [Acinetobacter baumannii AYE] | - | - | - | - | - | 5.098870247118811e-06 | 0 | 0 | 0 | 0 | 0 | 0 | 0
Unigene253 | VFG016390(gi:42784429) | (BCE_5384) NAD dependent epimerase/dehydratase family protein [Polysaccharide capsule (CVF567)] [Bacillus cereus ATCC 10987] | - | - | - | - | - | 1.4613348546578092e-07 | 2.6235808187298805e-07 | 0 | 0 | 0 | 0 | 0 | 0

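We selected the 10 most abundant virulence factors and show their distribution across samples as a heatmap:

Fig 6-5-1 Heatmap of virulence factor abundance

6.6 PHI annotation

PHI-base (Pathogen Host Interactions) is a free, open database of pathogen-host interactions. It collects experimentally verified or literature-reported pathogenicity genes, virulence genes, and effector genes from fungal, oomycete, bacterial, and other pathogens that infect plants, animals, fungi, and insects. It also records antifungal compounds and their target genes.
The latest version at the time of this report is 4.6, which contains 6438 genes, 11340 interactions, 263 pathogens, 194 hosts, and 510 diseases. The database includes nucleotide sequences, protein sequences, functional annotations, and annotations from external databases (NCBI taxonomy).
We used DIAMOND to align genes against the database for annotation.

Tab 6-6-1 PHI annotation table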
GeneIDSubjectPHI-baseAccessionGeneNamePathogen_NCBI_Taxonomy_IDPathogenSpeciesDiseaseHostDescriptionHost_NCBI_Taxonomy_IDHostSpeciesPhenotypeGeneFunctionF-3F-4F-5F-6G-2G-4G-5G-6
Unigene16A0A0D5YKF1PHI:8812|PHI:8812CopA (ABUW_2707)|CopA (ABUW_2707)-|-Acinetobacter baumannii|Acinetobacter baumanniiNosocomial infections|Nosocomial infectionsRodents|Rodents10090|10090Mus musculus (related: house mouse)|Mus musculus (related: house mouse)unaffected pathogenicity|reduced virulenceCopper-translocating P-type ATPase|Copper-translocating P-type ATPase001.8493628110390513e-072.4037713864265717e-0702.967263527620739e-0600
Unigene44I1RCH5PHI:1569GzOB009449239Fusarium graminearumFusarium ear blightMonocots4564Triticum (related: wheat)lethaltranscription factor00000000
Unigene51Q5KN74PHI:7207msh2-Cryptococcus neoformansFungal meningitisRodents10090Mus musculus (related: house mouse)unaffected pathogenicityDNA mismatch repair protein5.625020382828881e-07001.9695211668821954e-0704.630888308713168e-0700
Unigene60Q8ZQE4PHI:3928MacB1299114Salmonella entericaFood poisoning; enteritisRodents10090Mus musculus (related: house mouse)reduced virulenceABC-Type Efflux Pump3.556833891525611e-073.192848354869383e-073.85283918966469e-070001.222359715024187e-070
Unigene63A6QHK8PHI:3004ClpX93061Staphylococcus aureusSkin infections; food poisoning; respiratory diseasesRodents10090Mus musculus (related: house mouse)reduced virulencePart of proteolytic complex9.383718213376012e-0701.7425107375123505e-074.246662782686943e-070000
Unigene68Q839C1PHI:4139|PHI:4139ldh-1|ldh-1226185|226185Enterococcus faecalis|Enterococcus faecalisNosocomial infections|Nosocomial infectionsRodents|Rodents10090|10090Mus musculus (related: house mouse)|Mus musculus (related: house mouse)unaffected pathogenicity|reduced virulenceRedox Balance|Redox Balance004.415145449777915e-0700000
Unigene144Q8DNW9PHI:6096PepO373153Streptococcus pneumoniaePneumococcal pneumoniaRodents10090Mus musculus (related: house mouse)increased virulence (hypervirulence)Endopeptidase2.4816795455809075e-052.1444175455707123e-051.8403891096352753e-051.0683850988088986e-053.628834343687883e-056.83472219294583e-059.362398618977449e-052.465023983293573e-06
Unigene148G4NDE1PHI:1017|PHI:2067ABC4|ABC4-|-Magnaporthe oryzae|Magnaporthe oryzaeRice blast|Rice blastMonocots|Monocots4513|4513Hordeum vulgare (related: barley)|Hordeum vulgare (related: barley)reduced virulence|loss of pathogenicityABC Transporter|multidrug resistance05.455598110104968e-07000000
Unigene154Q882L6PHI:3350bifA223283Pseudomonas syringaeBacterial speckEudicots4081Solanum lycopersicum (related: tomato)reduced virulencec-di-GMP phosphodiesterase05.228455178859191e-07000000
Unigene171A4VXN7PHI:68971910RR (ssu05 1910)391295Streptococcus suisMeningitisEven-toed ungulates9823Sus scrofa (related: pig)reduced virulencevirulence related two-component system02.6159762366466053e-0703.461953355451312e-070000

Tab 6-6-2 Phenotype statistics from the PHI database
Phenotype Classification | chemistry target: resistance to chemical | chemistry target: sensitivity to chemical | effector (plant avirulence determinant) | increased virulence (hypervirulence) | lethal | loss of pathogenicity | reduced virulence | unaffected pathogenicity
Num8942195601881359634

Fig 6-6-1 Bar chart of phenotype classification statistics





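7 Species composition analysis

Beyond its rich functional analyses, metagenomics also brings taxonomic profiling to a finer resolution. Compared with 16S sequencing, metagenomics recovers more taxa at the genus and species levels, so it is often used to extend and complement 16S studies. Based on the abundance tables at each taxonomic level, stacked bar charts, Circos plots, and heatmaps give a first view of how taxa are distributed; this is the species composition analysis.

7.1 Species annotation

Two approaches to species annotation are in common use: reads-based and gene-based. Each has its own strengths, and both are widely used in the literature:
1) Reads-based annotation: unaffected by sample complexity or assembly quality, more accurate for qualitative and quantitative taxonomic profiling, and more commonly used.
Method: Kaiju is used to align reads against the Nr microbial database (bacteria, fungi, archaea, viruses, and microscopic animals and plants) for species annotation.
2) Gene-based annotation: affected by the sample and the assembly, so identification and quantification may be less accurate, but it links functions to taxa through the database and suits taxonomic analysis of specific functions.
Method: DIAMOND is used to align the unigene sequences of the non-redundant gene set against the Nr microbial database, and the LCA algorithm in MEGAN (MEtaGenome Analyzer) assigns the taxonomic annotation.
Note: both annotation schemes are available; if none is chosen, the reads-based method is used by default. A detailed comparison of the two methods is provided separately.
Annotation statistics across all taxonomic levels are summarized below:

Tab 7-1-1 Species annotation summary table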
IDF-3_tagNumberF-4_tagNumberF-5_tagNumberF-6_tagNumberG-2_tagNumberG-4_tagNumberG-5_tagNumberG-6_tagNumberF-3_relativeAbundanceF-4_relativeAbundanceF-5_relativeAbundanceF-6_relativeAbundanceG-2_relativeAbundanceG-4_relativeAbundanceG-5_relativeAbundanceG-6_relativeAbundance
r__root19583451.023774457.021561531.021978713.024193302.021887107.021640329.023204817.01.01.01.01.01.01.01.01.0
r__root|k__Archaea156995.0190165.0136392.0190712.0464969.0196153.0103382.0664393.00.008016717788912690.0079987105488886660.0063257103588794320.0086771231782315910.0192189143920908360.0089620341326973920.0047772841161518390.02863168453343114
r__root|k__Archaea|p__Archaea_noname1666.02063.01676.02858.01762.01716.01811.02415.08.507182927054072e-058.677380097471837e-057.773102939675295e-050.000130034911507329837.283007503481749e-057.840232151284315e-058.368634321594649e-050.00010407321893553395
r__root|k__Archaea|p__Archaea_noname|c__Archaea_noname1666.02063.01676.02858.01762.01716.01811.02415.08.507182927054072e-058.677380097471837e-057.773102939675295e-050.000130034911507329837.283007503481749e-057.840232151284315e-058.368634321594649e-050.00010407321893553395
r__root|k__Archaea|p__Archaea_noname|c__Archaea_noname|o__Archaea_noname1666.02063.01676.02858.01762.01716.01811.02415.08.507182927054072e-058.677380097471837e-057.773102939675295e-050.000130034911507329837.283007503481749e-057.840232151284315e-058.368634321594649e-050.00010407321893553395
r__root|k__Archaea|p__Archaea_noname|c__Archaea_noname|o__Archaea_noname|f__Archaea_noname1666.02063.01676.02858.01762.01716.01811.02415.08.507182927054072e-058.677380097471837e-057.773102939675295e-050.000130034911507329837.283007503481749e-057.840232151284315e-058.368634321594649e-050.00010407321893553395
r__root|k__Archaea|p__Archaea_noname|c__Archaea_noname|o__Archaea_noname|f__Archaea_noname|g__Archaea_noname1666.02063.01676.02858.01762.01716.01811.02415.08.507182927054072e-058.677380097471837e-057.773102939675295e-050.000130034911507329837.283007503481749e-057.840232151284315e-058.368634321594649e-050.00010407321893553395
r__root|k__Archaea|p__Archaea_noname|c__Archaea_noname|o__Archaea_noname|f__Archaea_noname|g__Archaea_noname|s__Candidatus_Geothermarchaeota_archaeon33.0106.052.0157.027.051.066.076.01.6850962580599303e-064.4585666036452486e-062.4117025827154853e-067.143275404706363e-061.1160113654597458e-062.330138926081003e-063.049861210520413e-063.2751820451762235e-06
r__root|k__Archaea|p__Archaea_noname|c__Archaea_noname|o__Archaea_noname|f__Archaea_noname|g__Archaea_noname|s__Candidatus_Geothermarchaeota_archaeon_ex4572_2730.026.038.022.038.031.029.021.01.5319056891453912e-061.0936106763658157e-061.7623980412151623e-061.0009685280480253e-061.5706826624989016e-061.416358955068845e-061.340090531895333e-069.049845124829039e-07
r__root|k__Archaea|p__Archaea_noname|c__Archaea_noname|o__Archaea_noname|f__Archaea_noname|g__Archaea_noname|s__Candidatus_Pacearchaeota_archaeon169.0245.0202.0216.0188.0155.0167.0148.08.629735382185703e-061.0305177527293263e-059.368536955933231e-069.82769100265334e-067.770745803941935e-067.081794775344224e-067.71707306298347e-066.377986087974751e-06
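
The lineage strings in Tab 7-1-1 join rank-prefixed names with "|", and each relative abundance appears to equal the taxon's tag count divided by the root tag count of the same sample (for Archaea in F-3: 156995 / 19583451 ≈ 0.00802). The Python sketch below illustrates both points. The function names and the prefix-to-rank mapping are illustrative assumptions, not part of the reported pipeline.

```python
# Assumed mapping from the prefixes seen in Tab 7-1-1 to rank names.
RANK_PREFIXES = {
    "r": "root", "k": "kingdom", "p": "phylum", "c": "class",
    "o": "order", "f": "family", "g": "genus", "s": "species",
}


def parse_lineage(lineage):
    """Split 'r__root|k__Archaea|...' into an ordered {rank: name} dict."""
    ranks = {}
    for token in lineage.split("|"):
        prefix, _, name = token.partition("__")
        ranks[RANK_PREFIXES[prefix]] = name
    return ranks


def relative_abundance(tag_number, root_tag_number):
    """Fraction of a sample's annotated tags assigned to one taxon."""
    return tag_number / root_tag_number


parsed = parse_lineage("r__root|k__Archaea|p__Archaea_noname")
deepest_rank = list(parsed)[-1]  # last rank present in the lineage
# Tag counts for k__Archaea and r__root in sample F-3, taken from Tab 7-1-1.
archaea_f3 = relative_abundance(156995.0, 19583451.0)
print(parsed, deepest_rank, round(archaea_f3, 6))
```

Rows that share the same counts at several ranks (for example the Archaea_noname lineage) are the same reads reported at each placeholder rank, so they should not be summed across ranks.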

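KRONA is used to visualize the species annotation results. Rings from the inside out represent successive taxonomic ranks, and the size of each sector reflects the relative proportion of the corresponding annotated taxon.

KRONA visualization result

7.2 Species abundance statistics

Dominant taxa largely determine the ecological and functional structure of a microbial community, so knowing the composition at each taxonomic level helps interpret how community structure forms and changes and what its ecological effects are. We tabulated the species composition of each sample at every taxonomic level.

Results are in folder: 05.Taxonomy

Tab 7-2-1 Species abundance table (phylum level)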
LevelPhylumF-3F-4F-5F-6G-2G-4G-5G-6
2Archaea_noname8.507182927054072e-058.677380097471837e-057.773102939675295e-050.000130034911507329837.283007503481749e-057.840232151284315e-058.368634321594649e-050.00010407321893553395
2Candidatus_Aenigmarchaeota1.6085009736026607e-052.2797576407318155e-052.1890838827725175e-051.938239058856631e-051.3474803894069524e-051.2884297591271428e-051.4047845575730387e-051.5169264209237245e-05
2Candidatus_Bathyarchaeota0.000176832980050349660.000148520742240295970.000172761386934907350.00014277451095521380.00013044932849596140.00014981422624744330.00016265926456108870.00013669575588551292
2Candidatus_Diapherotrites4.953161728236765e-066.814035752740851e-068.162685664575489e-061.6015496448768405e-059.42409597499341e-067.767129753603343e-064.2513216873920906e-068.144860612346135e-06
2Candidatus_Heimdallarchaeota2.251901363043725e-052.469036411641284e-052.1519807661153562e-054.2586661011497805e-051.682283799044876e-052.2753121278202735e-052.7541170931366154e-052.3658880826338774e-05
2Candidatus_Korarchaeota2.8442382295132762e-052.7508514705509362e-052.5090982639405336e-053.294096428667138e-051.7690846830250784e-052.6636686155004404e-052.647834050951813e-053.1976119441062604e-05
2Candidatus_Lokiarchaeota1.884243997648831e-052.023179751276759e-051.2986090830006459e-051.5151023629090566e-052.095621341807745e-051.9098001394154102e-051.8345377281463698e-052.5511944351899004e-05
2Candidatus_Marsarchaeota4.54465354446466e-068.706823461835532e-064.962541852895326e-066.506295432312165e-065.745391844403877e-064.705966850712613e-065.683832255969861e-064.007788555281432e-06
2Candidatus_Micrarchaeota2.1293489079120936e-052.5994284538233618e-052.5322877118512596e-052.602518172924866e-052.463491754866698e-051.955489137966018e-052.5184459995964017e-052.417601483347186e-05
2Candidatus_Odinarchaeota1.5829692121169041e-061.2197965236387943e-061.2522301871791942e-061.8654413477258654e-061.446681399670041e-062.83271791013769e-061.1552504585304595e-062.1547250297212e-06

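7.3 Stacked bar charts of species distribution

Stacked bar charts show the species abundance of each sample at each taxonomic level, giving a first view of distribution patterns and dominant taxa across samples. Because very low-abundance taxa cannot be displayed legibly in a stacked chart, only the 10 taxa with the highest total abundance are shown; all remaining taxa are grouped as Other, and sequences that cannot be annotated at the given level are grouped as Unclassified.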
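
The top-10-plus-Other grouping described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function name, the dict-of-lists input layout, and the toy abundances are hypothetical, and the report's own plotting code is not shown in the source.

```python
def top_n_with_other(abundance, n=10):
    """Keep the n taxa with the highest total abundance; pool the rest as 'Other'.

    abundance: dict mapping taxon name -> list of per-sample relative abundances.
    An 'Unclassified' entry, if present, is kept as its own category and is
    not ranked against the named taxa.
    """
    unclassified = abundance.get("Unclassified")
    taxa = [t for t in abundance if t != "Unclassified"]
    ranked = sorted(taxa, key=lambda t: sum(abundance[t]), reverse=True)
    n_samples = len(next(iter(abundance.values())))

    result = {t: list(abundance[t]) for t in ranked[:n]}
    other = [0.0] * n_samples
    for taxon in ranked[n:]:
        for i, value in enumerate(abundance[taxon]):
            other[i] += value
    result["Other"] = other
    if unclassified is not None:
        result["Unclassified"] = list(unclassified)
    return result


# Toy data (two samples); n=2 keeps the example short.
toy = {
    "A": [0.5, 0.4],
    "B": [0.3, 0.3],
    "C": [0.1, 0.2],
    "D": [0.05, 0.05],
    "Unclassified": [0.05, 0.05],
}
collapsed = top_n_with_other(toy, n=2)
print(collapsed)
```

Ranking by summed abundance across samples matches the "total abundance" wording in the text; ranking by mean abundance would give the same order when every taxon has a value in every sample.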

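  • Kingdom-level stacked bar chart of species distribution
  • Phylum-level stacked bar chart of species distribution
  • Class-level stacked bar chart of species distribution
  • Order-level stacked bar chart of species distribution
  • Family-level stacked bar chart of species distribution
  • Genus-level stacked bar chart of species distribution
  • Species-level stacked bar chart of species distribution

Fig 7-3-1 Stacked bar charts of species distribution at each taxonomic level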


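7.4 Heatmaps of species distribution

Heatmaps are compact and can carry more information, so we use them to show abundance for a larger set of taxa. The 20 taxa with the highest total abundance are selected and plotted with the R package pheatmap, showing the abundance of each sample at each taxonomic level in detail.

  • Kingdom-level heatmap of species distribution
  • Phylum-level heatmap of species distribution
  • Class-level heatmap of species distribution
  • Order-level heatmap of species distribution
  • Family-level heatmap of species distribution
  • Genus-level heatmap of species distribution
  • Species-level heatmap of species distribution

Fig 7-4-1 Heatmaps of species distribution at each taxonomic level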


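7.5 Circos plots of species distribution

Circos plots present abundance distributions in a different visual form: the thickness of the connecting ribbons indicates the dominant taxa of each sample. For each taxonomic level, we used Circos to plot the 10 most abundant taxa.

  • Phylum-level Circos plot
  • Class-level Circos plot
  • Order-level Circos plot
  • Family-level Circos plot
  • Genus-level Circos plot
  • Species-level Circos plot

Fig 7-5-1 Circos plots of species distribution at each taxonomic level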






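8 Alpha diversity analysis

8.1 Alpha diversity indices

Alpha diversity describes the richness of species or functions within a particular habitat or ecosystem, and can indicate the balance of the habitat and its living conditions. For example, studies have found that gut microbiota dysbiosis in disease groups is accompanied by a significant drop in alpha diversity. It is usually assessed with two measures: richness (the number of distinct species or functions) and evenness (how uniform the abundances are; evenness is highest when all species are equally abundant).
We mainly report four indices, Chao1, ACE (Abundance-based Coverage Estimator), Shannon, and Simpson, together with related analyses.
- Chao1 and ACE consider richness only; larger values indicate higher diversity.
- Shannon and Simpson combine richness and evenness; larger values indicate a more even community.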
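
As a rough illustration of three of these indices, the Python sketch below computes Chao1, Shannon, and Simpson from a vector of counts. The report does not state which variants it uses, so the choices here are assumptions: the bias-corrected form of Chao1, a configurable logarithm base for Shannon, and the 1 − Σp² form of Simpson (the form in which larger values mean higher diversity). ACE is omitted because it needs additional rare-abundance bookkeeping.

```python
import math


def chao1(counts):
    """Bias-corrected Chao1: S_obs + F1 * (F1 - 1) / (2 * (F2 + 1)).

    F1 and F2 are the numbers of taxa observed exactly once and twice.
    """
    observed = sum(1 for c in counts if c > 0)
    singletons = sum(1 for c in counts if c == 1)
    doubletons = sum(1 for c in counts if c == 2)
    return observed + singletons * (singletons - 1) / (2 * (doubletons + 1))


def shannon(counts, base=2):
    """Shannon index -sum(p * log(p)); the log base is an assumption."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total, base) for c in counts if c > 0)


def simpson(counts):
    """Simpson index in the 1 - sum(p^2) form."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)


toy_counts = [10, 5, 2, 1, 1, 1, 2]   # hypothetical counts for 7 taxa
print(chao1(toy_counts), shannon(toy_counts), simpson(toy_counts))
```

For a perfectly even community of four taxa, this Shannon index (base 2) is 2 and the Simpson index is 0.75, which is the maximum either can reach with four taxa.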

Tab 8-1-1 Alpha diversity summary table
Type | Index | F-3 | F-4 | F-5 | F-6 | G-2 | G-4 | G-5 | G-6
genus | chao1 | 4284.578947368421 | 4285.682692307692 | 4350.74358974359 | 4344.727272727273 | 4337.591397849463 | 4386.754237288134 | 4327.291666666667 | 4378.028301886792
genus | ace | 4237.38392536951 | 4286.720664695999 | 4298.904665243468 | 4351.666340910032 | 4300.745848893888 | 4382.132329593906 | 4318.209889124698 | 4363.759557342247
genus | shannon | 6.699152845288879 | 6.575547858810985 | 6.655635045765983 | 7.1477455237312135 | 6.311606947751441 | 6.573471267449445 | 6.487134516932295 | 6.393029425070315
genus | simpson | 0.9638712273074456 | 0.9588640737838152 | 0.9493260808104316 | 0.9679361442961336 | 0.9385281344544636 | 0.9523271908630014 | 0.9505495465319259 | 0.9321293514407792
species | chao1 | 29398.030769230765 | 29583.772990886493 | 29690.513951395136 | 29927.55117449664 | 29489.01357827476 | 30116.93691588785 | 29320.260517799354 | 29883.00240577386
species | ace | 29027.51448667658 | 29271.553097824813 | 29216.397691246315 | 29531.715458762505 | 29146.07785751545 | 29777.323074345568 | 29062.58592878999 | 29547.499923863794
species | shannon | 9.995232143511453 | 10.035844513037732 | 10.25337058129983 | 10.387318230289695 | 9.963000943598646 | 10.291785756402321 | 9.951717669316853 | 10.2047790112494
species | simpson | 0.9891912973704516 | 0.9891517025446755 | 0.991119970220598 | 0.9908629856663936 | 0.9918034412008476 | 0.9885140675928804 | 0.9854163097255156 | 0.9916189989952692
gene | chao1 | 461191.0348557693 | 555474.2394576988 | 400383.22291904216 | 395696.3141057934 | 388747.06076941924 | 293558.0 | 434788.34883833263 | 588608.6996212524
gene | ace | 461195.7547310578 | 554887.635305827 | 400394.5814852618 | 395710.7460666684 | 388752.04222555825 | 293558.0 | 433940.0586241977 | 587936.1538772129

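8.2 Alpha diversity difference analysis

Environmental drivers in different habitats can cause differences in microbial alpha diversity. Using the grouping and sampling information, hypothesis tests on alpha diversity between two or more groups show whether species or functional diversity differs significantly between groups, giving a first indication of the factors that may drive changes in community diversity. We apply the following commonly used tests.
- For comparisons between 2 groups: Welch's t-test and the Wilcoxon rank-sum test.
- For comparisons among more than 2 groups: the Kruskal-Wallis rank-sum test and Tukey's test. (All of these tests require at least 3 replicate samples per group.)
Taking the genus-level Chao1 index as an example, box plots based on the Welch's t-test and Tukey HSD results show the difference between the two groups: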
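
To make the two-group test concrete, the Python sketch below computes the Welch t statistic and the Welch-Satterthwaite degrees of freedom using only the standard library. It is an illustration, not the report's R code; the input values are the genus-level Chao1 indices from Tab 8-1-1 rounded to two decimals, and the function name is hypothetical. The p-value is not computed here because it needs the t distribution CDF, which a statistics library (for example R's t.test, whose default is the Welch variant) would provide.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Return Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    va = variance(a) / len(a)   # squared standard error of mean(a)
    vb = variance(b) / len(b)   # squared standard error of mean(b)
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Genus-level Chao1 values, rounded from Tab 8-1-1.
group_f = [4284.58, 4285.68, 4350.74, 4344.73]   # F-3, F-4, F-5, F-6
group_g = [4337.59, 4386.75, 4327.29, 4378.03]   # G-2, G-4, G-5, G-6
t_stat, df = welch_t(group_f, group_g)
print(t_stat, df)
```

Unlike Student's t-test, this form does not pool the two variances, which is why the degrees of freedom are generally non-integer and fall between the smaller group's n − 1 and the pooled n1 + n2 − 2.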

Results are in folder: 06.Alpha

  • F_vs_G

Fig 8-2-1 Box plot of the Welch's t-test on the Chao1 index


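9 Beta diversity analysis

Beta diversity analysis examines relationships between samples, assessing whether samples cluster as expected from the grouping and whether the grouping is significant. We analyze sample relationships from several angles: gene abundance, gene function (by default the KEGG, eggNOG, and CAZy abundance tables), and species abundance. We carry out between-group difference tests and multivariate statistics, and use Venn diagrams to show features shared between or unique to samples.

9.1 Sample correlation analysis

To compare the similarity between samples (or groups), we compute Pearson correlation coefficients between samples from the species (gene, function) abundance profiles and display them as a heatmap.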
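
A minimal Python sketch of the sample-by-sample Pearson matrix behind such a heatmap is given below. The sample names and abundance vectors are toy values, and the helper names are hypothetical; the report itself does not show its implementation.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length vectors."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def correlation_matrix(profiles):
    """profiles: dict sample -> abundance vector (same feature order everywhere)."""
    names = list(profiles)
    return {a: {b: pearson(profiles[a], profiles[b]) for b in names} for a in names}


# Toy abundance profiles over three features.
toy_profiles = {
    "S1": [0.5, 0.3, 0.2],
    "S2": [0.25, 0.15, 0.1],   # S1 scaled by 0.5, so r = 1
    "S3": [0.2, 0.3, 0.5],
}
matrix = correlation_matrix(toy_profiles)
print(matrix["S1"]["S2"], matrix["S1"]["S3"])
```

Because Pearson correlation is invariant to scaling, two samples with proportional profiles correlate perfectly even if their absolute abundances differ.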

Results are in folder: 07.Beta/Heatmap

  • gene
  • Genus
  • kegg.A
  • kegg.B
  • CAZy.A
  • CAZy.B
  • eggNOG.A
  • eggNOG.C

Fig 9-1-1 Sample correlation heatmap


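9.2 PCA (principal component analysis)

Based on the species (gene, function) abundance table, principal component analysis (PCA) uses dimensionality reduction to study distances between samples. By decomposing the variance, it finds the dominant components and structure in the data and projects the complex sample composition onto two principal components plotted on the horizontal and vertical axes, reducing the complexity of the data. In the results, the more similar two samples are in composition, the closer they lie in the PCA plot, and samples from different environments often form separate clusters.

Results are in folder: 07.Beta/PCA

  • gene
  • Genus
  • kegg.A
  • kegg.B
  • CAZy.A
  • CAZy.B
  • eggNOG.A
  • eggNOG.C

Fig 9-2-1 PCA plot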


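9.3 PCoA (principal coordinates analysis)

PCoA (principal coordinates analysis) is another way to display similarity between samples. Its approach is essentially the same as PCA: both use dimensionality reduction to find the main axes of variation among complex samples. Unlike PCA, PCoA works from a distance matrix such as Bray-Curtis, so the result focuses on dissimilarity in community structure between samples. From the species (gene, function) abundance data, we compute the Bray-Curtis distance matrix between samples in R and plot the PCoA. In the results, the more similar two samples are, the closer they lie in the PCoA plot, and samples from different environments often form separate clusters.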
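
The Bray-Curtis dissimilarity that feeds PCoA (and the NMDS, UPGMA, Anosim, and Adonis analyses that follow) is the sum of absolute abundance differences divided by the sum of abundances. A minimal Python sketch, with hypothetical function names and toy profiles, is:

```python
def bray_curtis(x, y):
    """Bray-Curtis dissimilarity: sum|x_i - y_i| / sum(x_i + y_i)."""
    numerator = sum(abs(a - b) for a, b in zip(x, y))
    denominator = sum(a + b for a, b in zip(x, y))
    return numerator / denominator


def distance_matrix(profiles):
    """Square matrix of pairwise Bray-Curtis dissimilarities.

    profiles: list of abundance vectors sharing the same feature order.
    """
    return [[bray_curtis(p, q) for q in profiles] for p in profiles]


toy = [
    [0.5, 0.5, 0.0],
    [0.25, 0.75, 0.0],
    [0.0, 0.0, 1.0],    # shares no features with the first two samples
]
dm = distance_matrix(toy)
print(dm)
```

The value ranges from 0 (identical profiles) to 1 (no shared features), which is why it suits sparse abundance tables better than Euclidean distance.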

Results are in folder: 07.Beta/PCoA

  • gene
  • Genus
  • kegg.A
  • kegg.B
  • CAZy.A
  • CAZy.B
  • eggNOG.A
  • eggNOG.C

Fig 9-3-1 PCoA plot


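9.4 NMDS analysis

NMDS (non-metric multidimensional scaling) is a non-linear model suited to cases where exact similarities or dissimilarities between objects are not available. It was designed to overcome the limitations of linear models (including PCA and PCoA) and to better reflect the non-linear structure of ecological data. We run NMDS on the Bray-Curtis matrix obtained for the PCoA analysis. Each sample is represented as a point in a multidimensional space according to its species (gene, function) composition, and the degree of difference between samples is reflected by the distances between points, giving a spatial ordination plot of the samples.

Results are in folder: 07.Beta/NMDS

  • gene
  • Genus
  • kegg.A
  • kegg.B
  • CAZy.A
  • CAZy.B
  • eggNOG.A
  • eggNOG.C

Fig 9-4-1 NMDS plot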


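9.5 UPGMA clustering tree

In microbial ecology, a UPGMA tree can be used to study similarity between samples and to address how samples should be classified. Using R, samples are clustered into a UPGMA tree from the Bray-Curtis matrix computed from species (gene, function) abundance. The more similar two samples are, the shorter the branches joining them.

Results are in folder: 07.Beta/UPGMA

  • Phylum
  • Class
  • Order
  • Family
  • Genus
  • Species

Fig 9-5-1 UPGMA sample clustering tree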


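9.6 Anosim test

Analysis of Similarity (ANOSIM) is a non-parametric test of microbial community structure. It tests whether differences between groups are significantly larger than differences within groups, and so whether the grouping is meaningful.
We run Anosim on the Bray-Curtis matrix computed from species (gene, function) abundance; the genus level is shown as an example.

Results are in folder: 07.Beta/Anosim_and_Adonis/Anosim

Tab 9-6-1 Anosim results (species level)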
diffs | Rvalue | Pvalue | significant
F-VS-G | -0.0521 | 0.583 |

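Given the grouping, Mothur computes the ranks of the pairwise Bray-Curtis distances between samples; comparing the mean rank within groups with the mean rank between groups gives the between-group difference.
Alongside the Anosim test result, box plots of the ranks of the between-sample Bray-Curtis distances display the outcome.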
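
The rank comparison described above is usually summarized by the ANOSIM R statistic, R = (mean between-group rank − mean within-group rank) / (N(N − 1)/4), where N is the number of samples. The Python sketch below computes R for a toy distance matrix. It is an illustration with hypothetical names and data, not Mothur's implementation, and it leaves out the permutation step that produces the P-value.

```python
from itertools import combinations


def average_ranks(values):
    """1-based ranks of values, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def anosim_r(dist, groups):
    """ANOSIM R from a square distance matrix and one group label per sample."""
    n = len(groups)
    pairs = list(combinations(range(n), 2))
    ranks = average_ranks([dist[i][j] for i, j in pairs])
    within = [r for r, (i, j) in zip(ranks, pairs) if groups[i] == groups[j]]
    between = [r for r, (i, j) in zip(ranks, pairs) if groups[i] != groups[j]]
    mean_within = sum(within) / len(within)
    mean_between = sum(between) / len(between)
    return (mean_between - mean_within) / (n * (n - 1) / 4)


# Toy matrix: every within-group distance is smaller than every between-group one.
toy_dist = [
    [0.0, 0.1, 0.6, 0.7],
    [0.1, 0.0, 0.8, 0.9],
    [0.6, 0.8, 0.0, 0.2],
    [0.7, 0.9, 0.2, 0.0],
]
toy_groups = ["F", "F", "G", "G"]
r_value = anosim_r(toy_dist, toy_groups)
print(r_value)
```

R close to 1 means groups are well separated, R near 0 means within-group and between-group distances are indistinguishable, and a slightly negative R, as in Tab 9-6-1, means within-group distances rank marginally higher than between-group ones.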

  • F-VS-G

Fig 9-6-1 Box plot of Anosim results (species level)


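9.7 Adonis analysis

Adonis is a non-parametric multivariate analysis of variance based on the Bray-Curtis distance matrix. It estimates how much of the variation between samples is explained by each grouping factor and uses a permutation test to assess the statistical significance of the grouping. We run Adonis on the Bray-Curtis matrix computed from species (gene, function) abundance; the genus level is shown as an example.

Results are in folder: 06.Comparison/Anosim_and_Adonis/Adonis

Tab 9-7-1 Adonis analysis based on Bray-Curtis distance (species level)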
diffs | Df | SumsOfSqs | MeanSqs | Fvalue | R2 | Pvalue | significant
F-VS-G | 1 | 0.0189 | 0.0189 | 0.7947 | 0.117 | 0.551 |
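
The columns of Tab 9-7-1 are linked by the usual one-factor relations: R2 is the between-group sum of squares divided by the total, and the pseudo-F statistic can be recovered from R2 and the degrees of freedom as F = (R2 / df_between) / ((1 − R2) / df_within). The short Python check below applies this to the reported row, assuming 8 samples in 2 groups as listed in the project overview; the function name is hypothetical.

```python
def pseudo_f_from_r2(r2, n_samples, n_groups):
    """Pseudo-F of a one-factor PERMANOVA expressed through R2."""
    df_between = n_groups - 1
    df_within = n_samples - n_groups
    return (r2 / df_between) / ((1.0 - r2) / df_within)


# Values from Tab 9-7-1 (F-VS-G): R2 = 0.117, SumsOfSqs = 0.0189.
f_value = pseudo_f_from_r2(0.117, n_samples=8, n_groups=2)
total_ss = 0.0189 / 0.117        # implied total sum of squares
print(round(f_value, 3), round(total_ss, 4))
```

The result agrees with the tabulated Fvalue of 0.7947 up to the rounding of R2, which suggests the table columns are internally consistent. The grouping explains about 11.7% of the variation, and with P = 0.551 this is not significant.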





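10 Standard differential analysis

10.1 Species Venn analysis

Microbial communities from different habitats share some taxa and differ in others.
With multiple groups (or samples), to see how taxa differ between them we use the species abundance data to draw Venn diagrams (up to 4 groups), flower plots (5 or more groups), and UpSet plots showing the taxa shared between and unique to each group or sample. The species level is shown below: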
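
The shared and unique sets behind a two-group Venn diagram reduce to set operations. The Python sketch below assumes a taxon counts as present in a group when its abundance is above zero in at least one sample of that group; the report does not state its presence rule, so this threshold, the function name, and the toy data are all assumptions.

```python
def taxa_present(abundance, samples):
    """Taxa with non-zero abundance in at least one of the given samples.

    abundance: dict taxon -> dict sample -> relative abundance.
    """
    return {
        taxon
        for taxon, row in abundance.items()
        if any(row.get(sample, 0) > 0 for sample in samples)
    }


toy = {
    "taxon_a": {"F-3": 0.1, "F-4": 0.0, "G-2": 0.2, "G-4": 0.1},
    "taxon_b": {"F-3": 0.3, "F-4": 0.2, "G-2": 0.0, "G-4": 0.0},
    "taxon_c": {"F-3": 0.0, "F-4": 0.0, "G-2": 0.4, "G-4": 0.0},
}
f_taxa = taxa_present(toy, ["F-3", "F-4"])
g_taxa = taxa_present(toy, ["G-2", "G-4"])

shared = f_taxa & g_taxa     # centre of the Venn diagram
only_f = f_taxa - g_taxa     # unique to group F
only_g = g_taxa - f_taxa     # unique to group G
print(sorted(shared), sorted(only_f), sorted(only_g))
```

A stricter rule, such as requiring presence in every replicate of a group, would shrink the shared region; the choice should be stated alongside the diagram.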

Results are in folder: 08.Different/Venn/

  • F-VS-G

Fig 10-1-1 Species-level Venn diagram (flower plot) for each comparison


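10.2 Functional Venn analysis

Similarly, we count the functional features (gene, KEGG, eggNOG, CAZy) shared between and unique to groups, to help understand functional differences between groups.
The kegg.B level is shown as an example:

  • F-VS-G

Fig 10-2-1 kegg.B-level Venn diagram (flower plot) for each comparison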


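10.3 Species differences: Welch's t-test

Welch's t-test (in R) is used to test for differences in taxa between two groups (phylum to species level; only taxa whose relative abundance reaches at least 0.1% in at least one sample are included). A P-value < 0.05 (or 0.01) is generally taken as the significance threshold; the smaller the P-value, the more significant the difference.

Tab 10-3-1 Species-level Welch's t-test statistics for F_vs_G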
labels | F | G | fold(G/F) | p-value | significant
Methanobrevibacter_millerae | 0.00143635 | 0.0015162 | 1.05559479 | 0.927357199978657 | no
Methanobrevibacter_olleyae | 0.0002818 | 0.00088716 | 3.14822993 | 0.29134122619155 | no
Methanobrevibacter_ruminantium | 0.00029661 | 0.00126429 | 4.26250422 | 0.23977526587141 | no
Methanobrevibacter_sp._YE315 | 0.00033607 | 0.00173199 | 5.15374671 | 0.181108438782593 | no
Methanobrevibacter_thaueri | 0.00055569 | 0.0020649 | 3.71590199 | 0.177588593993751 | no
Acidobacteria_bacterium | 0.00118907 | 0.00105044 | 0.88341273 | 0.218483551673163 | no
Coriobacteriales_bacterium_OH1046 | 0.00091 | 0.00121596 | 1.33621276 | 0.242712453629302 | no
Slackia_heliotrinireducens | 0.00082083 | 0.00099416 | 1.21116807 | 0.333886284437545 | no
bacterium_F083 | 0.00366297 | 0.00374867 | 1.02339744 | 0.960365274125692 | no
bacterium_P201 | 0.0017276 | 0.00183194 | 1.06039478 | 0.816154555106317 | no

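For taxa that differ significantly between the compared groups (P < 0.05), bar charts show their abundance in the two groups and the significance of the difference. (No chart is produced for taxa without a significant difference.)

  • Phylum
  • Class
  • Order
  • Family
  • Genus
  • Species

Fig 10-3-1 Bar charts of differential analysis at each taxonomic level, F_vs_G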


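10.4 Functional differences: Welch's t-test

Welch's t-test (in R) is used to test for functional differences between two groups (gene, KEGG, CAZy, and eggNOG at each functional level; the 200 functions with the highest summed relative abundance in the comparison are included). A P-value < 0.05 (or 0.01) is used as the threshold; the smaller the P-value, the more significant the difference.

Tab 10-4-1 KEGG Level B Welch's t-test statistics for F_vs_G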
labels | F | G | fold(G/F) | p-value | significant
Carbohydrate metabolism | 0.04647864 | 0.04728229 | 1.01729071 | 0.735713042352088 | no
Amino acid metabolism | 0.03457677 | 0.03458282 | 1.00017522 | 0.996699325734872 | no
Metabolism of cofactors and vitamins | 0.02541726 | 0.0261729 | 1.02972915 | 0.655394569910569 | no
Nucleotide metabolism | 0.0240265 | 0.02453778 | 1.02128002 | 0.57338365259418 | no
Membrane transport | 0.02321069 | 0.02177247 | 0.93803621 | 0.643425047020106 | no
Replication and repair | 0.01919478 | 0.01852055 | 0.96487407 | 0.405822351877779 | no
Translation | 0.01800969 | 0.01603198 | 0.8901863 | 0.184060083149889 | no
Energy metabolism | 0.01457296 | 0.01459341 | 1.00140333 | 0.973348053832285 | no
Signal transduction | 0.01358768 | 0.01404621 | 1.03374597 | 0.449599716284469 | no
Cellular community - prokaryotes | 0.01238952 | 0.01218747 | 0.98369156 | 0.896197491569113 | no

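For functions that differ significantly between the compared groups (P < 0.05), bar charts show their abundance in the two groups and the significance of the difference. (No chart is produced for functions without a significant difference.)

  • Gene
  • KEGG_A
  • KEGG_B
  • CAZy_A
  • CAZy_B
  • eggNOG_A
  • eggNOG_C

Fig 10-4-1 Bar charts of functional differential analysis, F_vs_G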


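10.5 Species differences: analysis of variance

Analysis of variance (in R) is used to test for differences in taxa between the two groups (phylum to species level; only taxa whose relative abundance reaches at least 0.1% in at least one sample are included). A P-value < 0.05 (or 0.01) is generally taken as the significance threshold; the smaller the P-value, the more significant the difference.

Tab 10-5-1 Species-level ANOVA for F-VS-G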
Species | F-3 | F-4 | F-5 | F-6 | G-2 | G-4 | G-5 | G-6 | p_value | q_value
Methanobrevibacter_millerae | 0.00120147363199673 | 0.00183932697180003 | 0.000912412017495418 | 0.00179218865090053 | 0.00144341603308221 | 0.000597063833059344 | 0.000276751799845557 | 0.0037475839606923 | 0.927357372373287 | 0.968880836807912
Methanobrevibacter_olleyae | 4.83571562540229e-05 | 0.000509917008830107 | 5.49126126525987e-05 | 0.000513997339152661 | 0.0018628296377237 | 0.000136838550659071 | 3.35484733157245e-05 | 0.00151541811340292 | 0.291341205439381 | 0.739004008838848
Methanobrevibacter_ruminantium | 7.99144134504179e-05 | 0.000536626346502887 | 8.47342426657921e-05 | 0.000485151246117095 | 0.00227162046751617 | 0.000200711770632821 | 4.99530298268571e-05 | 0.00253486161946461 | 0.23977526036118 | 0.739004008838848
Methanobrevibacter_sp._YE315 | 0.000547298839208677 | 0.000234537428131376 | 0.000288523110905251 | 0.000273901388129505 | 0.00155716652485056 | 0.00117900460759844 | 0.000202076410206148 | 0.00398973195953237 | 0.181108375728516 | 0.739004008838848
Methanobrevibacter_thaueri | 0.00126576260741787 | 0.000265368836815074 | 0.000345893805036386 | 0.000345743629301679 | 0.00234271452487139 | 0.00130346143965029 | 0.000291123115549676 | 0.00432229222061954 | 0.177588623174531 | 0.739004008838848
Acidobacteria_bacterium | 0.00112191666320711 | 0.00101339853944929 | 0.0012710600188827 | 0.00134989705721168 | 0.000852384680685588 | 0.00114304736573911 | 0.00111745066352734 | 0.00108886874651931 | 0.218482791398186 | 0.739004008838848
Coriobacteriales_bacterium_OH1046 | 0.0010185130291898 | 0.00109756449958037 | 0.000657374469373255 | 0.00086656575387285 | 0.00175746989807344 | 0.00119408197712014 | 0.00114494562444037 | 0.000767340677584314 | 0.242712112480166 | 0.739004008838848
Slackia_heliotrinireducens | 0.000920828509745295 | 0.0010614753472603 | 0.000676760847826622 | 0.000624240372946314 | 0.00134917507333228 | 0.000834326802532651 | 0.00101075173117747 | 0.000782380658291768 | 0.333885745820546 | 0.739004008838848
bacterium_F083 | 0.00229642875507488 | 0.00193308305632385 | 0.00367970159447397 | 0.00674266050063987 | 0.0022082971559649 | 0.00606055427974104 | 0.00107821835795565 | 0.00564762049190045 | 0.960365549960427 | 0.975949493357208
bacterium_P201 | 0.00134327703528862 | 0.00154565044324672 | 0.00177083900025467 | 0.00225063223674653 | 0.00173601767960405 | 0.00232730620817087 | 0.000807566280531132 | 0.00245686057338871 | 0.816154015335662 | 0.892394111597477
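
The abundance filter and the test statistic described in this section can be sketched as follows. This is a minimal illustration, assuming relative abundances are stored as fractions (so 0.1% is 0.001) and using the classical equal-variance F statistic; the report does not state which options the R call used, and the function names are hypothetical.

```python
from statistics import mean

def passes_abundance_filter(values, threshold=0.001):
    """True if the relative abundance reaches 0.1% (0.001) in at least one sample."""
    return max(values) >= threshold

def anova_f(*groups):
    """Classical one-way ANOVA F statistic (equal variances assumed)."""
    all_values = [x for g in groups for x in g]
    grand = mean(all_values)
    # variation of the group means around the grand mean
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    # variation of the samples around their own group mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

# Acidobacteria_bacterium, values rounded from Tab 10-5-1
f_group = [0.00112192, 0.00101340, 0.00127106, 0.00134990]
g_group = [0.00085238, 0.00114305, 0.00111745, 0.00108887]

if passes_abundance_filter(f_group + g_group):
    print(f"F = {anova_f(f_group, g_group):.3f}")
```

With only two groups, this F statistic is the square of the equal-variance t statistic, so the ANOVA and a pooled t-test give the same P-value.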

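For taxa or functions that differ significantly between groups (P < 0.05), box plots show the abundance of each taxon in the compared groups so that the differences can be compared directly.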

  • Phylum
  • Class
  • Order
  • Family
  • Genus
  • Species

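Fig 10-5-1 F-VS-G box plots of differential taxa at each taxonomic level


10.6 ANOVA of functional differences

Analysis of variance (ANOVA, run in R) is used to compare functional abundance between the two groups at each functional level (Gene, KEGG, CAZy, eggNOG). Only the 200 functions with the highest summed relative abundance in the compared groups are tested. A P-value < 0.05 (or 0.01) is used as the threshold; the smaller the P-value, the stronger the evidence of a difference.

Tab 10-6-1 ANOVA of functions at KEGG Level B, F-VS-G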
LevelB | F-3 | F-4 | F-5 | F-6 | G-2 | G-4 | G-5 | G-6 | p_value | q_value
Carbohydrate metabolism | 0.0511722798527289 | 0.0475148219510415 | 0.0430928753325583 | 0.0441345685126972 | 0.0495518564388103 | 0.0475266139576903 | 0.0485761195544872 | 0.0434745508140962 | 0.735713050286961 | 0.873659247215766
Amino acid metabolism | 0.0374200817028052 | 0.034570453182529 | 0.0311621219527577 | 0.0351544063881205 | 0.0358043745875435 | 0.0336712954946209 | 0.0346402109523166 | 0.034215416419643 | 0.996699356971458 | 0.996699356971458
Metabolism of cofactors and vitamins | 0.0255564716807886 | 0.0240512384375626 | 0.0225924835213308 | 0.0294688565575848 | 0.0274328548900081 | 0.0253732482715832 | 0.0254334057336554 | 0.0264520758093085 | 0.655394635990173 | 0.872016482341493
Nucleotide metabolism | 0.0256776156249902 | 0.0246956334907115 | 0.0223287532194466 | 0.0234039950097027 | 0.0238930919942174 | 0.024564405916867 | 0.0257290448664278 | 0.0239645925160805 | 0.5733834885266 | 0.872016482341493
Membrane transport | 0.0286843868223479 | 0.0236866298693737 | 0.0182646588914228 | 0.0222070975317881 | 0.0206041387219408 | 0.0218104327085646 | 0.0271790285691243 | 0.0174962834524385 | 0.643425040157055 | 0.872016482341493
Replication and repair | 0.0196839542631069 | 0.0206427590312089 | 0.0174801444753403 | 0.0189722791929602 | 0.018007311148155 | 0.0190888883997282 | 0.0189771414895626 | 0.0180088576109179 | 0.405822425504613 | 0.872016482341493
Translation | 0.0195916328060651 | 0.0200715031628188 | 0.0155366403043922 | 0.0168389826502198 | 0.0158163105889344 | 0.0145080138600496 | 0.0178249632650667 | 0.0159786288769253 | 0.184060095372788 | 0.872016482341493
Energy metabolism | 0.0154284280514352 | 0.0145218594469978 | 0.0131577186024404 | 0.0151838143431471 | 0.0152549125384274 | 0.013913009991201 | 0.01472341292507 | 0.0144822878080533 | 0.973347665356249 | 0.996699356971458
Signal transduction | 0.0141834815288719 | 0.0135596410702269 | 0.0126581293817935 | 0.0139494568247399 | 0.0148469393661482 | 0.0145652379587045 | 0.0128025922261919 | 0.0139700568239606 | 0.449599919663306 | 0.872016482341493
Cellular community - prokaryotes | 0.0148030563841526 | 0.0123941626535127 | 0.0108704365317802 | 0.0114904366219596 | 0.0109787689533888 | 0.012975702607063 | 0.0151468057773745 | 0.00964859975212261 | 0.896197498434501 | 0.996699356971458
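
The "top 200 by summed relative abundance" pre-selection can be illustrated with a short sketch. The function name `top_n_by_total` is hypothetical, the four rows are rounded from Tab 10-6-1, and `n=2` is used only to keep the example small.

```python
def top_n_by_total(table, n=200):
    """Return the n feature names with the largest abundance summed over samples."""
    ranked = sorted(table, key=lambda name: sum(table[name]), reverse=True)
    return ranked[:n]

# Four rows of Tab 10-6-1, rounded to 4 decimals (samples F-3 ... G-6)
abundance = {
    "Translation": [0.0196, 0.0201, 0.0155, 0.0168, 0.0158, 0.0145, 0.0178, 0.0160],
    "Carbohydrate metabolism": [0.0512, 0.0475, 0.0431, 0.0441, 0.0496, 0.0475, 0.0486, 0.0435],
    "Energy metabolism": [0.0154, 0.0145, 0.0132, 0.0152, 0.0153, 0.0139, 0.0147, 0.0145],
    "Amino acid metabolism": [0.0374, 0.0346, 0.0312, 0.0352, 0.0358, 0.0337, 0.0346, 0.0342],
}
print(top_n_by_total(abundance, n=2))
```

Restricting the test to the most abundant functions reduces the number of hypotheses, which in turn limits how strongly the multiple-testing correction inflates the q-values.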

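For functions that differ significantly between groups (P < 0.05), box plots show the abundance of each function in the compared groups so that the differences can be compared directly.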

  • Gene
  • KEGG_A
  • KEGG_B
  • CAZy_A
  • CAZy_B
  • eggNOG_A
  • eggNOG_C

Fig 10-6-1 F-VS-G box plots of differential functions






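11 Customized differential analysis

11.1 Metastats analysis of taxonomic differences

Metastats tests for differences between two groups by combining several methods. A t statistic is computed first. If the count of a taxon within a group is smaller than the number of replicates, the P-value is computed with Fisher's exact test. If the count exceeds the number of replicates and there are at least 8 replicates, the P-value comes from a permutation test on that taxon alone. If the count exceeds the number of replicates and there are fewer than 8 replicates, the P-value comes from a permutation test that pools all samples. Finally, q-values are obtained by multiple-testing correction.

Results are in the folder: 08.Different/MetaStats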
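
To make the permutation step concrete, the sketch below enumerates every relabeling of the samples for a single taxon and counts how often the difference in group means is at least as large as the observed one. It is a simplified illustration, not the Metastats implementation: Metastats permutes a t statistic and, with fewer than 8 replicates, pools across taxa. The function name is hypothetical and the values are rounded from the Acidobacteria_bacterium row of Tab 10-5-1.

```python
from itertools import combinations
from statistics import mean

def exact_permutation_p(a, b):
    """Two-sided exact permutation p-value for the difference in group means."""
    pooled = list(a) + list(b)
    observed = abs(mean(a) - mean(b))
    hits = total = 0
    for idx in combinations(range(len(pooled)), len(a)):
        chosen = set(idx)
        grp1 = [pooled[i] for i in idx]
        grp2 = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        # count relabelings at least as extreme as the observed split
        # (tiny tolerance guards against floating-point round-off)
        if abs(mean(grp1) - mean(grp2)) >= observed - 1e-15:
            hits += 1
    return hits / total

f_group = [0.00112192, 0.00101340, 0.00127106, 0.00134990]
g_group = [0.00085238, 0.00114305, 0.00111745, 0.00108887]
print(f"p = {exact_permutation_p(f_group, g_group):.4f}")
```

With four samples per group there are only 70 distinct relabelings, so a per-taxon test of this kind cannot return a two-sided P-value below 2/70 (about 0.029).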

Tab 11-1-1 Metastats differential taxa table (F-VS-G)
Phylum | Mean(F%) | variance(F%) | std.err(F%) | Mean(G%) | variance(G%) | std.err(G%) | P-value | FDR
Candidatus_Saccharibacteria | 0.0052733151033589 | 3.03066539687542e-06 | 0.000870440319159709 | 0.00213362022078524 | 1.39340840826156e-06 | 0.000590213607150319 | 0.0113846153846154 | 0.142307692307692
Tenericutes | 0.00398534336174878 | 3.6553101405306e-07 | 0.000302295804657069 | 0.00229076609395827 | 4.23786334244343e-07 | 0.000325494367940654 | 0.00634615384615385 | 0.142307692307692
Chlamydiae | 0.00137304119352609 | 9.55323228260157e-08 | 0.000154541517743627 | 0.00105391181338721 | 2.95128132716907e-09 | 2.71628483740617e-05 | 0.0606923076923077 | 0.307692307692308
Fusobacteria | 0.00213099773071195 | 1.83858953638619e-07 | 0.000214393886129373 | 0.00170046818232271 | 7.47403154228998e-09 | 4.32262407059936e-05 | 0.0738461538461539 | 0.307692307692308
Lentisphaerae | 0.00140427233126215 | 1.12295252740029e-07 | 0.000167552419215621 | 0.000968577423638092 | 7.27282894093774e-08 | 0.00013484091497889 | 0.0618076923076923 | 0.307692307692308
Viruses_noname | 0.00169404607277827 | 8.50346249512059e-07 | 0.000461071103386468 | 0.000706842287509728 | 1.5409744808716e-07 | 0.000196276239065736 | 0.0727307692307692 | 0.307692307692308
Verrucomicrobia | 0.0027240930150824 | 2.52278387680304e-07 | 0.000251136610075226 | 0.00212042235083182 | 1.69763488428322e-07 | 0.000206011825163219 | 0.095 | 0.339285714285714
Candidatus_Melainabacteria | 0.000689823051810787 | 1.63949934891477e-07 | 0.000202453658210637 | 0.000369356783064785 | 3.28645845598799e-08 | 9.06429596823161e-05 | 0.1955 | 0.382396449704142
Elusimicrobia | 0.000823831487350264 | 3.47778748478986e-07 | 0.00029486384505352 | 0.000395614643107053 | 3.80477483381041e-09 | 3.08414284437767e-05 | 0.196615384615385 | 0.382396449704142
Euryarchaeota | 0.00734091277024641 | 1.04201495822828e-06 | 0.000510395669610422 | 0.0156475444075367 | 0.000131986630691866 | 0.00574427172694385 | 0.198846153846154 | 0.382396449704142
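
The FDR values in the first seven rows are numerically consistent with a Benjamini-Hochberg correction over 25 tested phyla. This is an inference from the numbers; the report names neither the correction method nor the number of tests. Under that assumption, a minimal sketch of the correction looks like this (`bh_adjust` and its `n_tests` parameter are illustrative names):

```python
def bh_adjust(p_values, n_tests=None):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = n_tests if n_tests is not None else len(p_values)
    order = sorted(range(len(p_values)), key=lambda i: p_values[i])
    adjusted = [0.0] * len(p_values)
    running_min = 1.0
    # walk from the largest p-value down so each value is the minimum
    # of p * m / rank over its own and all later ranks
    for rank in range(len(order), 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# P-values of the first seven rows of Tab 11-1-1, in table order
p = [0.0113846153846154, 0.00634615384615385, 0.0606923076923077,
     0.0738461538461539, 0.0618076923076923, 0.0727307692307692, 0.095]
for value in bh_adjust(p, n_tests=25):
    print(f"{value:.6f}")
```

Passing `n_tests=25` with a truncated list only reproduces the full result when every omitted P-value is larger than those supplied and its adjusted value is no smaller, which holds for these seven rows.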

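11.2 Reporter score analysis of metabolic pathways

For the pathways in each comparison, the reporter_score algorithm is used for a finer-grained differential analysis that yields a score per pathway. The results are summarized below.

Results are in the folder: 08.Different/ReportScore

Tab 11-2-1 Reporter scores of pathway differences, F_vs_G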
Pathway ID | Pathway | ReporterScore | Sig | F | G | KO Number | ...
ko00230 | Purine metabolism | 0.038570961581914504 | ns | 0.016964615852905825 | 0.017348519226301748 | 37930 | ...
ko02010 | ABC transporters | -1.988922338037288 | * | 0.017147460514868477 | 0.016236825403914214 | 33855 | ...
ko00240 | Pyrimidine metabolism | 0.9393183011863767 | ns | 0.01481074029712802 | 0.015269980297308408 | 33616 | ...
ko02020 | Two-component system | -0.03572494608188855 | ns | 0.013587677201408056 | 0.014046206593751288 | 27270 | ...
ko02024 | Quorum sensing | -0.9630553611131618 | ns | 0.01238952304785125 | 0.01218746927248724 | 24892 | ...
ko00520 | Amino sugar and nucleotide sugar metabolism | -0.9956493455424241 | ns | 0.01068069413225684 | 0.010584777998041741 | 24328 | ...
ko00500 | Starch and sucrose metabolism | 0.07717280429956615 | ns | 0.01018990239716327 | 0.011673325635221084 | 23047 | ...
ko03440 | Homologous recombination | 2.7197871018379054 | * | 0.00988389172238257 | 0.009256773657304618 | 20545 | ...
ko00970 | Aminoacyl-tRNA biosynthesis | 3.2822402463154177 | * | 0.008881198487858056 | 0.008603023366557166 | 20316 | ...
ko03430 | Mismatch repair | 1.5135848191578274 | ns | 0.008274938084926976 | 0.0080526204505079 | 17826 | ...
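
The report does not spell out the scoring formula. As a hedged sketch of the core idea in published descriptions of reporter scores, KO-level P-values are converted to Z-scores with the inverse normal CDF and aggregated per pathway as the sum divided by the square root of the number of KOs. The standardization against random KO sets of the same size and the handling of the score's sign are omitted here. The function names, the example P-values and the 1.64 cutoff are assumptions; the cutoff is merely consistent with the Sig column in the rows shown.

```python
from math import sqrt
from statistics import NormalDist

def pathway_z(ko_p_values):
    """Aggregate KO-level p-values (strictly between 0 and 1) into one Z-score."""
    std_normal = NormalDist()
    # inverse normal CDF maps each p-value to a Z-score (small p -> large Z)
    z = [std_normal.inv_cdf(1.0 - p) for p in ko_p_values]
    return sum(z) / sqrt(len(z))

def significance_label(score, cutoff=1.64):
    """Return '*' when |score| reaches the cutoff, otherwise 'ns'."""
    return "*" if abs(score) >= cutoff else "ns"

# Hypothetical KO p-values for one pathway (illustrative, not from the report)
example_p = [0.01, 0.20, 0.03, 0.50]
score = pathway_z(example_p)
print(f"{score:.3f} {significance_label(score)}")
```

Dividing by the square root of the KO count keeps the aggregate on a standard-normal scale under the null, so pathways with different numbers of KOs can be compared against the same cutoff.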

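We select 20 significantly enriched metabolic pathways and show them in a bar chart:

  • F_vs_G

Fig 11-2-1 Reporter score bar chart


We select 20 significantly enriched metabolic pathways and show them in a bubble chart:

  • F_vs_G

Fig 11-2-2 Reporter score bubble chart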


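11.3 LEfSe analysis of taxonomic differences

LDA Effect Size (LEfSe) analysis of between-group community differences identifies the main taxa characteristic of each group, which supports work such as biomarker development. LEfSe is run on the abundance of taxa (genes, functions) at each level, keeping taxa whose relative abundance reaches 0.1% in at least one sample.

Results are in the folder: 08.Different/LefSe

  • F-VS-G

Fig 11-3-1 LEfSe analysis of taxonomic differences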


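11.4 LEfSe analysis of functional differences

LEfSe can likewise be applied to functional differences; the principle is the same as in the taxonomic LEfSe analysis. We run LEfSe on the 200 most abundant functions from the main databases (KEGG, eggNOG, CAZy and others). The branching tree in the figure represents the functional hierarchy; for KEGG, the rings from the inside out correspond to Level A, B and C. KEGG is shown as an example:

  • F-VS-G

Fig 11-4-1 LEfSe analysis of functional differences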






12 Software versions and references

Tab 12-0-1 Software versions and references
Software/method | Function | Version | Reference
Fastp | Illumina sequencing data correction | version 0.18.0 | Chen S, Zhou Y, Chen Y, et al. fastp: an ultra-fast all-in-one FASTQ preprocessor[J]. bioRxiv, 2018: 274100.
MEGAHIT | Read assembly | version 1.1.2 | Li D, Liu C M, Luo R, et al. MEGAHIT: an ultra-fast single-node solution for large and complex metagenomics assembly via succinct de Bruijn graph[J]. Bioinformatics, 2015, 31(10): 1674-1676.
MetaGeneMark | Gene prediction | version 3.38 | Zhu W, Lomsadze A, Borodovsky M. Ab initio gene identification in metagenomic sequences[J]. Nucleic acids research, 2010, 38(12): e132-e132.
CD-HIT | Gene clustering | version 4.6 | Fu L, Niu B, Zhu Z, et al. CD-HIT: accelerated for clustering the next-generation sequencing data[J]. Bioinformatics, 2012, 28(23): 3150-3152.
Bowtie | Alignment of reads to contigs and genes | version 2.2.5 | Langmead B, Salzberg S L. Fast gapped-read alignment with Bowtie 2[J]. Nature methods, 2012, 9(4): 357.
Gene abundance calculation | Formula | - | Qin J, Li Y, Cai Z, et al. A metagenome-wide association study of gut microbiota in type 2 diabetes[J]. Nature, 2012, 490(7418): 55-60.
DIAMOND | Alignment of genes against databases | version 0.9.24 | Buchfink B, Xie C, Huson D H. Fast and sensitive protein alignment using DIAMOND[J]. Nature methods, 2015, 12(1): 59.
kaiju | Read-based taxonomic annotation | version 1.6.3 | Menzel P, Ng K L, Krogh A. Fast and sensitive taxonomic classification for metagenomics with Kaiju[J]. Nature communications, 2016, 7: 11257.
MEGAN | Gene-based taxonomic annotation software (MEtaGenome Analyzer) | - | Huson, D.H., Mitra, S., Ruscheweyh, H.-J., Weber, N., and Schuster, S.C. (2011). Integrative analysis of environmental sequences using MEGAN4. Genome Res 21, 1552-1560.
LCA algorithm | Gene-based taxonomic annotation algorithm | - | Huson, D.H., Auch, A.F., Qi, J., and Schuster, S.C. (2007). MEGAN analysis of metagenomic data. Genome Res 17, 377-386.
Python scikit-bio package | Alpha diversity indices | version 0.5.6 | http://scikit-bio.org/docs/latest/diversity.html#module-skbio.diversity
metastats | Taxonomic biomarkers | version 20090414 | White, James Robert, Niranjan Nagarajan, and Mihai Pop. Statistical methods for detecting differentially abundant features in clinical metagenomic samples. PLoS Comput Biol 5.4 (2009): e1000352.
LEfSe | Taxonomic biomarkers | version 1.0 | Segata, Nicola, et al. Metagenomic biomarker discovery and explanation. Genome biology 12.6 (2011): 1.
circos | Circos plots | version 0.69-3 | Krzywinski M, Schein J, Birol I, et al. Circos: an information aesthetic for comparative genomics[J]. Genome research, 2009, 19(9): 1639-1645.
R VennDiagram package | Venn diagrams | - | Chen H, Boutros P C. VennDiagram: a package for the generation of highly-customizable Venn and Euler diagrams in R[J]. BMC bioinformatics, 2011, 12(1): 35.
R vegan package | Bray distance / PCA / PCoA / NMDS / UPGMA / Anosim / Adonis / Welch's t-test / ANOVA / Tukey HSD | - | Oksanen J, Blanchet F G, Kindt R, et al. vegan: Community Ecology Package. R package version 1.17-4. http://cran.r-project.org, 2010.
R ggplot2 package | PCA / PCoA / NMDS / violin plots / box plots | - | Wickham H, Chang W. ggplot2: An implementation of the Grammar of Graphics. R package version 0.7. URL: http://CRAN.R-project.org/package=ggplot2, 2008.





13 Database summary

Tab 13-0-1 Database summary
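Database | Function | Version | Annotation software | Annotation parameters | Link
KEGG | Kyoto Encyclopedia of Genes and Genomes | 20200416 | Diamond | evalue<=1e-5 | http://www.genome.jp/kegg/
eggNOG | Gene function annotation | 5.0.0 | Diamond | evalue<=1e-5 | http://eggnog5.embl.de/#/app/home
PHI | Pathogen-host interaction database | 4.8 | Diamond | evalue<=1e-5 | http://www.phi-base.org/
VFDB | Bacterial virulence factor database | 2020.04.17 | Diamond | evalue<=1e-5 | http://www.mgc.ac.cn/VFs/main.htm
CARD | Bacterial antibiotic resistance gene database | 3.0.8 | Diamond | evalue<=1e-5 | http://arpcard.mcmaster.ca
CAZy | Carbohydrate-active enzyme database | 20190808 | Diamond | evalue<=1e-5 | http://www.cazy.org/
Nr | Taxonomic annotation database | 20190205 | Kaiju | default | https://www.ncbi.nlm.nih.gov/refseq/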
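
The `evalue<=1e-5` parameter can be illustrated with a small filter over alignment output. The sketch assumes the 12-column BLAST tabular layout that DIAMOND writes by default (e-value in column 11, bit score in column 12) and keeps the highest-scoring hit per gene. The best-hit rule, the function name and the example rows are assumptions, not details taken from the report.

```python
def best_hits(lines, max_evalue=1e-5):
    """Keep, per query, the highest-bitscore hit with e-value <= max_evalue."""
    best = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12:
            continue  # skip malformed or empty lines
        query, subject = fields[0], fields[1]
        evalue, bitscore = float(fields[10]), float(fields[11])
        if evalue > max_evalue:
            continue  # fails the annotation threshold
        if query not in best or bitscore > best[query][2]:
            best[query] = (subject, evalue, bitscore)
    return best

# Hypothetical alignment rows (gene and subject IDs are made up)
rows = [
    "gene_1\tK00001\t85.0\t200\t30\t0\t1\t200\t1\t200\t1e-50\t350.0",
    "gene_1\tK00002\t60.0\t180\t72\t0\t1\t180\t5\t184\t1e-20\t150.0",
    "gene_2\tK00003\t35.0\t90\t58\t1\t1\t90\t10\t99\t0.002\t40.0",
]
print(best_hits(rows))
```

Here `gene_2` is dropped because its only hit has an e-value above the threshold, while `gene_1` keeps the stronger of its two qualifying hits.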





14 Directory structure


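result                                           Results directory
├── 01.QC                                   Quality control
│   ├── 1_Filter_fq/sample/*.new.png(pdf)             Base distribution plot after filtering (vector graphic)
│   ├── 1_Filter_fq/sample/*.old.png(pdf)             Base distribution plot before filtering (vector graphic)
│   ├── 1_Filter_fq/filter.stat.fill.png(pdf)                           Data preprocessing distribution plot (percentages)
│   ├── 1_Filter_fq/filter.stat.count.png(pdf)                           Data preprocessing distribution plot (counts)
│   ├── 1_Filter_fq/reads_filter.stat.xls                            Filtering summary for all samples
│   ├── 1_Filter_fq/reads_info.stat.xls                          Base quality before and after filtering for all samples
│   ├── 2_rHost/filter_host.stat.xls                             Host filtering summary for all samples
│   └── 2_rHost/filter_host.stack.png(pdf)                       Host filtering distribution plot for all samples
├── 02.Assemble                                       Assembly results
│   ├── sample/sample.contigs.length_distribution.png(pdf)                         Assembly statistics plot for each sample, bitmap (vector graphic)
│   └── assem.contigs.stat.txt                             Assembly statistics for all samples
├── 03.Genes                                        Gene prediction
│   ├── Unigenes.final.fna(faa)                             Nucleotide (protein) sequences of non-redundant genes
│   ├── Unigenes.final.gff                             GFF file of non-redundant genes
│   ├── Unigenes.expression.final.xls                   Summary table of non-redundant gene expression
│   ├── Unigenes.count.final.xls                             Read counts of non-redundant genes
│   ├── Unigenes.abundance.final.xls                             Abundance of non-redundant genes
│   ├── bar*                             Bar chart of non-redundant gene numbers per sample
│   ├── violin*                             Violin plot of non-redundant gene numbers per group
│   └── Core_Pan/*                           Core-Pan analysis results
├── 04.Annotation                                          Gene annotation
│   ├── KEGG/*                             KEGG annotation results
│   ├── eggNOG/*                             eggNOG annotation results
│   ├── CAZy/*                             CAZy annotation results
│   ├── CARD/*                             CARD annotation results
│   ├── VFDB/*                             VFDB annotation results
│   └── PHI/*                             PHI annotation results
├── 05.Taxonomy                                          Taxonomic annotation
│   ├── profiling.all.xls                                  Abundance of each lineage
│   ├── profiling.all.readnumber.xls                       Read numbers assigned to each lineage
│   ├── profiling.all.relative.xls                         Relative abundance of each lineage
│   ├── profiling.L*.*.xls                                 Relative abundance at each taxonomic level
│   ├── profiling.all.stat.xls                             Number of reads assigned at each taxonomic level
│   ├── stack_plot/*                                       Stacked bar plots at each taxonomic level
│   ├── heatmap/*                                          Heatmaps at each taxonomic level
│   ├── krona/*                                            Krona visualizations
│   └── circos/*                                           Circos plots of assignment results at each taxonomic level
├── 06.Comparison                                          Comparative analysis
│   ├── Venn/*                                  Venn/UpSet results at each level
│   ├── Heatmap/*                               Correlation heatmaps at each level
│   ├── PCA/*                                   PCA results at each level
│   ├── PCoA/*                                  PCoA results at each level
│   ├── NMDS/*                                  NMDS results at each level
│   ├── UPGMA/*                                 UPGMA results at each level
│   └── Anosim_and_Adonis/*                     Anosim/Adonis results at each level
├── 07.Different                                           Differential analysis
│   ├── T_test/*                                 Welch's t-test results for each level and comparison
│   ├── ANOVA/*                                  ANOVA results for each level and comparison
│   ├── Ternary/*                                Ternary plot results for each level and comparison
│   ├── MetaStats/*                              MetaStats results for each level and comparison
│   └── LefSe/*                                  LEfSe results for each comparison
├── index.html                                     Final report
└── src                                            Report content
    ├── content.html                                 Report body
    ├── css                                            Report CSS stylesheets
    ├── doc                                            Report documentation
    ├── image                                          Report images
    └── js                                             Report JavaScript files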





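15 Appendix

15.1 FAQ

15.2 Methods (English)

15.3 Experimental methods (Chinese)

15.4 Citation and acknowledgements

If your research used Genedenovo's sequencing and analysis services, we would appreciate a citation or mention of Genedenovo in the Methods or Acknowledgements section of your publication.
The following sentence may be used as a template: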
We are grateful to/thank Guangzhou Genedenovo Biotechnology Co., Ltd for assisting in sequencing and/or bioinformatics analysis.




