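Random Forests | randomForest, ranger, h2o | R (repost)

https://www.jamleecute.com/random-forests-%E9%9A%A8%E6%A9%9F%E6%A3%AE%E6%9E%97/ Bagging combines the results of many tree models, which reduces the high variance of a single tree and improves predictive accuracy. However, correlation between the trees in a bagged ensemble degrades the overall performance of the model. Random forests are a modified version of bagging: an ensemble algorithm built from "decorrelated" trees that offers good predictive accuracy and is a popular, out-of-the-box method.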
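
The point about tree correlation can be made concrete with the standard result for the variance of an average of B identically distributed predictors, each with variance sigma^2 and pairwise correlation rho: rho * sigma^2 + (1 - rho) * sigma^2 / B. The sketch below is written in Python rather than the R used in the original post. It evaluates that expression and shows the per-split feature subsampling that random forests use to lower rho. The function names and the example `mtry` value are illustrative assumptions, not taken from the post.

```python
import random


def ensemble_variance(sigma2, rho, n_trees):
    """Variance of the mean of n_trees identically distributed predictors,
    each with variance sigma2 and pairwise correlation rho."""
    return rho * sigma2 + (1.0 - rho) * sigma2 / n_trees


def sample_split_features(n_features, mtry, rng):
    """Draw the candidate feature indices considered at one split.

    Plain bagging would consider all n_features at every split; a random
    forest considers only a random subset of size mtry, which makes the
    trees less alike.
    """
    return sorted(rng.sample(range(n_features), mtry))


if __name__ == "__main__":
    # With many trees the second term vanishes, so rho sets the floor.
    for rho in (0.0, 0.3, 0.9):
        print(rho, ensemble_variance(1.0, rho, n_trees=500))
    # Hypothetical setting: 12 features, 4 candidates per split.
    print(sample_split_features(n_features=12, mtry=4, rng=random.Random(1)))
```

Adding more trees only shrinks the second term, so the first term, which depends on rho, is what limits a bagged ensemble.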

Load the required packages

MetaGEM: reconstructing genome-scale metabolic models directly from metagenomes

https://github.com/franciscozorrilla/metaGEM A Snakemake-based workflow to generate high quality metagenome assembled genomes from short read paired-end data, reconstruct genome scale metabolic models, and perform community metabolic interaction simulations on high performance computing clusters.[……]

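[Third-generation assembly] Polishing a genome with Pilon

Source: https://www.jianshu.com/p/cceeb7d1f413

Software installation

To run the program smoothly you need Java > 1.7 and an amount of memory that depends on genome size. As a rule of thumb, allow about 1 GB of memory per 1 Mb of genome.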
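
The memory rule of thumb can be turned into a small helper. The following is a minimal Python sketch, not part of the original post: it counts the bases in a FASTA file, applies the roughly 1 GB per 1 Mb estimate, and assembles a Java command line with a matching `-Xmx` heap setting. The jar path, file names, and output prefix are hypothetical placeholders. `--genome`, `--frags`, and `--output` are, to my knowledge, standard Pilon options, but check them against your installed version.

```python
import math


def genome_size_bp(fasta_lines):
    """Count sequence characters in FASTA text, skipping header lines."""
    return sum(len(line.strip()) for line in fasta_lines
               if not line.startswith(">"))


def heap_gb_for_genome(size_bp, gb_per_mb=1.0, minimum_gb=1):
    """Apply the ~1 GB per 1 Mb rule of thumb, rounded up to whole GB."""
    return max(minimum_gb, math.ceil(size_bp / 1_000_000 * gb_per_mb))


def pilon_command(genome_fasta, bam, heap_gb,
                  jar="pilon.jar", output="pilon_out"):
    """Build (but do not run) a Pilon command line as an argument list.

    jar and output are hypothetical defaults chosen for illustration.
    """
    return ["java", f"-Xmx{heap_gb}G", "-jar", jar,
            "--genome", genome_fasta,
            "--frags", bam,
            "--output", output]


if __name__ == "__main__":
    # A 4.6 Mb bacterial genome would get a 5 GB heap under this rule.
    heap = heap_gb_for_genome(4_600_000)
    print(" ".join(pilon_command("draft.fasta", "reads.sorted.bam", heap)))
```

The list returned by `pilon_command` can be passed to `subprocess.run` once the paths point at real files.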

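Overview

Pilon serves the following purposes:

  1. Polishing a draft assembly
  2. Finding variation between strains of the same species, including structural variant detection

It takes a FASTA file and BAM files as input and uses the alignments to improve the input reference genome, including:

  • Single-base differences
  • Small insertions and deletions (indels)
  • Larger indels or block substitution events
  • Filling runs of N in the reference sequence
  • Identifying local misassemblies[……]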

Batch download genomes by accession

Three easy ways to download multiple sequences from NCBI

There are several ways to download multiple sequences from the NCBI databases in a single request.

  1) Using the Batch Entrez website: http://www.ncbi.nlm.nih.gov/sites/batchentrez

  2) Using Perl: (cop[……]
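
As a Python alternative to the options listed above (this is not one of the methods from the original post), the sketch below builds requests against the NCBI E-utilities `efetch` endpoint and downloads accessions in batches. The accessions, batch size, pause, and output file name are illustrative assumptions.

```python
import time
from urllib.parse import urlencode
from urllib.request import urlopen

EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def chunks(items, size):
    """Yield successive slices of at most `size` items."""
    for i in range(0, len(items), size):
        yield items[i:i + size]


def efetch_url(accessions, db="nuccore", rettype="fasta"):
    """Build one efetch URL that requests several accessions at once."""
    params = {"db": db, "id": ",".join(accessions),
              "rettype": rettype, "retmode": "text"}
    return EFETCH + "?" + urlencode(params)


def download_fasta(accessions, out_path, batch_size=100, pause=0.5):
    """Fetch every batch over the network and write one multi-FASTA file."""
    with open(out_path, "w") as out:
        for batch in chunks(accessions, batch_size):
            with urlopen(efetch_url(batch)) as resp:
                out.write(resp.read().decode("utf-8"))
            time.sleep(pause)  # stay polite to the NCBI servers


if __name__ == "__main__":
    # Example accessions only; replace with your own list.
    print(efetch_url(["NC_000913.3", "NC_002695.2"]))
```

Calling `download_fasta(accession_list, "genomes.fasta")` performs the actual requests; `efetch_url` alone is useful for checking a request before sending it.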

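Bioinformatics tools series: using the BBTools/BBMap suite (repost)

Source: https://www.jianshu.com/p/175c3282a61c BBMap/BBTools is a splice-aware global aligner for DNA and RNA sequencing reads. It can handle reads from Illumina, 454, Sanger, Ion Torrent, PacBio, and Nanopore. BBMap is fast and extremely accurate, particularly with highly mutated genomes or reads with long indels, even whole-gene deletions over 100 kbp long. It has no upper limit on genome size or number of contigs, and it has been used successfully to map to an 85 Gb soil metagenome with over 200 million contigs. In addition, its indexing phase is very fast compared with other aligners. BBMap[……]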

How to extract/convert GFF3 CDS sequences to multi-FASTA

https://bioinformatics.stackexchange.com/questions/2341/how-to-extract-convert-gff3-cds-sequences-to-multifasta

Using Python and this GFF parser that mimics Biopython's SeqIO parsers:

    from BCBio import GFF

    # Read the gff
    for seq in GFF.parse('my_file.gff'):
        # only focus on t[......]
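
Because the excerpt above is cut off, here is a separate, self-contained sketch of the same task using only the Python standard library. It does not reproduce the parser-based answer from the original post. It groups CDS features by their `Parent` attribute (falling back to `ID`), joins the segments in coordinate order, and reverse-complements those on the minus strand. It assumes all segments of one CDS share a strand and a sequence, and it ignores the phase column. All function names are hypothetical.

```python
from collections import defaultdict

COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def read_fasta(lines):
    """Parse FASTA text into {sequence_id: sequence}."""
    seqs, name = {}, None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            seqs[name] = []
        elif name is not None:
            seqs[name].append(line)
    return {k: "".join(v) for k, v in seqs.items()}


def cds_records(gff_lines, genome):
    """Yield (name, spliced CDS sequence) pairs from GFF3 lines."""
    parts = defaultdict(list)  # name -> [(start, end, strand, seqid)]
    for line in gff_lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "CDS":
            continue
        attrs = dict(kv.split("=", 1) for kv in cols[8].split(";") if "=" in kv)
        key = attrs.get("Parent") or attrs.get("ID")
        if key is None:
            continue
        parts[key].append((int(cols[3]), int(cols[4]), cols[6], cols[0]))
    for key, segs in parts.items():
        segs.sort()
        # GFF3 coordinates are 1-based and inclusive.
        seq = "".join(genome[seqid][start - 1:end]
                      for start, end, _, seqid in segs)
        if segs[0][2] == "-":
            seq = seq.translate(COMPLEMENT)[::-1]
        yield key, seq


def to_multifasta(records, width=60):
    """Format (name, sequence) pairs as wrapped multi-FASTA text."""
    out = []
    for name, seq in records:
        out.append(">" + name)
        out.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(out) + "\n"
```

With files on disk, `to_multifasta(cds_records(open("my_file.gff"), read_fasta(open("genome.fasta"))))` returns the multi-FASTA text; the genome file name is a placeholder.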

Python and PBS script

    #!/bin/bash
    #PBS -l walltime=48:00:00,nodes=8:ppn=4
    #PBS -N bbmap_batch
    #megahit -1 /disk/rdisk09/zhiyshen/combined_HKG_1.fastq.gz -2 /disk/rdisk09/zhiyshen/combined_HKG_2.fastq.gz -m 0.9 -o /disk/rdisk09/zhiyshen/HKG_coassembly_out --min-contig-len 2000 -t 28
    source a[……]

R heatmap for ANI

    data <- read.table("fastani_matrix.txt", header = TRUE)

    (base) zyshen@wyq-P310:~/work/deltaBS/fastaANI$ more fastani_matrix.txt
    B1147 E4385 E4742 E4930 EC5350 ROAR019
    B1147 100 94.681488 94.696358 94.648102 99.785301 94.803284
    E4385 94.681488 100 96.50248 97.408295 94.496964 95.691849
    E[……]
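
The matrix file has a header with one field fewer than the data rows, which is why `read.table(..., header = TRUE)` treats the first column as row names. The sketch below reads the same layout in Python (an alternative to the R in the post, not the author's code) and converts ANI to a distance of 100 minus ANI, a common input for clustering before drawing a heatmap. It uses only the two complete rows visible in the excerpt; the function names are hypothetical.

```python
SAMPLE = """\
B1147 E4385 E4742 E4930 EC5350 ROAR019
B1147 100 94.681488 94.696358 94.648102 99.785301 94.803284
E4385 94.681488 100 96.50248 97.408295 94.496964 95.691849
"""


def parse_ani_matrix(lines):
    """Parse a whitespace-separated matrix whose header lacks the row-name column."""
    rows = [line.split() for line in lines if line.strip()]
    columns, body = rows[0], rows[1:]
    matrix = {}
    for fields in body:
        name, values = fields[0], fields[1:]
        if len(values) != len(columns):
            raise ValueError(f"row {name!r} has {len(values)} values, "
                             f"expected {len(columns)}")
        matrix[name] = dict(zip(columns, map(float, values)))
    return matrix


def ani_to_distance(matrix):
    """Convert percent identity to a simple distance (100 - ANI)."""
    return {row: {col: 100.0 - ani for col, ani in cols.items()}
            for row, cols in matrix.items()}


if __name__ == "__main__":
    ani = parse_ani_matrix(SAMPLE.splitlines())
    dist = ani_to_distance(ani)
    print(ani["B1147"]["EC5350"], dist["B1147"]["B1147"])
```

For the full file, pass an open file handle instead of `SAMPLE.splitlines()`.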

Metagenomics Tutorial (HUMAnN2)

https://github.com/LangilleLab/microbiome_helper/wiki/Metagenomics-Tutorial-(Humann2)
This tutorial is set up to walk you through the process of determining the taxonomic and functional composition of several metagenomic samples. It covers the use of MetaPhlAn2 (for taxonomic classification), HUM[……]

Metagenomics standard operating procedure v2

http://www.360doc.com/content/20/0823/10/71250389_931761136.shtml https://github.com/LangilleLab/microbiome_helper/wiki/Metagenomics-standard-operating-procedure-v2 Note that this workflow is continually being updated. If you want to use the below commands be sure to keep track of them locally[……]
