03. January 2021
BioPython snippets
Biopython stuff
These scripts are grouped by usage scenario. Some of them favor readability over efficiency.
Fasta IO
# read a fasta file into a dict keyed by record ID
from Bio import SeqIO
seq_dict = SeqIO.to_dict(SeqIO.parse(file_name, "fasta"))

# write a list of SeqRecord objects to a fasta file
SeqIO.write(r_gene_seq_list, "r_gene_aa_v1_frozen.fa", "fasta")
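A minimal, self-contained sketch of the same read/write round trip, assuming Biopython is installed. The two records and the names `fasta_text`, `selected`, and `out_handle` are made up for illustration, and `StringIO` stands in for real files so the snippet runs without any input data.

```python
from io import StringIO

from Bio import SeqIO

# Hypothetical two-record protein FASTA, used only for illustration.
fasta_text = ">gene_a\nMKTAYIAK\n>gene_b\nMSDNELQR\n"

# Index the records by their IDs.
seq_dict = SeqIO.to_dict(SeqIO.parse(StringIO(fasta_text), "fasta"))

# Pick a subset of records and write them back out as FASTA.
selected = [seq_dict["gene_b"]]
out_handle = StringIO()
n_written = SeqIO.write(selected, out_handle, "fasta")  # returns the record count
fasta_out = out_handle.getvalue()
print(fasta_out)
```

Note that `SeqIO.to_dict` raises a `ValueError` when two records share the same ID, so it only suits files with unique IDs.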
Blast
Requires BLAST+ to be installed.
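# run blastp and save the result as XML (outfmt=5)
from Bio.Blast.Applications import NcbiblastpCommandline
output_file = f'r_gene_{genome_name}.xml'
blastp_cline = NcbiblastpCommandline(query="r_gene_merge.fa", db=work_db, evalue=0.001, outfmt=5, out=output_file)
print(blastp_cline)  # show the command line that will be executed
stdout, stderr = blastp_cline()

# parse the blast XML and collect the hit IDs
# xml_file: path to a BLAST XML file, e.g. output_file from above
from Bio import SearchIO
hit_list = []
blast_qresult = SearchIO.parse(xml_file, 'blast-xml')
for qresult in blast_qresult:
    # print(len(qresult))
    for hit in qresult:
        hit_list.append(hit.id)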
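The command-line wrappers in `Bio.Blast.Applications` are deprecated in newer Biopython releases, which recommend building the command yourself and running it with `subprocess`. Below is a rough sketch of that approach using only the standard library. The helpers `build_blastp_command` and `run_blastp`, the database name `my_protein_db`, and the output name `r_gene_example.xml` are hypothetical; the flags mirror the options used in the snippet above.

```python
import shutil
import subprocess


def build_blastp_command(query, db, out, evalue=0.001):
    """Return the blastp argument list for a search with XML output (outfmt 5)."""
    return [
        "blastp",
        "-query", query,
        "-db", db,
        "-evalue", str(evalue),
        "-outfmt", "5",
        "-out", out,
    ]


def run_blastp(cmd):
    """Run the command, failing early if blastp is not on PATH."""
    if shutil.which(cmd[0]) is None:
        raise FileNotFoundError("blastp not found; is BLAST+ installed?")
    return subprocess.run(cmd, check=True, capture_output=True, text=True)


# Hypothetical database and output names, for illustration only.
cmd = build_blastp_command("r_gene_merge.fa", "my_protein_db", "r_gene_example.xml")
print(" ".join(cmd))
```

Calling `run_blastp(cmd)` then executes the search. The resulting XML file can be parsed with `SearchIO.parse(..., 'blast-xml')` in the same way as before.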
HMMER parsing
from Bio import SearchIO
# tab_file: path to HMMER output written with --tblout (use 'hmmer3-domtab' for --domtblout output)
hmmer_qresult = SearchIO.parse(tab_file, 'hmmer3-tab') # or hmmer3-domtab
ref:
https://biopython.org/docs/1.75/api/Bio.SearchIO.HmmerIO.html