Parsers
PyGeno comes with a set of parsers that you can use independently.
CSV
To read and write CSV files.
- class tools.parsers.CSVTools.CSVEntry(csvFile, lineNumber=None)
A single entry in a CSV file
- class tools.parsers.CSVTools.CSVFile(legend=[], separator=',', lineSeparator='\n')
Represents a whole CSV file:
```python
from tools.parsers.CSVTools import CSVFile  # module path as documented on this page

# reading
f = CSVFile()
f.parse('hop.csv')
for line in f:
    print(line['ref'])

# writing; legend can either be a list or a dict {field: column number}
f = CSVFile(legend=['name', 'email'])
l = f.newLine()
l['name'] = 'toto'
l['email'] = "hop@gmail.com"

# an entry iterates over (field, value) pairs
for field, value in l:
    print(field, value)

f.save('myCSV.csv')
```
- closeStreamToFile()
Appends the remaining committed lines and closes the stream. If no stream is active, raises a ValueError.
- commitLine(line)
Commits a line, making it ready to be streamed to a file, and saves the current buffer if needed. If no stream is active, raises a ValueError.
- parse(filePath, skipLines=0, separator=',', stringSeparator='"', lineSeparator='\n')
Loads a CSV file
- streamToFile(filename, keepInMemory=False, writeRate=1)
Starts a stream to a file. Every line must be committed (l.commit()) to be appended to the file.
If keepInMemory is set to True, the parser keeps a version of the whole CSV in memory. writeRate is the number of lines that must be committed before an automatic save is triggered.
- exception tools.parsers.CSVTools.EmptyLine(lineNumber)
Raised when an empty or comment line is found (dealt with internally)
- tools.parsers.CSVTools.catCSVs(folder, ouputFileName, removeDups=False)
Concatenates all CSV files in ‘folder’ and writes the result to ‘ouputFileName’. May not work on non-Unix systems.
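The streaming mode described for streamToFile, commitLine and closeStreamToFile can be pictured with the standalone sketch below. It is not pyGeno's implementation: the class StreamingCSVWriter and its snake_case methods are hypothetical, and writing the legend as a header row is an assumption of the sketch. It only illustrates the buffering idea: committed rows wait in a buffer, are appended once writeRate of them have accumulated, and closing the stream appends whatever remains.

```python
import csv


class StreamingCSVWriter:
    """Hypothetical stand-in illustrating streamToFile / commitLine / closeStreamToFile."""

    def __init__(self, legend, separator=","):
        self.legend = list(legend)
        self.separator = separator
        self.stream = None   # no stream is active until stream_to_file() is called
        self.buffer = []     # committed rows not yet handed to the csv writer
        self.kept = None     # full copy of committed rows when keep_in_memory=True

    def stream_to_file(self, filename, keep_in_memory=False, write_rate=1):
        self.stream = open(filename, "w", newline="")
        self.writer = csv.writer(self.stream, delimiter=self.separator, lineterminator="\n")
        self.writer.writerow(self.legend)  # assumption of this sketch: legend as header row
        self.write_rate = write_rate
        self.kept = [] if keep_in_memory else None

    def commit_line(self, row):
        if self.stream is None:
            raise ValueError("no active stream")
        values = [row.get(field, "") for field in self.legend]  # missing fields become ""
        self.buffer.append(values)
        if self.kept is not None:
            self.kept.append(values)
        if len(self.buffer) >= self.write_rate:  # automatic save every write_rate commits
            self._flush()

    def _flush(self):
        self.writer.writerows(self.buffer)
        self.buffer = []

    def close_stream(self):
        if self.stream is None:
            raise ValueError("no active stream")
        self._flush()  # append the remaining committed rows
        self.stream.close()
        self.stream = None
```

With write_rate=2, for instance, the first committed row stays in the buffer and the second commit writes both rows at once. A larger write rate means fewer writes, but more rows are lost if the process stops before close_stream() is called.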
FASTA
To read and write FASTA files.
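No FASTA classes are documented on this page, so the following is only a standalone sketch of what reading and writing the format involves. It uses the hypothetical helpers read_fasta and write_fasta rather than pyGeno's own API. A record starts with a '>' header line, and its sequence may span several lines.

```python
import io


def read_fasta(handle):
    """Yields (header, sequence) tuples; multi-line sequences are joined (hypothetical helper)."""
    header, chunks = None, []
    for raw in handle:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []  # header text without the leading '>'
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)  # last record


def write_fasta(records, handle, width=60):
    """Writes (header, sequence) tuples, wrapping sequences at `width` characters."""
    for header, seq in records:
        handle.write(">" + header + "\n")
        for i in range(0, len(seq), width):
            handle.write(seq[i:i + width] + "\n")


text = ">seq1 demo\nACGT\nTTGA\n>seq2\nGG\n"
print(list(read_fasta(io.StringIO(text))))
# [('seq1 demo', 'ACGTTTGA'), ('seq2', 'GG')]
```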
FASTQ
To read and write FASTQ files.
- class tools.parsers.FastqTools.FastqEntry(ident='', seq='', plus='', qual='')
A single entry in a FASTQ file.
- class tools.parsers.FastqTools.FastqFile(fil=None)
Represents a whole FASTQ file:
```python
from tools.parsers.FastqTools import FastqFile  # module path as documented on this page

# reading
f = FastqFile()
f.parse('hop.fastq')
for line in f:
    print(line['sequence'])
```
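A FASTQ record spans four lines (identifier, sequence, '+' separator, qualities), which presumably is what the four FastqEntry constructor arguments hold. The sketch below groups a stream into such records. iter_fastq and phred33 are hypothetical helpers, not part of FastqTools, and the Phred+33 quality offset is an assumption that holds for Sanger and Illumina 1.8+ files.

```python
import io


def iter_fastq(handle):
    """Yields one dict per 4-line FASTQ record (hypothetical helper).
    Keys mirror the FastqEntry constructor arguments; lines are kept as read.
    Reads the whole stream into memory and assumes no empty sequences."""
    lines = [l.rstrip("\r\n") for l in handle if l.strip()]  # drop blank lines
    if len(lines) % 4:
        raise ValueError("truncated FASTQ: line count is not a multiple of 4")
    for i in range(0, len(lines), 4):
        ident, seq, plus, qual = lines[i:i + 4]
        if not ident.startswith("@") or not plus.startswith("+"):
            raise ValueError("malformed record #%d" % (i // 4 + 1))
        if len(seq) != len(qual):
            raise ValueError("record #%d: sequence and quality lengths differ" % (i // 4 + 1))
        yield {"ident": ident, "seq": seq, "plus": plus, "qual": qual}


def phred33(qual):
    """Decodes a Phred+33 quality string into integer scores."""
    return [ord(c) - 33 for c in qual]


text = "@r1\nACGT\n+\nII#5\n"
for rec in iter_fastq(io.StringIO(text)):
    print(rec["seq"], phred33(rec["qual"]))  # ACGT [40, 40, 2, 20]
```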
GTF
To read GTF files.
- class tools.parsers.GTFTools.GTFFile(filename, gziped=False)
This is a simple GTF2.2 (Revised Ensembl GTF) parser; see http://mblab.wustl.edu/GTF22.html for more information.
- get_transcripts(transcript_ids=None)
Returns genes with their transcripts and associated exons and CDSs from a GTF. If transcript_ids is given, only these transcripts are returned.
- gtf2bed(bed_filename, feature='transcripts')
Transforms the GTF into BED6/BED12 format and saves the output to bed_filename.
- gtf2bed_cds(bed_filename, join_overlaps=True)
Retrieves CDS information from the GTF in BED6 format.
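GTF coordinates are 1-based and end-inclusive, while BED coordinates are 0-based and end-exclusive, so any GTF-to-BED conversion such as gtf2bed has to shift the start by one. The sketch below shows that step for a single line with BED6 output. parse_gtf_line and to_bed6 are hypothetical names, the attribute parsing assumes the usual key "value"; layout with no semicolons inside values, and this is not how GTFTools is implemented.

```python
def parse_gtf_line(line):
    """Splits one GTF line into its 9 columns plus an attribute dict (hypothetical helper)."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError("expected 9 tab-separated columns, got %d" % len(fields))
    attrs = {}
    for item in fields[8].split(";"):  # naive: breaks if a value contains ';'
        item = item.strip()
        if not item:
            continue
        key, _, value = item.partition(" ")
        attrs[key] = value.strip().strip('"')
    return {
        "seqname": fields[0], "source": fields[1], "feature": fields[2],
        "start": int(fields[3]), "end": int(fields[4]),
        "score": fields[5], "strand": fields[6], "frame": fields[7],
        "attributes": attrs,
    }


def to_bed6(record, name_key="transcript_id"):
    """Formats a parsed GTF record as one BED6 line."""
    # GTF start is 1-based inclusive, BED start is 0-based: subtract 1.
    # Both formats use the same number for the end coordinate.
    score = record["score"] if record["score"] != "." else "0"  # "." -> 0, else kept as-is
    return "\t".join([
        record["seqname"], str(record["start"] - 1), str(record["end"]),
        record["attributes"].get(name_key, "."), score, record["strand"],
    ])


line = 'chr1\tdemo\texon\t100\t200\t.\t-\t.\tgene_id "g1"; transcript_id "t1";\n'
print(to_bed6(parse_gtf_line(line)).split("\t"))
# ['chr1', '99', '200', 't1', '0', '-']
```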
VCF
To read VCF files.
- class tools.parsers.VCFTools.VCFEntry(vcfFile, line, lineNumber)
A single entry in a VCF file
- class tools.parsers.VCFTools.VCFFile(filename=None, gziped=False, stream=False)
This is a small parser for VCF files. It should work with any VCF file, but it has only been tested on dbSNP138 files. Represents a whole VCF file:
```python
from tools.parsers.VCFTools import VCFFile  # module path as documented on this page

# reading
f = VCFFile()
f.parse('hop.vcf')
for line in f:
    print(line['pos'])
```
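The VCFFile constructor accepts gziped and stream flags. As an illustration of what reading a VCF involves, the following standalone sketch opens a plain or gzip-compressed file and splits each data line into the eight fixed VCF columns (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO), parsing INFO into a dict. open_vcf and iter_vcf are hypothetical helpers, the lowercase keys are a choice of the sketch, and FORMAT and sample columns are ignored.

```python
import gzip


def open_vcf(path, gziped=False):
    """Opens a plain or gzip-compressed VCF in text mode (hypothetical helper)."""
    return gzip.open(path, "rt") if gziped else open(path)


def iter_vcf(handle):
    """Yields one dict per data line, covering the 8 fixed VCF columns."""
    for lineno, line in enumerate(handle, 1):
        if not line.strip() or line.startswith("#"):
            continue  # skip blank, meta-information (##) and header (#CHROM) lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 8:
            raise ValueError("line %d: expected at least 8 columns" % lineno)
        info = {}
        for item in cols[7].split(";"):
            if not item or item == ".":
                continue  # "." means the INFO field is empty
            key, sep, value = item.partition("=")
            info[key] = value if sep else True  # flag entries carry no value
        yield {
            "chrom": cols[0], "pos": int(cols[1]), "id": cols[2],
            "ref": cols[3], "alt": cols[4].split(","),  # ALT may list several alleles
            "qual": cols[5], "filter": cols[6], "info": info,
        }
```

Used as with open_vcf('hop.vcf.gz', gziped=True) as fh, iterating over iter_vcf(fh) and printing rec['pos'] gives a loop shaped like the reading example above, with gzip handling made explicit.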