PyGeno comes with a set of parsers that you can use independently.
To read and write CSV files.
- class tools.parsers.CSVTools.CSVEntry(csvFile, lineNumber=None)
A single entry in a CSV file
- class tools.parsers.CSVTools.CSVFile(legend=[], separator=',', lineSeparator='\n')
Represents a whole CSV file:
    # reading
    f = CSVFile()
    f.parse('hop.csv')
    for line in f :
        print(line['ref'])

    # writing, legend can either be a list or a dict {field : column number}
    f = CSVFile(legend = ['name', 'email'])
    l = f.newLine()
    l['name'] = 'toto'
    l['email'] = ""
    for field, value in l :
        print(field, value)
    f.save('myCSV.csv')
- closeStreamToFile()
Appends the remaining committed lines and closes the stream. Raises a ValueError if no stream is active.
- commitLine(line)
Commits a line, making it ready to be streamed to a file, and saves the current buffer if needed. Raises a ValueError if no stream is active.
- parse(filePath, skipLines=0, separator=',', stringSeparator='"', lineSeparator='\n')
Loads a CSV file
- streamToFile(filename, keepInMemory=False, writeRate=1)
Starts a stream to a file. Every line must be committed (l.commit()) to be appended to the file.
If keepInMemory is set to True, the parser keeps a copy of the whole CSV in memory. writeRate is the number of lines that must be committed before an automatic save is triggered.
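The interplay between committing lines, writeRate and keepInMemory can be illustrated with a small standalone sketch. This is not pyGeno's implementation: the class `StreamedCSV` and its method names are hypothetical, and it only mimics the behaviour described above (buffer committed lines, write them once writeRate lines are pending, flush the rest on close).

```python
import os
import tempfile


class StreamedCSV:
    """Hypothetical stand-in for a streamed CSV writer (not pyGeno's code)."""

    def __init__(self, filename, legend, separator=",", keep_in_memory=False, write_rate=1):
        if write_rate < 1:
            raise ValueError("write_rate must be at least 1")
        self.legend = list(legend)
        self.separator = separator
        self.keep_in_memory = keep_in_memory
        self.write_rate = write_rate
        self.lines = []      # full copy, only filled when keep_in_memory is True
        self._buffer = []    # committed lines that are not on disk yet
        self._stream = open(filename, "w")
        self._stream.write(separator.join(self.legend) + "\n")

    def commit_line(self, line):
        """Queue a line (a dict keyed by legend field) and write once write_rate lines are pending."""
        if self._stream is None:
            raise ValueError("no active stream")
        self._buffer.append(self.separator.join(str(line[f]) for f in self.legend))
        if self.keep_in_memory:
            self.lines.append(dict(line))
        if len(self._buffer) >= self.write_rate:
            self._flush()

    def _flush(self):
        """Write all pending lines to disk and empty the buffer."""
        if self._buffer:
            self._stream.write("\n".join(self._buffer) + "\n")
            self._stream.flush()
            self._buffer = []

    def close(self):
        """Write whatever is still buffered, then close the stream."""
        if self._stream is None:
            raise ValueError("no active stream")
        self._flush()
        self._stream.close()
        self._stream = None


path = os.path.join(tempfile.mkdtemp(), "myCSV.csv")
out = StreamedCSV(path, legend=["name", "email"], write_rate=2)
out.commit_line({"name": "toto", "email": ""})  # buffered, nothing written yet
out.commit_line({"name": "titi", "email": ""})  # second commit triggers a write
out.commit_line({"name": "tata", "email": ""})  # stays buffered until close()
out.close()

with open(path) as handle:
    written = handle.read().splitlines()
print(written)
```

A larger write rate trades fewer disk writes for more lines lost if the process dies before the stream is closed, which is presumably why the default is 1.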
- exception tools.parsers.CSVTools.EmptyLine(lineNumber)
Raised when an empty or comment line is found (dealt with internally)
- tools.parsers.CSVTools.catCSVs(folder, ouputFileName, removeDups=False)
Concatenates all CSV files in ‘folder’ and writes the result to ‘ouputFileName’. May not work on non-Unix systems.
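For readers who need the same result on any platform, here is a minimal pure-Python sketch of the concatenation. It is not how catCSVs is implemented: the function `cat_csvs` is hypothetical, and it assumes every file starts with the same header line, which is kept only once.

```python
import glob
import os
import tempfile


def cat_csvs(folder, output_file_name, remove_dups=False):
    """Concatenate every *.csv in folder; return the number of data lines written."""
    seen = set()
    header = None
    kept = []
    output_path = os.path.abspath(output_file_name)
    for path in sorted(glob.glob(os.path.join(folder, "*.csv"))):
        if os.path.abspath(path) == output_path:
            continue  # never read the file that is about to be written
        with open(path) as handle:
            rows = handle.read().splitlines()
        if not rows:
            continue
        if header is None:
            header = rows[0]  # assumption: all files share this header
        for row in rows[1:]:
            if remove_dups and row in seen:
                continue
            seen.add(row)
            kept.append(row)
    with open(output_file_name, "w") as out:
        if header is not None:
            out.write(header + "\n")
        for row in kept:
            out.write(row + "\n")
    return len(kept)


# Small demonstration on two throwaway files.
folder = tempfile.mkdtemp()
with open(os.path.join(folder, "a.csv"), "w") as handle:
    handle.write("name,email\ntoto,\n")
with open(os.path.join(folder, "b.csv"), "w") as handle:
    handle.write("name,email\ntoto,\ntiti,\n")

merged_path = os.path.join(tempfile.mkdtemp(), "merged.csv")
count = cat_csvs(folder, merged_path, remove_dups=True)
with open(merged_path) as handle:
    merged = handle.read().splitlines()
print(count, merged)
```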
To read and write FASTA files.
To read and write FASTQ files.
- class tools.parsers.FastqTools.FastqEntry(ident='', seq='', plus='', qual='')
A single entry in a Fastq file
- class tools.parsers.FastqTools.FastqFile(fil=None)
Represents a whole Fastq file:
    # reading
    f = FastqFile()
    f.parse('hop.fastq')
    for line in f :
        print(line['sequence'])

    # writing, legend can either be a list or a dict {field : column number}
    f = CSVFile(legend = ['name', 'email'])
    l = f.newLine()
    l['name'] = 'toto'
    l['email'] = ""
    f.save('myCSV.csv')
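As background for the four constructor arguments of FastqEntry (ident, seq, plus, qual), the sketch below reads four-line FASTQ records with the standard library only. It is an illustration of the file format, not pyGeno's parser: `read_fastq` is a hypothetical helper, the dictionary keys other than 'sequence' are made up for the example, and sequences are assumed not to wrap over several lines.

```python
import io


def read_fastq(handle):
    """Yield one dict per four-line FASTQ record."""
    while True:
        ident = handle.readline()
        if not ident:
            return  # end of file
        ident = ident.rstrip("\n")
        if not ident:
            continue  # skip blank lines between records
        seq = handle.readline().rstrip("\n")
        plus = handle.readline().rstrip("\n")
        qual = handle.readline().rstrip("\n")
        if not ident.startswith("@") or not plus.startswith("+"):
            raise ValueError("malformed FASTQ record near %r" % ident)
        if len(seq) != len(qual):
            raise ValueError("sequence and quality lengths differ for %r" % ident)
        yield {"identifier": ident[1:], "sequence": seq, "plus": plus, "qualities": qual}


# Two made-up reads, held in memory instead of a file.
sample = io.StringIO("@read1\nGATTACA\n+\nIIIIHHG\n@read2\nACGT\n+\n!!!!\n")
records = list(read_fastq(sample))
for record in records:
    print(record["identifier"], record["sequence"])
```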
To read GTF files.
- class tools.parsers.GTFTools.GTFFile(filename, gziped=False)
This is a simple GTF2.2 (Revised Ensembl GTF) parser, see for more infos
- get_transcripts(transcript_ids=None)
Returns genes with their transcripts and associated exons and CDSs from a GTF. If transcript_ids is given, only those transcripts are returned.
- gtf2bed(bed_filename, feature='transcripts')
Transforms the GTF to bed6/bed12 and saves the output to a file.
- gtf2bed_cds(bed_filename, join_overlaps=True)
Retrieves CDS information from the GTF in bed6 format.
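The core of any GTF to BED conversion is the coordinate shift: GTF positions are 1-based and inclusive, while BED positions are 0-based and half-open. The sketch below shows that step for a single feature line. It is not the code behind gtf2bed: both helper functions are hypothetical, attribute values are assumed to contain no semicolons, and mapping a missing score to "0" is a choice made for this example.

```python
def parse_gtf_attributes(field):
    """Turn 'gene_id "G1"; transcript_id "T1";' into a dict."""
    attributes = {}
    for chunk in field.strip().split(";"):
        chunk = chunk.strip()
        if not chunk:
            continue
        key, _, value = chunk.partition(" ")
        attributes[key] = value.strip().strip('"')
    return attributes


def gtf_line_to_bed6(line, name_key="transcript_id"):
    """Convert one GTF feature line into the six BED6 columns."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError("a GTF line has 9 tab-separated fields, got %d" % len(fields))
    seqname, _source, _feature, start, end, score, strand, _frame, attrs = fields
    name = parse_gtf_attributes(attrs).get(name_key, ".")
    # GTF is 1-based inclusive, BED is 0-based half-open:
    # subtract 1 from the start and leave the end unchanged.
    bed_start = str(int(start) - 1)
    bed_score = score if score != "." else "0"
    return [seqname, bed_start, end, name, bed_score, strand]


# Made-up exon line for illustration.
gtf_line = 'chr1\texample\texon\t11869\t12227\t.\t+\t.\tgene_id "G1"; transcript_id "T1";\n'
bed_fields = gtf_line_to_bed6(gtf_line)
print("\t".join(bed_fields))
```

The feature length is preserved by the conversion: 12227 - 11869 + 1 in GTF terms equals 12227 - 11868 in BED terms.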
To read VCF files.
- class tools.parsers.VCFTools.VCFEntry(vcfFile, line, lineNumber)
A single entry in a VCF file
- class tools.parsers.VCFTools.VCFFile(filename=None, gziped=False, stream=False)
This is a small parser for VCF files. It should work with any VCF file but has only been tested on dbSNP138 files. Represents a whole VCF file:
    # reading
    f = VCFFile()
    f.parse('hop.vcf')
    for line in f :
        print(line['pos'])
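To make it clearer what an entry such as line['pos'] refers to, here is a standalone sketch that splits VCF data lines into their eight fixed columns. It does not reproduce VCFFile: the helper names and the lowercase keys are assumptions chosen to echo the example above, and the sample record is made up.

```python
import io

# The eight fixed VCF columns, lowercased for this example.
VCF_COLUMNS = ["chrom", "pos", "id", "ref", "alt", "qual", "filter", "info"]


def parse_info(field):
    """Split 'A=1;B=2;FLAG' into a dict; flags without a value map to True."""
    info = {}
    if field == ".":
        return info
    for item in field.split(";"):
        key, sep, value = item.partition("=")
        info[key] = value if sep else True
    return info


def parse_vcf_line(line):
    """Parse one tab-separated VCF data line into a dict."""
    values = line.rstrip("\n").split("\t")
    if len(values) < 8:
        raise ValueError("a VCF data line has at least 8 columns")
    entry = dict(zip(VCF_COLUMNS, values[:8]))
    entry["pos"] = int(entry["pos"])
    entry["alt"] = entry["alt"].split(",")  # several alternate alleles are comma-separated
    entry["info"] = parse_info(entry["info"])
    return entry


def read_vcf(handle):
    """Yield parsed entries, skipping header ('#') and blank lines."""
    for line in handle:
        if line.startswith("#") or not line.strip():
            continue
        yield parse_vcf_line(line)


# Made-up record, held in memory instead of a file.
sample = io.StringIO(
    "##fileformat=VCFv4.1\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
    "1\t10144\trsExample\tTA\tT,TAA\t.\t.\tRSPOS=10145;VC=DIV;ASP\n"
)
entries = list(read_vcf(sample))
for entry in entries:
    print(entry["pos"], entry["ref"], entry["alt"])
```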