Parsers

PyGeno comes with a set of parsers that you can use independently.

CSV

To read and write CSV files.

class tools.parsers.CSVTools.CSVEntry(csvFile, lineNumber=None)[source]

A single entry in a CSV file

commit()[source]

commits the line so it is added to a file stream

class tools.parsers.CSVTools.CSVFile(legend=[], separator=',', lineSeparator='\n')[source]

Represents a whole CSV file:

#reading
f = CSVFile()
f.parse('hop.csv')
for line in f :
        print(line['ref'])

#writing, legend can be either a list or a dict {field : column number}
f = CSVFile(legend = ['name', 'email'])
l = f.newLine()
l['name'] = 'toto'
l['email'] = "hop@gmail.com"

for field, value in l :
        print(field, value)

f.save('myCSV.csv')             
addField(field)[source]

adds a field to the legend

closeStreamToFile()[source]

Appends the remaining committed lines and closes the stream. If no stream is active, raises a ValueError

commitLine(line)[source]

Commits a line making it ready to be streamed to a file and saves the current buffer if needed. If no stream is active, raises a ValueError

insertLine(i)[source]

Inserts an empty line at position i and returns it

newLine()[source]

Appends an empty line at the end of the CSV and returns it

parse(filePath, skipLines=0, separator=',', stringSeparator='"', lineSeparator='\n')[source]

Loads a CSV file

save(filePath)[source]

save the CSV to a file

streamToFile(filename, keepInMemory=False, writeRate=1)[source]

Starts a stream to a file. Every line must be committed (l.commit()) to be appended to the file.

If keepInMemory is set to True, the parser also keeps a version of the whole CSV in memory. writeRate is the number of lines that must be committed before an automatic save is triggered.
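The buffering behaviour described above can be pictured with a small pure-Python sketch (independent of pyGeno; the function name, line format and write_rate value are illustrative, not pyGeno's code):

```python
import os
import tempfile

def stream_lines(filename, lines, write_rate=2):
    """Append lines to 'filename', flushing the buffer every 'write_rate' commits."""
    buffer = []
    with open(filename, "a") as f:
        for line in lines:
            buffer.append(line)                    # a committed line
            if len(buffer) >= write_rate:
                f.write("\n".join(buffer) + "\n")  # automatic save
                buffer = []
        if buffer:                                 # like closeStreamToFile: append the rest
            f.write("\n".join(buffer) + "\n")

path = os.path.join(tempfile.mkdtemp(), "stream.csv")
stream_lines(path, ["a,1", "b,2", "c,3"], write_rate=2)
print(open(path).read())  # a,1\nb,2\nc,3\n
```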

toStr()[source]

returns a string version of the CSV

exception tools.parsers.CSVTools.EmptyLine(lineNumber)[source]

Raised when an empty or comment line is found (dealt with internally)

tools.parsers.CSVTools.catCSVs(folder, ouputFileName, removeDups=False)[source]

Concatenates all CSVs in ‘folder’ and writes the result to ‘ouputFileName’. May not work on non-Unix systems

tools.parsers.CSVTools.joinCSVs(csvFilePaths, column, ouputFileName, separator=',')[source]

csvFilePaths should be an iterable. Joins all CSVs according to the values in the column ‘column’ and writes the results to a new file ‘ouputFileName’

tools.parsers.CSVTools.removeDuplicates(inFileName, outFileName)[source]

removes duplicated lines from the ‘inFileName’ CSV file; the results are written to ‘outFileName’
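What removeDuplicates does can be sketched in plain Python (a simplified stand-in, not pyGeno's implementation):

```python
import os
import tempfile

def remove_duplicates(in_file, out_file):
    """Copy in_file to out_file, keeping only the first occurrence of each line."""
    seen = set()
    with open(in_file) as src, open(out_file, "w") as dst:
        for line in src:
            if line not in seen:
                seen.add(line)
                dst.write(line)

d = tempfile.mkdtemp()
src, dst = os.path.join(d, "in.csv"), os.path.join(d, "out.csv")
with open(src, "w") as f:
    f.write("ref,alt\nA,T\nA,T\nG,C\n")
remove_duplicates(src, dst)
print(open(dst).read())  # ref,alt\nA,T\nG,C\n
```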

FASTA

To read and write FASTA files.

class tools.parsers.FastaTools.FastaFile(fil=None)[source]

Represents a whole Fasta file:

#reading
f = FastaFile()
f.parseFile('hop.fasta')
for line in f :
        print(line)

#writing
f = FastaFile()
f.add(">prot1", "MLPADEV")
f.save('myFasta.fasta')
add(header, data)[source]

appends a new entry to the file

get(i)[source]

returns the ith entry

parseFile(fil)[source]

Opens a file and parses it

parseStr(st)[source]

Parses a string

reset()[source]

Erases everything

save(filePath)[source]

saves the file into filePath

toStr()[source]

returns a string version of self
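The FASTA format itself is simple enough to sketch: a header line starting with '>' followed by one or more sequence lines. A minimal illustration of the parsing logic (not FastaFile's actual code):

```python
def parse_fasta(text):
    """Parse a FASTA string into a list of (header, sequence) pairs."""
    entries, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                entries.append((header, "".join(chunks)))
            header, chunks = line, []
        elif line:
            chunks.append(line)  # sequence may span several lines
    if header is not None:
        entries.append((header, "".join(chunks)))
    return entries

print(parse_fasta(">prot1\nMLPA\nDEV\n>prot2\nACDE"))
# [('>prot1', 'MLPADEV'), ('>prot2', 'ACDE')]
```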

FASTQ

To read and write FASTQ files.

class tools.parsers.FastqTools.FastqEntry(ident='', seq='', plus='', qual='')[source]

A single entry in a Fastq file

class tools.parsers.FastqTools.FastqFile(fil=None)[source]

Represents a whole Fastq file:

#reading
f = FastqFile()
f.parse('hop.fastq')
for line in f :
        print(line['sequence'])

#writing
f = FastqFile()
f.newEntry(ident='@read1', seq='ATGC', plus='+', qual='IIII')
add(fastqEntry)[source]

appends an entry to self

get(li)[source]

returns the ith entry

newEntry(ident='', seq='', plus='', qual='')[source]

Appends an empty entry at the end of the Fastq file and returns it

parseFile(fil)[source]

Parses a file on disk

parseStr(st)[source]

Parses a string

reset()[source]

Frees the file
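A FASTQ entry is four lines: identifier, sequence, a '+' separator and per-base qualities, matching the ident/seq/plus/qual fields above. A minimal parse, independent of pyGeno:

```python
def parse_fastq(text):
    """Parse a FASTQ string into dicts with ident/seq/plus/qual keys."""
    lines = [l for l in text.splitlines() if l]
    return [
        {"ident": lines[i], "seq": lines[i + 1],
         "plus": lines[i + 2], "qual": lines[i + 3]}
        for i in range(0, len(lines), 4)  # records are fixed 4-line blocks
    ]

records = parse_fastq("@read1\nATGC\n+\nIIII\n@read2\nGGTA\n+\nFFFF\n")
print(records[0]["seq"])  # ATGC
```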

GTF

To read GTF files.

class tools.parsers.GTFTools.GTFFile(filename, gziped=False)[source]

This is a simple GTF2.2 (Revised Ensembl GTF) parser, see http://mblab.wustl.edu/GTF22.html for more information

get(line, elmt)[source]

returns the value of the field ‘elmt’ of line ‘line’

get_transcripts(transcript_ids=None)[source]

returns genes with their transcripts and associated exons and CDSs from the GTF. If transcript_ids is given, only those transcripts are returned

gtf2bed(bed_filename, feature='transcripts')[source]

Transforms the GTF to BED6/BED12 and saves the output to a file

gtf2bed_cds(bed_filename, join_overlaps=True)[source]

Retrieves CDS information from gtf in bed6 format

gtf2bed_exons(bed_filename, join_overlaps=True)[source]

Retrieves exon information from gtf in bed6 format

gtf2bed_transcripts(bed_filename)[source]

Retrieves transcript information from gtf in bed12 format
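A GTF2.2 line is nine tab-separated fields, the last being a semicolon-separated attribute list such as gene_id "x"; transcript_id "y";. A minimal sketch of attribute parsing (illustrative, not GTFFile's code; the identifiers in the sample line are made up):

```python
def parse_gtf_attributes(field):
    """Turn 'gene_id "g1"; transcript_id "t1";' into a dict."""
    attrs = {}
    for chunk in field.strip().split(";"):
        chunk = chunk.strip()
        if chunk:
            key, _, value = chunk.partition(" ")
            attrs[key] = value.strip('"')  # values are double-quoted in GTF2.2
    return attrs

line = '1\thavana\texon\t11869\t12227\t.\t+\t.\tgene_id "ENSG0001"; transcript_id "ENST0001";'
fields = line.split("\t")
print(parse_gtf_attributes(fields[8]))
# {'gene_id': 'ENSG0001', 'transcript_id': 'ENST0001'}
```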

VCF

To read VCF files.

class tools.parsers.VCFTools.VCFEntry(vcfFile, line, lineNumber)[source]

A single entry in a VCF file

class tools.parsers.VCFTools.VCFFile(filename=None, gziped=False, stream=False)[source]

This is a small parser for VCF files, it should work with any VCF file but has only been tested on dbSNP138 files. Represents a whole VCF file:

#reading
f = VCFFile()
f.parse('hop.vcf')
for line in f :
        print(line['pos'])
close()[source]

closes the file

parse(filename, gziped=False, stream=False)[source]

opens a file
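A VCF data line is tab-separated with fixed leading columns (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO), which is what lookups like line['pos'] in the example above resolve against. A minimal parse of one data line, independent of pyGeno:

```python
VCF_COLUMNS = ["chrom", "pos", "id", "ref", "alt", "qual", "filter", "info"]

def parse_vcf_line(line):
    """Map the first eight tab-separated fields to the standard VCF columns."""
    fields = line.rstrip("\n").split("\t")
    entry = dict(zip(VCF_COLUMNS, fields))
    entry["pos"] = int(entry["pos"])  # POS is a 1-based integer position
    return entry

entry = parse_vcf_line("1\t10177\trs367896724\tA\tAC\t.\t.\tRS=367896724")
print(entry["pos"])  # 10177
```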

Casava

To read casava files.

class tools.parsers.CasavaTools.SNPsTxtEntry(lineNumber, snpsTxtFile)[source]

A single entry in Casava's snps.txt file

class tools.parsers.CasavaTools.SNPsTxtFile(fil, gziped=False)[source]

Represents a whole Casava snps.txt file:

f = SNPsTxtFile('snps.txt')
for line in f :
        print(line['ref'])
reset()[source]

Frees the file