# Tools

pyGeno provides a set of tools that can be used independently. Here you’ll find goodies for translation, indexation, and more.

## Progress Bar

pyGeno’s awesome progress bar, with logging capabilities and remaining time estimation.

class tools.ProgressBar.ProgressBar(nbEpochs=-1, width=25, label='progress', minRefeshTime=1)

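A very simple unthreaded progress bar. It also logs stats in self.logs. Usage example:

```python
from pyGeno.tools.ProgressBar import ProgressBar

# nbEpochs < 1: the total number of iterations is unknown
p = ProgressBar(nbEpochs=-1)
for i in range(200000):
    p.update(label='value of i %d' % i)
p.close()
```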


If you don’t know the total number of epochs, set nbEpochs to a value below 1.

close()

Closes the bar so that your next print starts on a new line.

log()

Logs stats about the progression without printing anything on screen.

saveLogs(filename)

Dumps the logs into a pickle file named filename.

update(label='', forceRefresh=False, log=False)

The function to call at each iteration. Setting log=True is the same as calling log() right after update().
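To illustrate how a remaining-time estimate can be derived, here is a minimal standalone sketch. It is not pyGeno’s code: TinyProgressBar and its eta() method are hypothetical names, the total number of epochs must be known, and the estimate simply extrapolates the average time per epoch observed so far.

```python
import sys
import time


class TinyProgressBar:
    """Hypothetical unthreaded progress bar with a naive remaining-time estimate."""

    def __init__(self, nb_epochs, width=25, label="progress"):
        self.nb_epochs = nb_epochs  # must be known (> 0) for this sketch
        self.width = width
        self.label = label
        self.current = 0
        self.start = time.time()

    def eta(self):
        """Seconds left = average time per epoch so far * epochs still to go."""
        if self.current == 0:
            return None
        elapsed = time.time() - self.start
        return elapsed / self.current * max(self.nb_epochs - self.current, 0)

    def update(self, label=""):
        self.current += 1
        done = min(self.width, self.width * self.current // self.nb_epochs)
        bar = "#" * done + "-" * (self.width - done)
        # '\r' moves back to the start of the line so the bar is redrawn in place
        sys.stdout.write("\r%s [%s] %d/%d ETA %.1fs %s" % (
            self.label, bar, self.current, self.nb_epochs, self.eta(), label))
        sys.stdout.flush()

    def close(self):
        # End the line so the next print does not overwrite the bar
        sys.stdout.write("\n")
        sys.stdout.flush()


if __name__ == "__main__":
    bar = TinyProgressBar(nb_epochs=50)
    for i in range(50):
        time.sleep(0.01)
        bar.update(label="value of i %d" % i)
    bar.close()
```

Redrawing with a carriage return keeps the bar on one line without any thread, which is why close() is needed to move to the next line.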

## Useful functions

This module is a collection of handy bioinformatics functions.

tools.UsefulFunctions.complement(seq)

Returns the complementary sequence without reversing it.

tools.UsefulFunctions.complementTab(seq=[])

Returns the complement of seq as a list, without reversing it.
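The difference between complementing and reverse-complementing is easy to miss. Here is a minimal sketch, assuming upper-case A/C/G/T/N input only. complement and complement_tab are hypothetical stand-ins, and pyGeno’s versions may also handle lower case, ambiguity codes and indels differently.

```python
# Hypothetical helpers: complement each base, keep the original order
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def complement(seq):
    """Complement every base but do not reverse the sequence."""
    return seq.translate(_COMPLEMENT)


def complement_tab(seq):
    """List version: complement each element, keep the list order."""
    return [element.translate(_COMPLEMENT) for element in seq]
```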

tools.UsefulFunctions.decodePolymorphicNucleotide(nuc)

The inverse of encodePolymorphicNucleotide: decodes ‘R’ into [‘A’, ‘G’].

tools.UsefulFunctions.decodePolymorphicNucleotide_str(nuc)

Same as decodePolymorphicNucleotide, but returns a string instead of a list: decodes ‘R’ into ‘A/G’.

tools.UsefulFunctions.encodePolymorphicNucleotide(polySeq)

Returns a single character encoding all the nucleotides of polySeq (for example ‘R’ for A and G). polySeq must have one of the following forms: [‘A’, ‘T’, ‘G’], ‘ATG’ or ‘A/T/G’.
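A sketch of the encode/decode pair, assuming the standard IUPAC ambiguity codes, which agree with the ‘R’ = A/G example above. The function names are hypothetical and pyGeno’s own table may differ in details such as lower-case handling.

```python
# Hypothetical IUPAC table: sorted set of bases -> one-letter code
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "AG": "R", "CT": "Y", "CG": "S", "AT": "W", "GT": "K", "AC": "M",
    "CGT": "B", "AGT": "D", "ACT": "H", "ACG": "V", "ACGT": "N",
}
DECODE = {code: list(bases) for bases, code in IUPAC.items()}


def encode_polymorphic(poly_seq):
    """Accept ['A', 'T', 'G'], 'ATG' or 'A/T/G' and return one IUPAC character."""
    bases = poly_seq.replace("/", "") if isinstance(poly_seq, str) else "".join(poly_seq)
    # Order and duplicates do not matter: normalise to a sorted set of bases
    return IUPAC["".join(sorted(set(bases.upper())))]


def decode_polymorphic(nuc):
    """'R' -> ['A', 'G']"""
    return DECODE[nuc]


def decode_polymorphic_str(nuc):
    """'R' -> 'A/G'"""
    return "/".join(DECODE[nuc])
```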

tools.UsefulFunctions.findAll(haystack, needle)

Returns a list of all occurrences of needle in haystack.
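A possible implementation, as a hypothetical sketch: this version returns start positions and reports overlapping occurrences (‘AA’ is found twice in ‘AAA’). Whether pyGeno’s findAll does the same is not stated here.

```python
def find_all(haystack, needle):
    """Start indices of every occurrence of needle, overlaps included."""
    positions = []
    i = haystack.find(needle)
    while i != -1:
        positions.append(i)
        # Restart one character later so overlapping matches are found
        i = haystack.find(needle, i + 1)
    return positions
```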

tools.UsefulFunctions.getNucleotideCodon(sequence, x1)

Returns a tuple containing the entire codon of the nucleotide at position x1 in sequence, and the position of that nucleotide within the codon.
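A minimal sketch of the codon lookup, assuming the reading frame starts at index 0 of sequence (get_nucleotide_codon is a hypothetical name):

```python
def get_nucleotide_codon(sequence, x1):
    """Return (codon containing position x1, offset of x1 inside that codon)."""
    offset = x1 % 3           # 0, 1 or 2 within the codon
    start = x1 - offset       # first base of the codon
    return sequence[start:start + 3], offset
```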

tools.UsefulFunctions.getSequenceCombinaisons(polymorphipolymorphicDnaSeqSeq, pos=0)

Takes a DNA sequence with polymorphisms and returns all the possible sequences it can yield.
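The combinations are a Cartesian product over the alternatives at each position. Here is a sketch, assuming a hypothetical input format where each position is a string and polymorphic positions are written like ‘C/T’:

```python
from itertools import product


def sequence_combinations(poly_seq):
    """poly_seq is a list such as ['A', 'C/T', 'G'] (hypothetical format)."""
    # Split every position into its alternatives, then take every combination
    alternatives = [position.split("/") for position in poly_seq]
    return ["".join(combo) for combo in product(*alternatives)]
```

The number of results is the product of the number of alleles at each position, so it grows exponentially with the number of polymorphisms.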

tools.UsefulFunctions.highlightSubsequence(sequence, x1, x2, start=' [', stop='] ')

Returns sequence with the subsequence [x1, x2[ (x1 included, x2 excluded) placed between start and stop.
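With the default markers, the half-open slice [x1, x2[ is simply wrapped in place. A one-line sketch (hypothetical name):

```python
def highlight_subsequence(sequence, x1, x2, start=" [", stop="] "):
    """Wrap sequence[x1:x2] (x1 included, x2 excluded) between start and stop."""
    return sequence[:x1] + start + sequence[x1:x2] + stop + sequence[x2:]
```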

tools.UsefulFunctions.polymorphicCodonCombinaisons(codon)

Returns all the possible amino acids encoded by a polymorphic codon.
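A sketch of how the possible amino acids can be enumerated. It assumes the standard genetic code and codons given as a list of three positions whose alternatives are written like ‘G/A’. Both are assumptions, not pyGeno’s documented input format.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order at each position
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def polymorphic_codon_combinations(codon):
    """codon is a list of 3 positions, e.g. ['A', 'T', 'G/A'] (hypothetical format)."""
    alternatives = [position.split("/") for position in codon]
    # Translate every concrete codon; a set removes synonymous duplicates
    return sorted({CODON_TABLE["".join(c)] for c in product(*alternatives)})
```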

tools.UsefulFunctions.reverseComplement(seq)

Complements a DNA sequence, returning the reverse complement.

tools.UsefulFunctions.reverseComplementTab(seq)

Complements a DNA sequence, returning the reverse complement as a list so that indels can be handled.
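Why a list helps with indels: if each element holds the bases at one reference position (‘’ for a deletion, several bases for an insertion), the list must be reversed and each element reverse-complemented on its own. A hypothetical sketch, again assuming upper-case A/C/G/T/N:

```python
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Complement every base, then reverse the order."""
    return seq.translate(_COMPLEMENT)[::-1]


def reverse_complement_tab(seq):
    """Reverse the list and reverse-complement each element, so that
    ''.join(result) is the reverse complement of ''.join(seq)."""
    return [reverse_complement(element) for element in reversed(seq)]
```

Keeping one element per reference position means coordinates still line up after the transformation, which a flat string would lose as soon as an insertion or deletion is present.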

tools.UsefulFunctions.showDifferences(seq1, seq2)

Returns a string highlighting the differences between seq1 and seq2:

• Matches: ‘-’
• Differences: ‘A|T’
• Exceeded length: ‘#’

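
A sketch of such a difference string, following the three symbols listed above. The exact layout produced by pyGeno (separators, handling of case) is an assumption here, and show_differences is a hypothetical name.

```python
def show_differences(seq1, seq2):
    """'-' for a match, 'X|Y' for a mismatch, '#' past the end of the shorter sequence."""
    out = []
    for i in range(max(len(seq1), len(seq2))):
        if i >= len(seq1) or i >= len(seq2):
            out.append("#")
        elif seq1[i] == seq2[i]:
            out.append("-")
        else:
            out.append("%s|%s" % (seq1[i], seq2[i]))
    return "".join(out)
```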
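tools.UsefulFunctions.translateDNA(sequence, frame='f1', translTable_id='default')

Translates a DNA sequence into a protein sequence in the given reading frame. The frames are fwd1, fwd2, fwd3, rev1, rev2 and rev3; the default, ‘f1’, is the first forward frame.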

tools.UsefulFunctions.translateDNA_6Frames(sequence)

Returns the 6 translations of sequence, one for each reading frame.
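The six frames are the three offsets of the sequence and the three offsets of its reverse complement. This sketch assumes the standard genetic code only (pyGeno’s translTable_id parameter suggests it supports others). Codons containing non-ACGT characters are translated as ‘X’, and the keys fwd1 to rev3 are hypothetical frame labels.

```python
BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order at each position
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(dna, offset=0):
    """Translate dna from offset; an incomplete trailing codon is dropped."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X")
        for i in range(offset, len(dna) - 2, 3)
    )


def translate_6_frames(sequence):
    sequence = sequence.upper()
    reverse = sequence.translate(_COMPLEMENT)[::-1]  # reverse complement
    frames = {}
    for offset in range(3):
        frames["fwd%d" % (offset + 1)] = translate(sequence, offset)
        frames["rev%d" % (offset + 1)] = translate(reverse, offset)
    return frames
```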

## Binary sequences

Tools to encode sequences in binary formats.

class tools.BinarySequence.AABinarySequence(sequence)

A binary sequence of amino acids

class tools.BinarySequence.BinarySequence(sequence, arrayForma, charToBinDict)

A class for representing sequences in a binary format

decode(binSequence)

Decodes a binary sequence and returns it as a string.

encode(sequence)

Returns a tuple (binary representation, default sequence, polymorphisms list).

find(strSeq)

Returns the first occurrence of strSeq in self. Takes polymorphisms into account.

findAll(strSeq)

Same as find, but returns a list of all occurrences.
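How can find take polymorphisms into account? One common approach, used here purely as an assumption since the actual bit layout is not documented on this page, is a one-hot encoding with 4 bits per position. A polymorphic position stores the OR of its alleles, so a query base matches whenever the bitwise AND is non-zero. All names are hypothetical, and returning -1 when nothing is found mimics str.find rather than pyGeno.

```python
BITS = {"A": 1, "C": 2, "G": 4, "T": 8}


def encode(sequence):
    """sequence is a list like ['A', 'C/T', 'G']; one bitmask per position."""
    return [sum(BITS[b] for b in set(position.split("/"))) for position in sequence]


def decode(codes):
    """Inverse of encode: bitmasks back to 'A' or 'C/T' style strings."""
    return ["/".join(b for b in "ACGT" if code & BITS[b]) for code in codes]


def find_all(codes, query):
    """Start positions where each query base is one of the alleles at that position."""
    q = [BITS[base] for base in query]
    return [
        i for i in range(len(codes) - len(q) + 1)
        if all(codes[i + j] & q[j] for j in range(len(q)))
    ]


def find(codes, query):
    hits = find_all(codes, query)
    return hits[0] if hits else -1
```

With this layout, ‘ATG’ and ‘ACG’ both match a position encoded from ‘C/T’, without expanding the sequence into all its variants.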

findAllByBiSearch(strSeq)

Same as findByBiSearch, but returns a list of all occurrences.

findByBiSearch(strSeq)

Returns the first occurrence of strSeq in self. Takes polymorphisms into account.

findPolymorphisms(strSeq, strict=False)

Compares strSeq with self.sequence. If strict is False, this function ignores cases of matching heterozygosity (e.g. for a given position i, strSeq[i] = ‘A’ and self.sequence[i] = ‘A/G’). If strict is True, it returns all positions where strSeq differs from self.sequence.
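A sketch of the strict/non-strict distinction. It assumes the reference is held as a list of strings with polymorphic positions written like ‘A/G’ and that only positions are returned. find_polymorphisms is a hypothetical name.

```python
def find_polymorphisms(str_seq, reference, strict=False):
    """reference is a list such as ['A', 'A/G', 'T']; str_seq is a plain string.

    Non-strict: report i only if str_seq[i] is not one of the alleles of reference[i].
    Strict: report every i where str_seq[i] differs from reference[i] as written.
    Comparison stops at the end of the shorter input (zip).
    """
    positions = []
    for i, (base, ref) in enumerate(zip(str_seq, reference)):
        differs = base != ref if strict else base not in ref.split("/")
        if differs:
            positions.append(i)
    return positions
```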

getDefaultSequence()

Returns a default version of the sequence in which only the last allele of each polymorphism is shown.

getNbVariants(x1, x2=-1)

Returns the number of sequence variants between x1 and x2.

getPolymorphisms()

Returns all polymorphisms as a dict mapping pos => alleles.

getSequenceVariants(x1=0, x2=-1, maxVariantNumber=128)

Returns the sequences resulting from all combinations of all polymorphisms between x1 and x2. The result is a tuple (bool, variants of the sequence between x1 and x2), where bool is True if there are more combinations than maxVariantNumber.
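A sketch of variant enumeration with a cap. It assumes the same ‘A/G’-style list representation and that x2 = -1 means “up to the end” (both assumptions; the function name is hypothetical). Taking one combination beyond the cap is enough to know whether it was exceeded, without generating them all.

```python
from itertools import islice, product


def get_sequence_variants(sequence, x1=0, x2=-1, max_variant_number=128):
    """Return (truncated, variants) for the slice [x1, x2[ of sequence."""
    if x2 == -1:
        x2 = len(sequence)
    alternatives = [position.split("/") for position in sequence[x1:x2]]
    # product() is lazy: only max_variant_number + 1 combinations are built
    variants = ["".join(c) for c in islice(product(*alternatives), max_variant_number + 1)]
    truncated = len(variants) > max_variant_number
    return truncated, variants[:max_variant_number]
```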

class tools.BinarySequence.NucBinarySequence(sequence)

A binary sequence of nucleotides

## Segment tree

Segment trees are an optimised way to index a genome.

class tools.SegmentTree.SegmentTree(x1=None, x2=None, name='', referedObject=[], father=None, level=0)

Optimised genome annotations. A segment tree is a hierarchy of segments. Each segment is a half-open interval: its first position, x1, is inclusive and its second, x2, is exclusive. A segment tree has the following properties:

• The root has no x1 or x2 (both set to None).
• Segments are arranged in ascending order.
• For two segments S1 and S2: [S2.x1, S2.x2[ ⊂ [S1.x1, S1.x2[ <=> S2 is a child of S1.

Here’s an example of a tree:

• Root : 0-15
• --> Segment : 0-12
• ----> Segment : 1-6
• ------> Segment : 2-3
• ------> Segment : 4-5
• ----> Segment : 7-8
• ----> Segment : 9-10
• --> Segment : 11-14
• ----> Segment : 12-14
• --> Segment : 13-15

Each segment can have a ‘name’ and a ‘referedObject’. Refered objects are objects stored within the tree for future use. They are always stored in lists; if referedObject is already a list, it is stored as is.
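The containment rule above is enough to build the tree. The following is a minimal sketch, not pyGeno’s SegmentTree. The Segment class is hypothetical and keeps only names (no referedObject). Each segment is inserted under the deepest existing segment that contains it, and a new segment adopts any siblings it contains. Inserting the segments of the example tree reproduces it.

```python
class Segment:
    """Hypothetical minimal segment-tree node for the half-open interval [x1, x2[."""

    def __init__(self, x1=None, x2=None, name=""):
        self.x1, self.x2, self.name = x1, x2, name
        self.children = []  # kept sorted by x1

    def contains(self, other):
        # The root (x1 is None) contains every segment
        return self.x1 is None or (self.x1 <= other.x1 and other.x2 <= self.x2)

    def insert(self, x1, x2, name=""):
        """Insert [x1, x2[ below the deepest segment containing it and return the node."""
        seg = Segment(x1, x2, name)
        node = self
        while True:
            for child in node.children:
                if (child.x1, child.x2) == (x1, x2):
                    # Identical segment already indexed: merge the names
                    child.name = "%s U %s" % (child.name, name)
                    return child
                if child.contains(seg):
                    node = child  # descend one level
                    break
            else:
                break  # no child contains seg: it belongs directly under node
        # seg adopts the siblings it contains, then takes its place in sorted order
        seg.children = [c for c in node.children if seg.contains(c)]
        node.children = [c for c in node.children if not seg.contains(c)] + [seg]
        node.children.sort(key=lambda c: c.x1)
        return seg
```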

emptyChildren()

Kills off all children (removes every child segment).

flatten()

Flattens the tree. The tree becomes a tree of depth 1 in which overlapping regions have been merged together.

getFirstLevel()

Returns a list of (x1, x2) pairs for all the first-level indexed regions.

getIndexedLength()

Returns the total length of indexed regions.
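Flattening and measuring the indexed length both boil down to merging overlapping intervals. Here is a sketch on a plain list of (x1, x2) pairs. It assumes half-open intervals, that merely touching regions such as [0, 5[ and [5, 8[ are kept separate, and that overlapping positions are counted once. All of these are assumptions, and the function names are hypothetical.

```python
def flatten(regions):
    """Merge overlapping half-open regions [x1, x2[ into a sorted, depth-1 list."""
    merged = []
    for x1, x2 in sorted(regions):
        if merged and x1 < merged[-1][1]:
            # Overlaps the previous region: extend it
            merged[-1][1] = max(merged[-1][1], x2)
        else:
            merged.append([x1, x2])
    return [tuple(r) for r in merged]


def indexed_length(regions):
    """Number of positions covered by at least one region."""
    return sum(x2 - x1 for x1, x2 in flatten(regions))
```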

getX1()

Returns the starting position of the tree

getX2()

Returns the ending position of the tree

insert(x1, x2, name='', referedObject=[])

Inserts the segment in its right place and returns it. If there is already a segment S such that S.x1 == x1 and S.x2 == x2, S.name is changed to ‘S.name U name’ and referedObject is appended to S’s existing list.

insertTree(childTree)

Inserts childTree in the right position (regions are rearranged to fit the organisation of self).

intersect(x1, x2=None)

Returns a list of all segments intersected by [x1, x2]
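Because every child lies inside its parent, a query can skip a whole subtree as soon as the parent does not intersect it. It can also stop scanning siblings once they start at or after the query end, since siblings are sorted. Here is a sketch on a hypothetical nested-tuple representation (x1, x2, name, children). It assumes half-open intervals and treats x2=None as a single-position query.

```python
def intersect(segments, x1, x2=None):
    """Names of all segments (at any depth) overlapping [x1, x2[."""
    if x2 is None:
        x2 = x1 + 1  # single-position query
    hits = []
    for sx1, sx2, name, children in segments:
        if sx1 >= x2:
            break  # sorted siblings: none of the remaining ones can overlap
        if x1 < sx2:
            hits.append(name)
            # Only descend into segments that overlap the query
            hits.extend(intersect(children, x1, x2))
    return hits
```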

move(newX1)

Moves the tree to a new starting position and updates the x1s of its children.

removeGaps()

Removes all gaps between regions.

tools.SegmentTree.aux_insertTree(childTree, parentTree)

This is a private recursive function (you shouldn’t have to call it) that inserts a child tree into a parent tree.

tools.SegmentTree.aux_moveTree(offset, tree)

This is a private recursive function (you shouldn’t have to call it) that translates tree (and its children) to a given x1.

## Secure memory map

A write-protected memory map.

class tools.SecureMmap.SecureMmap(filename, enableWrite=False)

In a normal mmap, modifying the mapped string modifies the file. SecureMmap is an mmap with write protection.

forceSet(x1, v)

Forces a modification even if the mmap is write-protected.
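Here is a sketch of the write-protection idea using Python’s standard mmap module. ProtectedMmap and force_set are hypothetical names, not pyGeno’s implementation. Note that mmap cannot map an empty file.

```python
import mmap


class ProtectedMmap:
    """Hypothetical mmap wrapper that refuses item assignment unless writes are enabled."""

    def __init__(self, filename, enable_write=False):
        self._file = open(filename, "r+b")
        # Length 0 maps the whole file (the file must not be empty)
        self._map = mmap.mmap(self._file.fileno(), 0)
        self.enable_write = enable_write

    def __len__(self):
        return len(self._map)

    def __getitem__(self, key):
        return self._map[key]

    def __setitem__(self, key, value):
        if not self.enable_write:
            raise PermissionError("mmap is write protected; use force_set()")
        self._map[key] = value

    def force_set(self, x1, value):
        """Write the bytes in value starting at x1, bypassing the protection."""
        self._map[x1:x1 + len(value)] = value

    def close(self):
        self._map.flush()
        self._map.close()
        self._file.close()
```

Reads behave like a normal mmap. Only the explicit force_set() path can change the underlying file while protection is on.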