Classic Computer Science Problems in ̶P̶y̶t̶h̶o̶n̶ Scala — Trivial Compression

I recently purchased the book Classic Computer Science Problems in Python to help me brush up on my python. As I’m working through the Python code in the book, I’m taking some time to also port the Python to Scala. Below is an example the illustrates trivial compression for efficiently storing genes. A little science refresher, genes are typically made of Five nucleobases — adenine (A), cytosine ©, guanine (G), thymine (T), and uracil (U). Uracil (U) and thymine (T) are very similar except uracil is found in RNA, thymine (T) replaces uracil in DNA.

Python

In this code, each nucleotide is converted to its corresponding 2-bit representation and prepended to bit_string . To decode it, we reverse the mapping; we decode each 2-bit chunk of the bit_string back to the corresponding character. Python makes this possible as its integer type just keeps expanding and can be as large as you need it to be. We can effectively add bits to it until our computer runs out of memory.

# Usage
from sys import getsizeof
original: str = "TAGGGATCTAACCGTTATATATATATAGCCATGGATCGATTATATAGGG" * 100
print("original is {} bytes".format(getsizeof(original)))compressed: CompressedGene = CompressedGene(original)print("compressed is {} bytes".format(getsizeof(compressed.bit_string)))print(compressed)print("original and decompressed are same size: {}".format(original == compressed.decompress()))

Scala

In Scala, I’m using essentially the same type of compression. Instead of a magically expanding integer value, like Python has, Scala provides a BitSet that stores arbitrary sequences of bits. A BitSet is an interesting class. You flip a bit by using its add method. For example, bitSet.add(1) flips the bit at position 1 in the BitSet. You then can query if a bit is flipped or not with the apply method, e.g. bitSet(1) == true. Note that in order to decode the bit set we have to track the number of nuleotides in our original DNA sequence. I use a case class to store the BitSet and nucleotide length with each other.

// Usage
val original = "TAGGGATCTAACCGTTATATATATATAGCCATGGATCGATTATATAGGG" * 100
val compressed = CompressedGene(original)println(compressed)

Polyglot coder. Deep-sea Researcher. Zazen aficionado. I think squids are pretty cool.

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store