CSP: Digital Information

Lossless Compression

based on resources from code.org

PBL by Silver Oaks

Food for thought

As you have been creating images over the last few units, the number of bits it takes to represent that information has grown and grown. This unit introduces compression as a way to address the growing file sizes of all of our information.
The following list shows several common abbreviations used in text messages.
What other abbreviations could you add to this list?
  • LOL
  • TC
  • GN8

Why might we use abbreviations when sending messages? What are the advantages?

Intro

Text Compression

I want to send this message to a friend, but their phone can only accept 80 characters of text at a time. 

Pitter_patter_pitter_patter_listen_to_the_rain_pitter_patter_pitter_patter_on_the_window_pane

I notice this pattern has some repetition in it, so rather than sending the whole message, I send this instead:

listen_to_the_rain_on_the_window_pane

Using abbreviations and symbols is a form of compression: we represent the same information with fewer characters. The original message had 93 characters, but the new message and its key, also called a dictionary, total 56 characters, which fits within the 80-character limit. Our goal today is to create our own text compressions using similar methods.
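One way to arrive at the 56-character total can be sketched in Python. This is only an illustration under stated assumptions: the symbol "#" and the single dictionary entry are hypothetical choices, not necessarily the ones used in the slide, and the message is written in lowercase so that one entry matches all four repetitions.

```python
# Lowercased copy of the message (assumption: the capital "P" is ignored
# so that one dictionary entry covers every repetition).
original = ("pitter_patter_pitter_patter_listen_to_the_rain_"
            "pitter_patter_pitter_patter_on_the_window_pane")

# Hypothetical dictionary (key): one symbol standing for one repeated phrase.
dictionary = {"#": "pitter_patter_"}

def compress(text, dictionary):
    """Replace every dictionary phrase with its shorter symbol."""
    for symbol, phrase in dictionary.items():
        text = text.replace(phrase, symbol)
    return text

def decompress(text, dictionary):
    """Reverse the substitution to rebuild the original text."""
    for symbol, phrase in dictionary.items():
        text = text.replace(symbol, phrase)
    return text

compressed = compress(original, dictionary)

# The key must be sent too, so its characters count toward the total.
key_size = sum(len(symbol) + len(phrase) for symbol, phrase in dictionary.items())
total = len(compressed) + key_size

print(len(original))   # 93
print(compressed)      # ##listen_to_the_rain_##on_the_window_pane
print(total)           # 56 (41 for the message + 15 for the key)
print(decompress(compressed, dictionary) == original)  # True
```

The last line is the important one: expanding the symbols with the dictionary gives back exactly the text we started with.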

Activity

Text Compression

 

The compression percentage at the bottom of the screen is calculated by comparing the number of bytes in the original message and the number of bytes in the compressed message.
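
The exact formula the activity uses is not given here, so the following is a rough sketch of one common way to compute such a percentage. It assumes each character takes one byte and that the dictionary is counted as part of the compressed size; the function name compression_percent is made up for this example.

```python
def compression_percent(original_bytes, compressed_bytes):
    """Percentage of the original size that was saved.

    A negative result means the "compressed" version is actually larger.
    """
    return (1 - compressed_bytes / original_bytes) * 100

# Using the numbers from the intro: 93 characters down to 56.
percent = compression_percent(93, 56)
print(f"{percent:.1f}% smaller")  # 39.8% smaller
```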

  • Choose from the sample texts available
  • Try compressing the text
  • Take a screenshot once you are done
  • Paste the screenshot into a Google Slides or Google Docs file and explain:
    “What strategies are you using to compress your sample text? Which ones seem most successful?”
  • Submit your Google Slides or Docs file as part of the Google Classroom assignment.

You will eventually reach the limit of how much a particular message can be compressed, and not every message compresses well.

Think: what makes some messages more compressible than others?

Wrapup

Reflection

What made some messages “easier” to compress than others? What made some messages more “difficult” to compress than others?
There are many strategies we can use when creating lossless compressions and there isn’t a single best way to do it. Instead, our compression rate usually depends on which strategy we choose and the patterns in the text we’re compressing. Most importantly, even though the number of bytes is getting smaller, we’re never actually losing information – we can always perfectly recreate the original message using our dictionary key.
  • ‘Easier’ texts usually have lots of repetition: repeated words, phrases, or syllables. A useful strategy is to use this repetition to create the compression.
  • ‘Difficult’ texts usually have less repetition, so this method of compression has less to work with. Some strategies may actually make the result larger, because the dictionary itself takes up characters, which can be counter-intuitive.
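
The difference between 'easier' and 'difficult' texts can be sketched with two small made-up examples. The sample texts, the "#" symbol, and the helper name total_size are all assumptions for illustration; the point is only that the dictionary counts toward the total.

```python
def total_size(text, dictionary):
    """Characters in the compressed text plus characters in the dictionary."""
    compressed = text
    for symbol, phrase in dictionary.items():
        compressed = compressed.replace(phrase, symbol)
    key_size = sum(len(symbol) + len(phrase) for symbol, phrase in dictionary.items())
    return len(compressed) + key_size

# Lots of repetition: one entry replaces six copies of "la_".
repetitive = "la_la_la_la_la_la_"
print(len(repetitive), total_size(repetitive, {"#": "la_"}))      # 18 10

# No repetition: the entry is used once, so the key costs more than it saves.
unique = "hello_world"
print(len(unique), total_size(unique, {"#": "hello_"}))           # 11 13
```

In the second case the "compressed" version needs 13 characters instead of 11, so that substitution made things worse.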

Lossless Compression:  A process for reducing the number of bits needed to represent something without losing any information. This process is reversible.