logo

UTK Notes


Lab 8 - BITSET

Table of contents

README

In this lab, you’ll develop a program that manipulates bits of integers belonging to a vector. You should already know what bits are, vaguely at least, but I will cover them in depth in this writeup to help you with the lab. Skip to this section if you know all about bits/bitwise stuff or just don’t want to read about it.

Background

You can skip this section if you think you know what you’re doing in terms of bits. But I HIGHLY recommend reading through it all thoroughly – especially if you’ll be taking 230 next year, because you’ll thank me for having prepared you for it with the following information. Otherwise, skip to the Bitwise Operators section and/or Examples for a briefer explanation if you’re confident.

Bits & Bytes

Every data type in your computer is stored in memory somehow. Where they’re stored is not exactly important right now, but how they’re stored is. Each data type has a specific amount of bytes that it takes up. A single byte is a unit comprised of 8 bits. Similar to how a meter is 100 centimeters, a byte is 8 bits. Bits are the lowest level unit in computing, and can be either 0 or 1. A char for example is 1 byte. That means it is 8 bits. So how do we represent a single char using bits? Well, binary (the “language” that uses bits) is in base-2. The decimal system for example is in base-10. I’m not going to get too much into it, but just pay attention to this next part.

0000 0000

The sequence of bits above is 8 bits in total (usually we’ll separate bits by every fourth bit just to make it cleaner) which means in total it makes up some form of data comprising one byte. A char is one byte, so it could be represented by the above bits. If we chose to represent a char using the bits above, it would be equal to the following \0. This is known as the null terminating character. For char’s, there is a table that maps a specific decimal value to it’s “letter” representation. This is known as an ASCII table. If all bits in a byte are 0, then its decimal equivalent is just 0. So looking at the ASCII table I just linked, we can see what 0 maps to and then we will understand why the char is \0. This is why a char is one byte. The range of a typical ASCII table is [0, 127] inclusive. If all 8 bits are set (more on this later), then the bits would look like

1111 1111

In this case, the decimal equivalent of those bits would be 255. A char can only hold one byte of data, so the highest value it can contain is 255, right? Well actually, most ASCII characters are unsigned, which means they do not include negative numbers. A char in C++ is implicitly signed. This means it can hold 127 negative values, and 127 positive values. So now you should see that’s where we get the range [0, 127]

All of this to say that you should understand every data type is just a certain amount of bytes, which is just a certain amount of sets of 8 bits. An int is no different from a char in terms of what it stores at the binary level. The only difference between them to your computer is the amount of bytes each can hold. It’s then up to the programming language to create rules that dictate what data types are comprised of and how they map into our reality. (e.g. char is comprised of 1 byte and it maps to letters, int is 4 bytes and it maps to integer values)

In this lab, you’ll be working primarily with int’s so let’s discuss that. Here are some sequences of bits representing int’s and their decimal equivalent

0000 0000 0000 0000 0000 0000 0000 0000 = 0 in dec
0000 0000 0000 0000 0000 0000 0000 0001 = 1 in dec
0000 0000 0000 0000 0000 0000 0000 0010 = 2 in dec
0000 0000 0000 0000 0000 0000 0001 0000 = 16 in dec

The pattern here is 2 to the power of the set bit’s index. A set bit is just a bit that is 1 instead of 0. That will give you the decimal equivalent of the binary representation. If there are more than 1 set bits, then you just need to calculate that bit’s value individually and then add up all of the values. So if you have int == 3, it would look like

0000 0000 0000 0000 0000 0000 0000 0011

The first bit’s index is 0, and it is set. So 2^0 == 1. Then the second bit is also set, and its index is 1. So 2^1 is 2. 2 + 1 == 3, which is how we calculate the value.

Bitwise Operators

The task you’ve been given for this lab is to manipulate the bits of integers using bitwise operators. Bitwise operators are similar to regular operators like +,-,*,/, etc, except they work at the bit level and have slightly different rules.

Keep in mind that every bitwise operator can be mixed and matched with each other just like you would normal arithmetic. You can get really creative with them, and you’ll have to be a bit creative to solve some of this lab.

Here are the following bitwise operators you’ll be working with for C++

&  - bitwise AND operator
|  - bitwise OR operator
>> - bitshift right operator
<< - bitshift left operator

Ignoring what these do for now, similar to an expression a + 3 in C++, a & 3 does not modify a. It hasn’t been stored anywhere. You’d have to do a += 3 to actually modify a itself. It’s the same with bitwise operators a &= 3.

Bitwise AND

  • &left & right where each bit from right is tested against each bit from left. If both bits are 1, then the resulting bit is 1. Otherwise, the resulting bit is 0. result = left & right would be the result of testing each individual bit from right to left e.g.
  0100 0001
  1100 0010 &
-------------
= 0100 0000

Bitwise OR

  • |left | right where each bit from right is set by each bit from left. If either bit is 1, then the resulting bit is 1. Otherwise, the resulting bit is 0. result = left | right would be the result of setting each individual bit from right to left e.g.
  0100 0001
  1100 0010 |
-------------
= 1100 0011

Bitwise NOT

  • ~~a where each bit from a is flipped. If the bit is 1, then the resulting bit is 0. If the bit is 0, then the resulting bit is 1. result = ~left would be the result of flipping each individual bit from right to left e.g.
  0100 0001 ~
-------------
= 1011 1110

Bitshift Right

  • >>a >> amnt shift a’s bits to the right amnt times. However many times a number is shifted, that amount of bits gets ejected from the sequence to the right, and that same amount of bits gets inserted to the left of the sequence as 0’s e.g.
  0100 0001 >> 3
----------------
= 0000 1000

Chop 3 bits off the right, and insert 3 0’s on the left

Bitshift Left

  • <<a << amnt shift a’s bits to the left amnt times. However many times a number is shifted, that amount of bits gets ejected from the sequence to the left, and that same amount of bits gets inserted to the right of the sequence as 0’s e.g.
  0100 0001 << 3
----------------
= 0000 1000

Chop 3 bits off the left, and insert 3 0’s on the right

Bitmasking

Before showing you some examples, you should understand the concept of bitmasking. All the bitwise operators I’ve shown you so far (except for the shift ones) are applied to every bit within a given data type. With the following code

1
2
3
4
int a = 8;
int b = 2;
int result = a & b;
// result == 0

All of the bits in a are tested against all the bits in b. Their bits look like

a == 0000 0000 0000 0000 0000 0000 0000 1000
b == 0000 0000 0000 0000 0000 0000 0000 0010

And the result of a & b looks like

result == 0000 0000 0000 0000 0000 0000 0000 0000

Each bit index of a is &‘d against each corresponding bit index of b. You’ll notice that none of the bits from a line up with b so that it ever does 1 & 1, therefore the result for every bit in a & b is 0, and so the final value of result is 0.

As you can see from the example above, there’s no real purpose in just ANDing two random values. This is where bitmasks come in. A bit mask is just a number that we’ll use for specific purposes when using bitwise operatations against a specific value. So in the example above, instead of using 2 for b, we would replace b with a value that we intend to AND against be for a specific purpose. So if we wanted to test whether or not the 4th bit of a was set for example, then we should set b equal to a value that will return some non-zero number when using AND between a and b. So making b a “bitmask” to do this, we just do

a == 0000 0000 0000 0000 0000 0000 0000 1000
b == 0000 0000 0000 0000 0000 0000 0000 1000

Now when we do a & b, the result will be 8. The example is a bit contrived, but hopefully that shows what a bitmask is and why you would want to use one. You curate a bitmask for whatever problem you’re trying to solve, and then you test that bitmask against a specific value you’re analyzing using any one of the bitwise operations that makes sense.

Examples

I’m going to show some examples here so you understand what’s happening at both the bit and the decimal level using each bitwise operator listed above in C++. (You can edit the following snippets and their inputs however you want and run the code yourself in the browser, btw)

AND Example

https://onecompiler.com/embed/cpp/3xxnk5z7r?hideTitle=true&hideLanguageSelection=true&hideNew=true&hideNewFileOption=true&theme=dark

OR Example

Notice that the result does not change for any input to a less than 255. It compares every set bit from b (255 is 1111 1111, so 8 bits of it are set) and then OR’s that with whatever a is.

https://onecompiler.com/embed/cpp/3xxnntqz5?hideTitle=true&hideLanguageSelection=true&hideNew=true&hideNewFileOption=true&theme=dark

Bitshift Right Example

Notice that how ever many times an integer is shifted right, the resulting integer is original_number / 2^shiftamnt. If the resulting integer is something like 1.6, the decimal is “truncated”, (rounded down basically), so 17 >> 1 would just be 8.

https://onecompiler.com/embed/cpp/3xxnngh2j?hideTitle=true&hideLanguageSelection=true&hideNew=true&hideNewFileOption=true&theme=dark

Bitshift Left Example

The behavior for this is the same as bitshift right, but instead the resulting integer in this case is original_number * 2^shiftamnt.

https://onecompiler.com/embed/cpp/3ym378bjv?hideNew=true&hideLanguageSelection=true&hideTitle=true&hideNewFileOption=true&theme=dark

Hopefully all of this gives you an idea of how to use each operator and what their functions are.

The examples above are a bit contrived so you can think for yourself during the actual lab. (hint: you’ll be using a combination of a bitmask, shifting, and &/| for most of the lab)

The Lab

This lab implements a BITSET class that allows you to create sets of integers that are represented as bitsets. The gist of the lab is that you will have a vector containing a variable amount of int’s that represent a bitset, which is simply the bit representation of a number. Luckily, the BITSET class is mostly implemented for you, so you just have to focus on writing a few functions here and there.

More specifically, it’s important to understand that each mSets element is an integer that you will perform bitwise operations on based off of a given

Your Task

All you have to do is complete the parts of the code that are commented as

// TODO: ...

The big parts are the functions. I’ve listed the functions here for you to reference. The order they are listed in is the order you should implement them in.

1
2
3
4
5
6
ToBinary(int bit_set, int spacing) // returns a string representation of bitset with a space every spacing bits from right to left
BITSET::GetNumSets() // returns the number of sets in the BITSET object. This is just the size of the vector containing the int's
BITSET::GetSet(int index) // returns the int from mSets at index
BITSET::Test(int index) // "tests" (compares a bit like "bit & 1") the bit at index and returns true/false accordingly
BITSET::Set(int index) // "sets" (flips a bit from 0 to 1) the bit at index. Returns nothing.
BITSET::Clear(int index) // "clears" (flips a bit to 0) the bit at index. Returns nothing.

Hint for BITSET::Set()

Hint for BITSET::Test()

Hint for BITSET::Clear()

Caveats

  • BITSET:GetSet() should return 0 if the index given is out of the bounds of mSets. This is because the BITSET class is meant to be able to hold a variable amount of integers, and if you try to access an index that doesn’t exist, it should just return 0.

  • BITSET::Set() should increase the number of bitsets in your mSets vector accordingly. If you set the 32nd bit, then you should have 2 sets in your mSets vector (each int goes from 0th - 31st bit – 32 bits total). The first set should be 0 and the second set should be 1. If you set the 66th bit, then you should have 3 sets in your mSets vector. The first set should be 0, the second set should be 0, and the third set should be 4. And so on. Even if your mSets vector is empty, if the user sets the 66th bit, then the vector should resize until it can sufficiently hold the 66th bit. (there’s 32 bits in an int in C++ on most systems, so you’ll need 3 int’s to reach the 66 bits)

  • BITSET::Clear() likewise, should decrease the number of int’s in your mSets vector. More specifically, it should “truncate” the vector to the smallest amount of int’s that can hold all of the bits that are set. In other words, starting from the end of the vector, remove every element that is 0 until you reach a non-zero int. So if your mSets vector looks like this:

{ 1, 6, 9, 0, 0, 0, 1 }

The highest set bit is the 224th bit here (starting from 0). If you clear the 224th bit, then the vector should look like this:

{ 1, 6, 9 }

Notice all 0-elements between any non-zero elements are removed.


Testing

Abram has provided test scripts to make sure your code is working correctly, and there is a Makefile included here that will make running the scripts and compiling your code easier.

As per usual, the tests provided are made for Unix systems, and therefore will not work on Windows unless running WSL/git-bash.

Abram will also be putting a version number on this in case anyone thinks of anything else in the next couple of days.

Setup & Downloading Makefile

Before downloading the Makefile, I recommend you setup your lab directory to look something like this

lab8 directory

As you can see, my directory is setup as ~/cs102/bitset and contains a lab8.cpp file. This should be your working directory for this lab, and it’s where you’ll run the rest of the commands from.

From this directory, you can now download the Makefile by running

1
curl -L https://raw.githubusercontent.com/Ethan0429/cs102-writeups/main/lab8/Makefile --output Makefile

or

1
curl -L https://utknotes.pisaucer.com/CS102/EthanNotes/Lab08/Makefile.txt --output Makefile

Similar to lab 7, the Makefile for this lab provides a few commands that you can use to compile/test your code

NOTE: Keep in mind that each of these make commands (save for make clean) will also compile your code for you, so you don’t have to run make every time you want to compile before a test.

Compile & Download Test Cases

compiles lab8.cpp into lab8, downloads bitset-tests.zip and extracts them to the current directory once

make

Run Tests

runs all the tests using scripts/test.bash & tests/in/ as input files & tests/out/ as output files

make tests

If your script fails, the script will diff your output with the expected output and look like

> scripts/test.bash lab8.cpp
a1-cli.txt FAILED

YOURS                                         EXPECTED
> 0                                             > 0
> > 0000_0000_0000_0000_0000_0000_0000_0001   | > > 0000 0000 0000 0000 0000 0000 0000 0001
> 0000_0000_0000_0000_0000_0000_0000_0001     | > 0000 0000 0000 0000 0000 0000 0000 0001
> > 1                                           > > 1
> Unknown command 'oops'                        > Unknown command 'oops'

The left column is your program’s output corresponding to the input file, and the right column is the solution output. The | character denotes a line that is different between the two outputs.

If you pass all the scripts, the output should look like this

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
a1-cli.txt PASSED
b1-test.txt PASSED
b2-test-double.txt PASSED
c1-set.txt PASSED
c2-set-double.txt PASSED
d1-clear.txt PASSED
d2-clear-double.txt PASSED
e1-out-of-bounds.txt PASSED
f1-to-binary-formatting.txt PASSED
g1-grow1.txt PASSED
g2-grow2.txt PASSED
g3-grow3.txt PASSED
g4-grow4.txt PASSED
h1-shrink1.txt PASSED
h2-shrink2.txt PASSED
h3-shrink3.txt PASSED
h4-shrink4.txt PASSED
i1-to-binary-1.txt PASSED
i2-to-binary-2.txt PASSED

Run Individual Test Case

If you want to run one of the inputs without diffing, you can do so manually like this

1
make test CASE=<case>

where <case> is the name of the input file you want to test with (e.g. a1-cli.txt) – all of which can be found within the tests/in/ directory.

Clean

removes all test files & scripts

1
make clean

Let me know if you guys need any help setting these up!