In this lab, you’ll develop a program that manipulates bits of integers belonging to a vector
. You should already know what bits are, vaguely at least, but I will cover them in depth in this writeup to help you with the lab. Skip to this section if you know all about bits/bitwise stuff or just don’t want to read about it.
You can skip this section if you think you know what you’re doing in terms of bits. But I HIGHLY recommend reading through it all thoroughly – especially if you’ll be taking 230 next year, because you’ll thank me for having prepared you for it with the following information. Otherwise, skip to the Bitwise Operators section and/or Examples for a briefer explanation if you’re confident.
Every data type in your computer is stored in memory somehow. Where they’re stored is not exactly important right now, but how they’re stored is. Each data type has a specific amount of bytes that it takes up. A single byte is a unit comprised of 8 bits. Similar to how a meter is 100 centimeters, a byte is 8 bits. Bits are the lowest level unit in computing, and can be either 0
or 1
. A char
for example is 1 byte. That means it is 8 bits. So how do we represent a single char using bits? Well, binary (the “language” that uses bits) is in base-2. The decimal system for example is in base-10. I’m not going to get too much into it, but just pay attention to this next part.
0000 0000
The sequence of bits above is 8 bits in total (usually we’ll separate bits by every fourth bit just to make it cleaner) which means in total it makes up some form of data comprising one byte. A char
is one byte, so it could be represented by the above bits. If we chose to represent a char
using the bits above, it would be equal to the following \0
. This is known as the null terminating character. For char
’s, there is a table that maps a specific decimal value to it’s “letter” representation. This is known as an ASCII table. If all bits in a byte are 0, then its decimal equivalent is just 0. So looking at the ASCII table I just linked, we can see what 0 maps to and then we will understand why the char
is \0
. This is why a char
is one byte. The range of a typical ASCII table is [0, 127]
inclusive. If all 8 bits are set (more on this later), then the bits would look like
1111 1111
In this case, the decimal equivalent of those bits would be 255. A char
can only hold one byte of data, so the highest value it can contain is 255, right? Well actually, most ASCII characters are unsigned, which means they do not include negative numbers. A char
in C++ is implicitly signed. This means it can hold 127 negative values, and 127 positive values. So now you should see that’s where we get the range [0, 127]
All of this to say that you should understand every data type is just a certain amount of bytes, which is just a certain amount of sets of 8 bits. An int
is no different from a char
in terms of what it stores at the binary level. The only difference between them to your computer is the amount of bytes each can hold. It’s then up to the programming language to create rules that dictate what data types are comprised of and how they map into our reality. (e.g. char
is comprised of 1 byte and it maps to letters, int
is 4 bytes and it maps to integer values)
In this lab, you’ll be working primarily with int
’s so let’s discuss that. Here are some sequences of bits representing int
’s and their decimal equivalent
0000 0000 0000 0000 0000 0000 0000 0000 = 0 in dec
0000 0000 0000 0000 0000 0000 0000 0001 = 1 in dec
0000 0000 0000 0000 0000 0000 0000 0010 = 2 in dec
0000 0000 0000 0000 0000 0000 0001 0000 = 16 in dec
The pattern here is 2 to the power of the set bit’s index. A set bit is just a bit that is 1 instead of 0. That will give you the decimal equivalent of the binary representation. If there are more than 1 set bits, then you just need to calculate that bit’s value individually and then add up all of the values. So if you have int == 3
, it would look like
0000 0000 0000 0000 0000 0000 0000 0011
The first bit’s index is 0, and it is set. So 2^0 == 1
. Then the second bit is also set, and its index is 1. So 2^1
is 2
. 2 + 1 == 3
, which is how we calculate the value.
The task you’ve been given for this lab is to manipulate the bits of integers using bitwise operators. Bitwise operators are similar to regular operators like +,-,*,/, etc, except they work at the bit level and have slightly different rules.
Keep in mind that every bitwise operator can be mixed and matched with each other just like you would normal arithmetic. You can get really creative with them, and you’ll have to be a bit creative to solve some of this lab.
Here are the following bitwise operators you’ll be working with for C++
& - bitwise AND operator
| - bitwise OR operator
>> - bitshift right operator
<< - bitshift left operator
Ignoring what these do for now, similar to an expression a + 3
in C++, a & 3
does not modify a
. It hasn’t been stored anywhere. You’d have to do a += 3
to actually modify a
itself. It’s the same with bitwise operators a &= 3
.
&
– left & right
where each bit from right
is tested against each bit from left
. If both bits are 1, then the resulting bit is 1. Otherwise, the resulting bit is 0. result = left & right
would be the result of testing each individual bit from right to left e.g. 0100 0001
1100 0010 &
-------------
= 0100 0000
|
– left | right
where each bit from right
is set by each bit from left
. If either bit is 1, then the resulting bit is 1. Otherwise, the resulting bit is 0. result = left | right
would be the result of setting each individual bit from right to left e.g. 0100 0001
1100 0010 |
-------------
= 1100 0011
~
– ~a
where each bit from a
is flipped. If the bit is 1, then the resulting bit is 0. If the bit is 0, then the resulting bit is 1. result = ~left
would be the result of flipping each individual bit from right to left e.g. 0100 0001 ~
-------------
= 1011 1110
>>
– a >> amnt
shift a
’s bits to the right amnt
times. However many times a number is shifted, that amount of bits gets ejected from the sequence to the right, and that same amount of bits gets inserted to the left of the sequence as 0’s e.g. 0100 0001 >> 3
----------------
= 0000 1000
Chop 3 bits off the right, and insert 3 0’s on the left
<<
– a << amnt
shift a
’s bits to the left amnt
times. However many times a number is shifted, that amount of bits gets ejected from the sequence to the left, and that same amount of bits gets inserted to the right of the sequence as 0’s e.g. 0100 0001 << 3
----------------
= 0000 1000
Chop 3 bits off the left, and insert 3 0’s on the right
Before showing you some examples, you should understand the concept of bitmasking. All the bitwise operators I’ve shown you so far (except for the shift ones) are applied to every bit within a given data type. With the following code
1
2
3
4
int a = 8;
int b = 2;
int result = a & b;
// result == 0
All of the bits in a
are tested against all the bits in b
. Their bits look like
a == 0000 0000 0000 0000 0000 0000 0000 1000
b == 0000 0000 0000 0000 0000 0000 0000 0010
And the result of a & b
looks like
result == 0000 0000 0000 0000 0000 0000 0000 0000
Each bit index of a
is &
‘d against each corresponding bit index of b
. You’ll notice that none of the bits from a
line up with b
so that it ever does 1 & 1
, therefore the result for every bit in a & b
is 0, and so the final value of result
is 0.
As you can see from the example above, there’s no real purpose in just ANDing two random values. This is where bitmasks come in. A bit mask is just a number that we’ll use for specific purposes when using bitwise operatations against a specific value. So in the example above, instead of using 2
for b
, we would replace b
with a value that we intend to AND against be for a specific purpose. So if we wanted to test whether or not the 4th bit of a
was set for example, then we should set b
equal to a value that will return some non-zero number when using AND between a
and b
. So making b
a “bitmask” to do this, we just do
a == 0000 0000 0000 0000 0000 0000 0000 1000
b == 0000 0000 0000 0000 0000 0000 0000 1000
Now when we do a & b
, the result will be 8
. The example is a bit contrived, but hopefully that shows what a bitmask is and why you would want to use one. You curate a bitmask for whatever problem you’re trying to solve, and then you test that bitmask against a specific value you’re analyzing using any one of the bitwise operations that makes sense.
I’m going to show some examples here so you understand what’s happening at both the bit and the decimal level using each bitwise operator listed above in C++. (You can edit the following snippets and their inputs however you want and run the code yourself in the browser, btw)
Notice that the result does not change for any input to a
less than 255. It compares every set bit from b
(255 is 1111 1111
, so 8 bits of it are set) and then OR’s that with whatever a
is.
Notice that how ever many times an integer is shifted right, the resulting integer is original_number / 2^shiftamnt
. If the resulting integer is something like 1.6, the decimal is “truncated”, (rounded down basically), so 17 >> 1
would just be 8.
The behavior for this is the same as bitshift right, but instead the resulting integer in this case is original_number * 2^shiftamnt
.
Hopefully all of this gives you an idea of how to use each operator and what their functions are.
The examples above are a bit contrived so you can think for yourself during the actual lab. (hint: you’ll be using a combination of a bitmask, shifting, and &
/|
for most of the lab)
This lab implements a BITSET
class that allows you to create sets of integers that are represented as bitsets. The gist of the lab is that you will have a vector
containing a variable amount of int
’s that represent a bitset, which is simply the bit representation of a number. Luckily, the BITSET
class is mostly implemented for you, so you just have to focus on writing a few functions here and there.
More specifically, it’s important to understand that each mSets
element is an integer that you will perform bitwise operations on based off of a given
All you have to do is complete the parts of the code that are commented as
// TODO: ...
The big parts are the functions. I’ve listed the functions here for you to reference. The order they are listed in is the order you should implement them in.
1
2
3
4
5
6
ToBinary(int bit_set, int spacing) // returns a string representation of bitset with a space every spacing bits from right to left
BITSET::GetNumSets() // returns the number of sets in the BITSET object. This is just the size of the vector containing the int's
BITSET::GetSet(int index) // returns the int from mSets at index
BITSET::Test(int index) // "tests" (compares a bit like "bit & 1") the bit at index and returns true/false accordingly
BITSET::Set(int index) // "sets" (flips a bit from 0 to 1) the bit at index. Returns nothing.
BITSET::Clear(int index) // "clears" (flips a bit to 0) the bit at index. Returns nothing.
BITSET:GetSet()
should return 0 if the index given is out of the bounds of mSets
. This is because the BITSET
class is meant to be able to hold a variable amount of integers, and if you try to access an index that doesn’t exist, it should just return 0.
BITSET::Set()
should increase the number of bitsets in your mSets
vector accordingly. If you set the 32nd bit, then you should have 2 sets in your mSets
vector (each int
goes from 0th - 31st bit – 32 bits total). The first set should be 0
and the second set should be 1
. If you set the 66th bit, then you should have 3 sets in your mSets
vector. The first set should be 0
, the second set should be 0
, and the third set should be 4
. And so on. Even if your mSets
vector is empty, if the user sets the 66th bit, then the vector should resize until it can sufficiently hold the 66th bit. (there’s 32 bits in an int
in C++ on most systems, so you’ll need 3 int
’s to reach the 66 bits)
BITSET::Clear()
likewise, should decrease the number of int
’s in your mSets
vector. More specifically, it should “truncate” the vector to the smallest amount of int
’s that can hold all of the bits that are set. In other words, starting from the end of the vector, remove every element that is 0 until you reach a non-zero int
. So if your mSets
vector looks like this:
{ 1, 6, 9, 0, 0, 0, 1 }
The highest set bit is the 224th bit here (starting from 0). If you clear the 224th bit, then the vector should look like this:
{ 1, 6, 9 }
Notice all 0-elements between any non-zero elements are removed.
Abram has provided test scripts to make sure your code is working correctly, and there is a Makefile
included here that will make running the scripts and compiling your code easier.
As per usual, the tests provided are made for Unix systems, and therefore will not work on Windows unless running WSL/git-bash.
Abram will also be putting a version number on this in case anyone thinks of anything else in the next couple of days.
Before downloading the Makefile, I recommend you setup your lab directory to look something like this
As you can see, my directory is setup as ~/cs102/bitset
and contains a lab8.cpp
file. This should be your working directory for this lab, and it’s where you’ll run the rest of the commands from.
From this directory, you can now download the Makefile by running
1
curl -L https://raw.githubusercontent.com/Ethan0429/cs102-writeups/main/lab8/Makefile --output Makefile
or
1
curl -L https://utknotes.pisaucer.com/CS102/EthanNotes/Lab08/Makefile.txt --output Makefile
Similar to lab 7, the Makefile
for this lab provides a few commands that you can use to compile/test your code
NOTE: Keep in mind that each of these
make
commands (save formake clean
) will also compile your code for you, so you don’t have to runmake
every time you want to compile before a test.
compiles lab8.cpp
into lab8
, downloads bitset-tests.zip
and extracts them to the current directory once
make
runs all the tests using scripts/test.bash
& tests/in/
as input files & tests/out/
as output files
make tests
If your script fails, the script will diff
your output with the expected output and look like
> scripts/test.bash lab8.cpp
a1-cli.txt FAILED
YOURS EXPECTED
> 0 > 0
> > 0000_0000_0000_0000_0000_0000_0000_0001 | > > 0000 0000 0000 0000 0000 0000 0000 0001
> 0000_0000_0000_0000_0000_0000_0000_0001 | > 0000 0000 0000 0000 0000 0000 0000 0001
> > 1 > > 1
> Unknown command 'oops' > Unknown command 'oops'
The left column is your program’s output corresponding to the input file, and the right column is the solution output. The |
character denotes a line that is different between the two outputs.
If you pass all the scripts, the output should look like this
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
a1-cli.txt PASSED
b1-test.txt PASSED
b2-test-double.txt PASSED
c1-set.txt PASSED
c2-set-double.txt PASSED
d1-clear.txt PASSED
d2-clear-double.txt PASSED
e1-out-of-bounds.txt PASSED
f1-to-binary-formatting.txt PASSED
g1-grow1.txt PASSED
g2-grow2.txt PASSED
g3-grow3.txt PASSED
g4-grow4.txt PASSED
h1-shrink1.txt PASSED
h2-shrink2.txt PASSED
h3-shrink3.txt PASSED
h4-shrink4.txt PASSED
i1-to-binary-1.txt PASSED
i2-to-binary-2.txt PASSED
If you want to run one of the inputs without diffing, you can do so manually like this
1
make test CASE=<case>
where <case>
is the name of the input file you want to test with (e.g. a1-cli.txt
) – all of which can be found within the tests/in/
directory.
removes all test files & scripts
1
make clean
Let me know if you guys need any help setting these up!