Code Link
The code for this project can be viewed on GitHub, and a more extensive writeup of some of the work can be found in this PDF.
In spring 2021, I took a class on database systems. The class covered the design of different types of databases and the advantages and disadvantages of each. One option for the final project was a performance-based competition, in which we tried to optimize a program to run certain queries efficiently.
The database system was built to handle a simplified version of SQL, and, as expected, the program we had was mainly bottlenecked by the joins in each query. We tried a variety of optimizations, many of which didn’t work; overall, most of the gains came from parallelizing query steps. For instance, with self joins, each thread could run a piece of the join, and the results could then be merged together, with the merging also being parallelized.
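To make the parallel self join concrete, here is a minimal sketch in Python. It is not the project's actual code: the hash-join strategy, the tuple-based row layout, and every name in it (build_index, probe_chunk, parallel_self_join, the workers parameter) are assumptions chosen for illustration. Each worker joins one slice of the table against a shared read-only index, and the partial results are then combined pairwise, with each round of merging also spread across the pool.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def build_index(rows, key_col):
    """Map each key value to the list of rows that carry it."""
    index = defaultdict(list)
    for row in rows:
        index[row[key_col]].append(row)
    return index


def probe_chunk(chunk, index, probe_col):
    """Join one slice of the table against the shared, read-only index."""
    out = []
    for left in chunk:
        # .get avoids inserting missing keys into the shared defaultdict.
        for right in index.get(left[probe_col], ()):
            out.append(left + right)
    return out


def parallel_self_join(rows, left_col, right_col, workers=4):
    """Hypothetical self join: rows AS l JOIN rows AS r
    ON l[left_col] == r[right_col]. Rows are assumed to be tuples."""
    index = build_index(rows, right_col)
    size = max(1, -(-len(rows) // workers))  # ceiling division
    chunks = [rows[i:i + size] for i in range(0, len(rows), size)]

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Step 1: every worker runs its own piece of the join.
        parts = list(pool.map(lambda c: probe_chunk(c, index, left_col), chunks))

        # Step 2: pairwise tree merge; each round merges neighbours in parallel.
        while len(parts) > 1:
            pairs = [(parts[i], parts[i + 1]) for i in range(0, len(parts) - 1, 2)]
            merged = list(pool.map(lambda p: p[0] + p[1], pairs))
            if len(parts) % 2:
                merged.append(parts[-1])  # odd one out waits for the next round
            parts = merged

    return parts[0] if parts else []


if __name__ == "__main__":
    table = [(1, 2), (2, 3), (3, 1), (2, 2)]
    for joined in parallel_self_join(table, left_col=1, right_col=0, workers=2):
        print(joined)
```

In CPython the global interpreter lock keeps threads from speeding up CPU-bound work like this, so the sketch shows only the shape of the approach (partition, join each piece, merge in parallel), not the performance gains themselves.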