it2051229 Analyzing Search Algorithms

Overview

This project is designed to help you develop the skills nescessary to write and analyze algorithms. We will be using search algorithms which are relatively simple to understand and apply what we have learned about measuring time and measuring comparisons.

Your goal is to validate the hypothesis that binary search is more efficient than sequential search both in terms of runtime and number of comparisons. Your validation will be purely analytical and based on data you will generate and report on.

Getting Started

You will write all the code needed for this lab from scratch. That said, feel free to reference code and helper functions that have been demonstrated in class, and re-write these functions into your own project code base.

Requirements

1. Write a sequential search algorithm

Implement your own sequential search algorithm that operates on a std::vector< int> and finds an int in the list. This algorithms should have three parameters: a begin iterator referencing the first element in the range to search through, an end iterator such that end-- would reference the last element in the range to search through, and a callable function that will be used to check for equality. The algorithm should return an iterator referencing the element in the container if the element is found in the container, otherwise it should return the end iterator that was passed into the function.

This will be very similar to the function we wrote in class, except for the third parameter. Rather than taking an int, you should take a callable function. This function should take a const int& as a parameter, and return true if the parameter matches the item you are looking for, and false otherwise. Rather than using the == to check if an item in the list matches the item you are looking for, you will invoke the function and pass the item in the list that you want to check. This behavior is similar to how the std::find_if algorithm works and is discussed in the class lectures.

2. Write a binary search algorithm

Implement your own binary search algorithm that operates on std::vector< int> and finds an int in the list. This algorithm should have the same signature as the above sequential search algorithm - that is, it should take three parameters: a begin iterator referencing the first item in the search range, an end iterator such that end-- references the last element in the search range, and a callable function. The binary search algorithm should also return an iterator referencing the item in the list if found, and the end iterator passed in to the function otherwise.

The callable function behaves almost the same as the above sequential search algorithm. It should be invoked on each element in the list to check for to the item you are searching for. However, this function has a slightly different behavior becuase you will need to check not only for equality, but also for less-than and greater-than. The function should take a const int& as a parameter, this will be the item in the list to check. The function should return an int as a result with the following characteristics: the return value is -1 if the item in the list is the item you are looking for, the return value is 0 if the item in the list is to the item you are looking for, and the return value is 1 if the item in the list is the item you are looking for. You will be able to use this callable function to determine which sub-list to narrow your search down to, just like we did in the class lecture.

3. Measure runtime

Write code inside of main that measures runtime of the following four algorithms:

1. Your own sequential search algorithm
2. Your own binary search algorithm
3. The standard library std::find_if algorithm which is a sequential search
4. The standard library std::binary_search algorithm. Note that this function takes four parameters, the third parameter is the value to search for, and the fourth parameter is a comparison function. You can look up the requirements in CPPReference.

You will need to generate various sized lists of random integers as explained below and also pick a random integer value to search for in the list before calling the functions.

Please carefully follow these instructions for generating the lists:

You need to test many different list sizes. Start with a list of size 1000, and increment that number by 1000 until you reach 1000000 (one million). You should not do this manually, write a for-loop that does this for you.
For each of these list sizes, create a std::vector< int> and fill it with that many random integer values. The of these integer values should be an order of magnitude more than the size of the list. I.e. if the size of the list is 1000, then the numbers in the list should range from 0 to 10000. If the list size is 9000 then the numbers should range between 0 and 90000.
- Note that you will want to make sure that when you are calling the binary search algorithms, this list is sorted. When you are calling the sequential search algorithms, the list should remain unsorted.
Now that you have a list, you are ready to measure the runtime of the search algorithms. Generate a valid random number that could possibly be in the list. If the list size is 2000 then the random number should fall in the range 0 - 20000 just like the numbers in the list itself. Now invoke each of the four search algorithms such that they search for the number.
When invoking each search algorithm, measure the average runtime after at least five invocations of the function. This means that you should invoke each algorithm at least five times, and average the measured runtime for each algorithm and then report that number.

All of these requirements can be written using a few nested for-loops. Please don't try to do this manually where you change a value, then re-compile and re-run the code and record the results. That would end up taking an extremely long time. Write a program that does all of this for you, and then just let the program run. The program will likely take several minutes to run.

4. Measure comparisons

When you run all of these algorithms, you will take advantage of the fact that the third parameter is a callable function that you provide. You can use this to keep track of the number of times that function is called.

For each of the above list sizes in addition to measuring the runtime, measure the number of comparisons that the search algorithm uses by creating a counter variable and incrementing it on each comparison.

5. Produce a graph of your data

For all of the data you generate (runtime data and comparison count data) keep track of the results. You can print out all of the results to a file in order to have access to it when the program is finished running.

Here is an example of what the raw data looks like printed to the console:

Size MyFindTime MyFindIter StdFindT
1000 5.26e-05 1000 1.638e-05
2000 0.0001002 2000 3.136e-05
3000 0.00014832 3000 4.468e-05

I recommend that instead of printing to the console, you print to a file. You can print using the CSV format. This will make it easy to import the data into a spreadsheet and create a graph of the data.

Here is an example of the data I generated printed to a file in CSV format:

Size,MyFindTime,MyFindIter,StdFindTime,StdFindIter,MyBinTime,MyBinIter,S
1000,5.302e-05,1000,1.662e-05,1000,3.22e-06,9,2.5e-06,11
2000,0.00011298,2000,3.39e-05,2000,6.94e-06,10,3.48e-06,12
3000,0.0002073,3000,6.712e-05,3000,4.86e-06,12,3.18e-06,11
4000,0.00025662,4000,8.664e-05,4000,4.8e-06,12,3.3e-06,11
5000,0.0003302,5000,0.00012046,5000,5.5e-06,13,3.32e-06,12
6000,0.00048316,6000,0.00014118,6000,6.84e-06,13,4.34e-06,12

Once you have the data, you need to create two graphs of the results. One graph for runtime, and one graph for comparisons. The hould represent the list size, and the should represent the number of seconds for the first graph, and the number of comparisons for the second graph. On the graph, there should be a different line for each algorithm.

You can open the CSV data in a spreadsheet program. You can use Excel, or Google docs, or any of several other free spreadsheet programs. If you have the CSV data formatted as shown above, then it should work well.

Academic Honesty!

It is not our intention to break the school's academic policy. Posted solutions are meant to be used as a reference and should not be submitted as is. We are not held liable for any misuse of the solutions. Please see the frequently asked questions page for further questions and inquiries.

Kindly complete the form. Please provide a valid email address and we will get back to you within 24 hours. Payment is through PayPal, Buy me a Coffee or Cryptocurrency. We are a nonprofit organization however we need funds to keep this organization operating and to be able to complete our research and development projects.