Overview and Background

You will be provided with a large file containing thousands of records representing a warehouse’s inventory of current stock. The records are not currently in any particular order. Each record is identified by a unique key comprising of the category (table, chair, computer, etc) and a sequence number, eg: Computer:14. Each record contains both the key and information associated with the item (type, price, etc — see section 2.1 for details). Your task is to write a program that can load the records into memory and allow users to look up and display records via the item key (since the user knows the record key and wants to find out the associated information).

Since this is a unit on data structures, you will be required to implement two search methods based on two data structures. Initially, you are to load the records into an array, and the user may then request to search the array using a linear O(N) search. The user must also be able to build a binary search tree from the array and subsequently choose to search using the O(log N) find of a binary search tree. Finally, the user must be able to store the data to a file in key-sorted order.

When the program starts up, it should show the following main menu choices:

  • Load records into array
  • Build binary search tree from array
  • Search array
  • Search tree
  • Save data to file
  • Quit
  • Choice: >

The system should loop, requesting the user enter in their choice at which point the system executes the desired option (sometimes involving further user input) and loops back to re- display the main menu. If the user chooses to Quit the program should exit. Detailed specifications for the behaviour of each of the above options follows.

Specifications

LOAD RECORDS INTO ARRAY. On choosing Option 1, the user must be asked to enter in the path to the data ?le containing the records. Two example data files will be provided to you, a smaller one containing 1,000 records and a larger one containing a million records. However, although these files do not contain errors your code must still handle the possibility that the data gets corrupted and so must check for errors in formatting (see Input Validation below). Data ?les are in CSV format and each record is formatted as follows: ,,,,

  • Key - Structured as Category:Num, where Num is a sequence number
  • Brand - Manufacturer of the product
  • Model - Model name of the product
  • Weight in Kg - Weight of the product, rounded to the nearest kg
  • Price — Selling price of the product to 2dp.
  • Input validation: If the path and/or files of the corpus do not exist, an error should be shown to the user and the system should return to the main menu.

Input Validation

  • Input validation: WeightlnKg should be an integer
  • Input validation: Price should be a double
  • Ensure you close the file once finished.

DO NOT SORT THE ARRAY YET. It is important that you leave it in unsorted order for the binary search tree build (option 2). If you sort it, the tree will be degenerate and its search performance will be awful.

HINTS:

  • Create a class to store a single Warehouseltem. Then create an array of these objects to store the full list of these items.
  • You may assume that will never be more than 1,000,000 records
  • Don’t forget to keep track of the count of items as you are loading them: it is different to the capacity of the array!
  • In order to run the program for 1,000,000 items you will need to increase the size of memory that Java can use. To do this, run java with the -Xmx option, ie: $ java -Xmx500M MainClass

BUILD BINARY SEARCH TREE FROM ARRAY On selecting this option, the program must build a binary search tree of the data by inserting the items from the array one at a time into a new binary search tree.

  • Keys in the provided datasets are unique hence they will not cause duplicates. However your code must handle the possibility that non-unique keys exist (eg: corrupted data file).
  • The user should be able to choose this option multiple times and have it rebuild to binary search tree from scratch rather than append to an existing tree (the latter would fail with duplicate key errors)

SEARCH ARRAY (LINEAR SEARCH). On selecting this option, the program must ask the user for a key (CategoryName:ID) to search for. The program must then perform a linear search through the array looking for the record containing that key. If found, the Warehouse record should be displayed to the user; if not found an appropriate error message should be displayed to the user. In both cases, the time taken to perform the search (in milliseconds) must be displayed.

Ensure the record is displayed in a neat format with each field name indicated. Weight must have ‘kg’ appended, and price have ‘$’ prefixed. Display each field on its own line, ie:

  • Key: XXXXXXX
  • Brand: XXXXXXX
  • Model: XXXXXXX
  • Weight:XXkg
  • Price: $XXX.XX

Ensure the values (XXXX above) line up by padding spaces after the field names. Alternatively, you can use string formatting to do this for you (eg: to print the a string in a field 10 characters wide, use sPaddedStr = String.format(“%25s”, sStr); - extra chars will be spaces - see the JavaDocs on String.format for more information). Search must be case-SENSITIVE. Validation: The array must already be loaded.

HINTS:

  • To calculate the time taken to perform a search, use System.nanoTime(). This returns the current system time in nanoseconds, returning a ‘long’ variable (a ‘long’ is a 64-bit int).
  • So call System.nanoTime() just before you start the search (but after the user enters the search key) and store this value. Then immediately after the search completes, call System.nanoTime() again. The difference in the two values is the search time.
  • To convert nanoseconds to milliseconds, divide by 1000.0 (as a double). Display milliseconds to the full 3dp.

SEARCH TREE. On selecting this option, the program must ask the user for a key (CategoryName:|D) to search for. The program must then perform a search through the tree looking for the record containing that key. If found, the Warehouse record should be displayed to the user; if not found an appropriate error message should be displayed to the user. In both cases, the time taken to perform the search (in milliseconds) must be displayed.

  • As with searching the array, ensure the record is displayed in the same neat format
  • It would thus be a good idea to make a separate function that displays a given record!
  • Note that the tree will be roughly balanced rather than degenerate, since I have ensured that the provided datasets are randomly ordered. Hence search times will be O(log N) and you will n_ot need to implement complex rebalancing algorithms like Red-Black trees.

Do not forget to display the time taken to do the search. You should find that the search of the tree is much faster than the linear search of the array (especially with the million-item dataset). Validation: The tree must already be built

SAVE DATA TO FILE. When selecting this option, the user is asked for the name of the file to save the data. This must save data to a file in comma-separated format and in key-sorted ascending order. You have two options on how to do this:

  • Sort the array before saving it. You must use an O(N log N) algorithm to do this. An O(N2) algorithm will only get you half marks.
  • Perform an in-order traversal of the binary search tree (which will visit nodes in ascending key order) and write out the records in the order visited.

Validation: The data must already be loaded

REPORT. You are also required to submit a small ‘report’ (this may be submitted electronically). You must also provide a printed and signed assignment cover sheet (see unit pages for the cover sheet). The report is to have the following parts to it:

  • Simple UML class diagram showing all of your classes, their fields (names only), their methods (names only) and (optionally) the relationships between classes. Do not show Java's standard library classes since this will clutter up your diagram.
  • A list of known defects in your application
Academic Honesty!
It is not our intention to break the school's academic policy. Posted solutions are meant to be used as a reference and should not be submitted as is. We are not held liable for any misuse of the solutions. Please see the frequently asked questions page for further questions and inquiries.
Kindly complete the form. Please provide a valid email address and we will get back to you within 24 hours. Payment is through PayPal, Buy me a Coffee or Cryptocurrency. We are a nonprofit organization however we need funds to keep this organization operating and to be able to complete our research and development projects.