Summary

For this project, you will write a Java program that processes several text files and builds an inverted index that stores a mapping from words to the documents (and locations within those documents) where those words were found. For example, suppose we have the following mapping stored in our inverted index:

elephant → { ( mammals.txt, [ 3, 8 ] ),

This indicates that the word

Submission

You must submit your project to your SVN repository at:

https://www.cs.usfca.edu/svn/<username>/cs212/project1

where

You should include the following files in this directory:
To continue and make good progress in this class, you should aim to complete this project by the suggested deadline. To remain eligible to receive a "C" project grade in this course, you must submit this project by the listed cutoff date. The dates for this project are:
Recall that you may only have one project submitted at any time, and you must allow two weeks after your submission date for grading.

Execution

Your code must run on the lab computers. If you are developing your code on a home computer or laptop, be sure to check out your code on a lab computer and test it.

Your

    svn export https://www.cs.usfca.edu/svn/<username>/cs212/project1
    cd project1
    java -cp project1.jar Driver <arguments>

where
The output of your program should be a file

    elephant

where the word is listed alone on a single line, followed by lines with the absolute file path and a comma-separated list of locations. An empty line should separate entries.

If there are any issues with your submission or with running your code, you will be asked to resubmit.

Functionality

For this program, you must traverse the supplied directory and process all *.txt files found in that directory (including all subdirectories). For each text file, you must parse out each word. For each word, you must store a mapping from the word to the file and position where that word was found, using a custom inverted index data structure. See below for details:
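To make the word-to-file-to-positions mapping described above more concrete, here is a minimal sketch of one possible data structure. It is an illustration only, not the required design. The class name `InvertedIndexSketch`, its method names, and the exact line layout produced by `format()` (path, then locations separated by ", ") are assumptions made for this example; check the provided result files for the exact layout expected.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch: maps each word to the files and positions where it was found. */
class InvertedIndexSketch {
    // word -> (file path -> positions of the word in that file, in insertion order)
    private final Map<String, Map<String, List<Integer>>> index = new TreeMap<>();

    /** Records that the word was found in the file at the given position. */
    void add(String word, String file, int position) {
        index.computeIfAbsent(word, w -> new TreeMap<>())
             .computeIfAbsent(file, f -> new ArrayList<>())
             .add(position);
    }

    /** Returns a copy of the positions stored for the word in the file (empty if none). */
    List<Integer> positions(String word, String file) {
        Map<String, List<Integer>> files = index.get(word);
        if (files == null || !files.containsKey(file)) {
            return Collections.emptyList();
        }
        return new ArrayList<>(files.get(file));
    }

    /**
     * Formats the index with each word alone on a line, followed by one line per
     * file, and an empty line after each entry. The ", " separator is an assumption.
     */
    String format() {
        StringBuilder out = new StringBuilder();
        for (Map.Entry<String, Map<String, List<Integer>>> entry : index.entrySet()) {
            out.append(entry.getKey()).append('\n');
            for (Map.Entry<String, List<Integer>> file : entry.getValue().entrySet()) {
                out.append(file.getKey());
                for (int position : file.getValue()) {
                    out.append(", ").append(position);
                }
                out.append('\n');
            }
            out.append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        InvertedIndexSketch index = new InvertedIndexSketch();
        index.add("elephant", "mammals.txt", 3);
        index.add("elephant", "mammals.txt", 8);
        System.out.print(index.format());
    }
}
```

The nested `TreeMap` keeps words and file paths sorted, which makes output easier to compare by eye. A `HashMap` would also work, since the output order is allowed to differ.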
See the next section for recommendations on how to approach this project.

Getting Started

It is recommended you work on this project in two phases. For phase 1, you should:
For phase 2 of this project, you should:
Of course, this is just a recommendation. You should try to tackle all projects iteratively, but you can decide how best to break the project into parts.

Project Hints

Here are some recently posted hints to help you on this project:

Testing

You should thoroughly test your own code. Make sure it meets the functionality requirements, performs proper exception and error handling, and produces the correct output. To assist you in your testing, several test files have been provided at the following location on the lab computers:

Note that there are several subdirectories within the index directory above. You should test each of these subdirectories individually to start, and compare your results to the individual result files below. For example, test /home/sjengle/cs212/tests/index/subdir/simple first and compare your results with invertedindex.simple.txt.

You should not submit your code until you are able to produce the test results. However, just because you produce the test results does not mean your code is ready for submission. See the Projects page for what is required for the project to be considered "complete".

Note: For this particular project, the order of output may differ. However, the words and positions should not.
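Because the order of output may differ, a plain line-by-line diff against the result files can report mismatches even when the index is correct. The sketch below shows one possible way to compare two outputs while ignoring the order of entries and the order of file lines within an entry. The class name `OutputComparisonSketch` and its methods are hypothetical, and it assumes the layout described earlier: a word alone on a line, then one line per file, with an empty line between entries.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch: compares two index outputs while ignoring entry order. */
class OutputComparisonSketch {
    /** Parses output text into word -> set of file lines (each line kept verbatim). */
    static Map<String, Set<String>> parse(String text) {
        Map<String, Set<String>> entries = new HashMap<>();
        String word = null;
        for (String line : text.split("\r?\n")) {
            line = line.trim();
            if (line.isEmpty()) {
                word = null;                      // an empty line ends the current entry
            } else if (word == null) {
                word = line;                      // first line of an entry is the word
                entries.putIfAbsent(word, new HashSet<>());
            } else {
                entries.get(word).add(line);      // remaining lines are file lines
            }
        }
        return entries;
    }

    /** Returns true if both outputs contain the same words and the same file lines. */
    static boolean sameIndex(String expected, String actual) {
        return parse(expected).equals(parse(actual));
    }

    public static void main(String[] args) {
        String expected = "apple\n/x/a.txt, 1, 2\n\nbanana\n/x/a.txt, 3\n";
        String actual = "banana\n/x/a.txt, 3\n\napple\n/x/a.txt, 1, 2\n";
        System.out.println(sameIndex(expected, actual));
    }
}
```

Each file line is compared as an exact string, so the positions and their order must still match. Since the output uses absolute paths, this comparison is only meaningful when both outputs were generated from the same directory location.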
There is now a "Hints" section to help you as you develop your project. It will be updated periodically.