A) (7 points) Write a program whose main routine obtains the parameter n from the user (i.e., passed to your program when it is invoked from the shell, n > 1). Your program shall then create a shared memory region and a child process. The shared memory shall have a size of BUF_SZ*sizeof(long long), where BUF_SZ is defined as 5 using a macro, e.g. “#define BUF_SZ 5”. The child process should obtain the value of n from the parent (you have multiple options for doing that) and create (but not print) a Fibonacci sequence of length n whose elements are of type long long. You may find more information about Fibonacci numbers at https://en.wikipedia.org/wiki/Fibonacci_number. The child process shall create the elements, one at a time, and wait for a random interval of time (0
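Since the specification above is cut off, the following is only a hedged interpretation, not the required solution: an anonymous shared mapping created before fork(), with the child filling it one Fibonacci element at a time and sleeping a random interval between elements. The handling of n > BUF_SZ, the wait bounds, and all names are assumptions to adapt to the full assignment text.

#define BUF_SZ 5
#include <cstdio>
#include <cstdlib>
#include <ctime>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

int main(int argc, char* argv[]) {
    if (argc < 2) { fprintf(stderr, "usage: %s n\n", argv[0]); return 1; }
    long n = strtol(argv[1], nullptr, 10);            // n passed from the shell
    // Anonymous shared mapping: inherited by the child across fork().
    long long* buf = (long long*)mmap(nullptr, BUF_SZ * sizeof(long long),
                                      PROT_READ | PROT_WRITE,
                                      MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    pid_t pid = fork();
    if (pid == 0) {                                   // child: n inherited via fork()
        srand(time(nullptr) ^ getpid());
        long long a = 0, b = 1;
        for (long i = 0; i < n && i < BUF_SZ; ++i) {  // one element at a time
            buf[i] = a;
            long long next = a + b; a = b; b = next;
            sleep(rand() % 3);                        // random wait between elements
        }
        return 0;
    }
    wait(nullptr);                                    // parent waits, then could print
    for (long i = 0; i < n && i < BUF_SZ; ++i) printf("%lld\n", buf[i]);
    munmap(buf, BUF_SZ * sizeof(long long));
    return 0;
}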
A. (1 point) List two main differences between fixed-size partitioning and paging.
B. (1 point) List two main differences between variable-size partitioning and segmentation.
C. (5 points) Repeat assignment 5A, except that now you should NOT use the fork() system call. Instead, create two separate programs, a producer program and a consumer program, each invoked from a separate command shell (you should pass the same parameter n to both programs when you invoke them). Make sure you create the variable storing the input parameter n in the data section (NOT on the stack) AND initialize it to a non-zero value. The shared memory shall be created using shm_open, and both processes shall use a common file name, e.g. /lab9_shm, so that both processes can easily find it. (A sketch of the producer side appears after the rules below.)

Using part C, continue with the following questions/tasks:

D. (0.5 points) Print the start address of the shared buffer from both processes.
E. (0.5 points) Was the printed address a virtual or a physical address?
F. (0.5 points) Print the address of n from your running program, and also,
G. (0.5 points) find out where it is stored in the .elf file (executable).
H. (0.5 points) Did the addresses match (the one printed from the running program vs. the one in the program’s ELF file)? Why?
I. (0.5 points) What is the virtual address of the entry point in your producer and consumer programs? (Note that in most programs, some initialization is invoked before “main()” is called.)

Hints:
• To get addresses of variables from an ELF file (your executable), use: objdump --syms lab9 OR objdump -D lab9 OR readelf --all lab9 OR readelf -s lab9, where lab9 is the name of your executable. Note that objdump may not report variables mapped to the .bss section (i.e. uninitialized variables → you must either make your variable initialized or use readelf).
• Alternatively, you may tell the linker to output a map file using -Xlinker -Map=lab9.map in your gcc command line.
• You can find quick info about the ELF format at https://en.wikipedia.org/wiki/Executable_and_Linkable_Format, and you may also parse the ELF file (i.e. the output of your compilation process) using the hexedit utility (which you can install on your system).

What to submit (via Brightspace): Please submit the following files individually:
1) Source file(s) with appropriate comments. The naming should be similar to “lab#_$.c” (# is replaced with the assignment number and $ with the question number within the assignment, e.g. lab4_b.c for lab 4, question b, OR lab5_1a for lab 5, question 1a).
2) A single pdf file (for images + report/answers to short-answer questions), named “lab#.pdf” (# is replaced by the assignment number), containing:
• Screen shot(s) of your terminal window showing the current directory, the command used to compile your program, the command used to run your program, and the output of your program.
3) Your Makefile, if any. This is applicable only to kernel modules.

RULES:
• You shall use kernel version 4.x.x or above. You shall not use kernel version 3.x.x.
• You may consult with other students about GENERAL concepts or methods, but copying code (or code fragments) or algorithms is NOT ALLOWED and is considered cheating (whether copied from other students, the internet, or any other source).
• If you are having trouble, please ask your teaching assistant for help.
• You must submit your assignment prior to the deadline.
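Referring back to part C above, here is a hedged producer-side sketch (the consumer is symmetric, opening the same /lab9_shm name read-only and skipping ftruncate). Error handling is omitted and all names are illustrative; older glibc versions require linking with -lrt for shm_open.

#define BUF_SZ 5
#include <cstdio>
#include <cstdlib>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

long long n = 1;  // data section (initialized, non-zero), per parts C and F

int main(int argc, char* argv[]) {
    if (argc > 1) n = atoll(argv[1]);
    int fd = shm_open("/lab9_shm", O_CREAT | O_RDWR, 0666);  // common name
    ftruncate(fd, BUF_SZ * sizeof(long long));
    void* buf = mmap(nullptr, BUF_SZ * sizeof(long long),
                     PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    printf("shared buffer start: %p\n", buf);   // part D (a virtual address: part E)
    printf("&n = %p\n", (void*)&n);             // part F; compare with readelf -s
    /* ... produce the Fibonacci elements into buf ... */
    munmap(buf, BUF_SZ * sizeof(long long));
    close(fd);
    return 0;
}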
In this problem you will write your own hash function for std::string in the provided hash.h file. While std::strings can be of arbitrary length and contain any legal ASCII character, we will assume the longest string you will ever receive is 28 letters (e.g. antidisestablishmentarianism) and contains only letters and digits (‘0’-‘9’). You should treat upper-case letters the same as lower-case letters.

The Approach

The basic approach is to treat groups of letters/digits as base-36 numbers and convert them to integer values (i.e. the decimal equivalent). First translate each character into a value between 0 and 35, where a=0, b=1, c=2, …, z=25, and digit ‘0’=26, …, ‘9’=35. Be sure to convert an upper-case letter to lower case before performing this mapping. Next you will translate a (sub)string of 6 characters a1 a2 a3 a4 a5 a6 into an (unsigned long long) 64-bit integer w[i] (essentially converting from base 36 to decimal) via the following formula:

w[i] = 36^5*a1 + 36^4*a2 + 36^3*a3 + 36^2*a4 + 36*a5 + a6

Place zeros in the leading positions if your (sub)string is shorter than 6 characters. So an example string like “104” would set a1, a2, and a3 all to zero (since we only have a length-3 string) and yield 36^2*27 + 36*26 + 30 (since ‘1’ → 27, ‘0’ → 26, and ‘4’ → 30). You should use the base-conversion approach taught in class to avoid repeated calls to pow(). Note that indexing is a bit tricky here, so think carefully: the digit at the “end” of the string is assigned the power 36^0, the 2nd-to-last character is worth 36^1, etc.

If an input word is longer than 6 characters, you should first do the above process for the last 6 characters in the word, then repeat the process for each previous group of 6 characters. Recall, you will never receive a word longer than 28 characters. The last group (the leading characters) may not have 6 letters, in which case you treat it as a substring of fewer than 6 characters as described above. Since we have at most 28 characters, this process results in a sequence of no more than 5 integers: w[0] w[1] w[2] w[3] w[4], where w[4] was produced by the last 6 characters of the word. Store these values in an array (of unsigned long long). Place zeros in the leading positions of w if the string does not contain enough characters to make use of those values. So for a string of 12 letters, w[0], w[1], and w[2] would all be 0 and only w[3] and w[4] would be (potentially) non-zero.

We will now hash the string. Use the following formula to produce and return the final hash result (where the r values are explained below):

h(k) = r[0]*w[0] + r[1]*w[1] + r[2]*w[2] + r[3]*w[3] + r[4]*w[4]

where the r[i] values are either:
● 5 random numbers created when you instantiate the hash function, using the current time as your seed (i.e. srand(time(0))). Be sure to #include <cstdlib> and #include <ctime>.
● 5 preset values for debugging (so we all get the same hash results): r[0]=983132572, r[1]=1468777056, r[2]=552714139, r[3]=984953261, r[4]=261934300 (these numbers were generated by random.org).

Note: We will allow h(k) to produce a large integer (beyond the range of m, the table size) and only mod it by the table size in the hash table. The constructor of your hash object will take a debug parameter which, if true, should set the r values to the above debugging values, and if false, should select the randomized values.
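As a concreteness check, here is a hedged sketch of the per-group conversion using Horner’s rule (the names letterDigitToNumber and groupValue are illustrative, not part of the provided skeleton):

#include <cctype>
#include <string>

unsigned long long letterDigitToNumber(char c) {
    c = std::tolower(c);
    if (c >= 'a' && c <= 'z') return c - 'a';        // a=0 ... z=25
    return 26 + (c - '0');                           // '0'=26 ... '9'=35
}

// Convert the characters of s from index start to end (inclusive).
// Missing leading positions are implicitly zero because the Horner
// accumulation simply starts later.
unsigned long long groupValue(const std::string& s, int start, int end) {
    unsigned long long w = 0;
    for (int i = start; i <= end; ++i)
        w = w * 36ULL + letterDigitToNumber(s[i]);   // Horner: one multiply per char
    return w;
}

For “abc” this gives (0*36 + 1)*36 + 2 = 38, and with the debug r values h = 261934300*38 = 9953503400, matching the sample output in the Testing section below.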
Note: Make sure you compute using unsigned long long variables, or cast (constants or other variables) to that type at the appropriate places, so that you don’t have an overflow error.

Testing

We have provided a simple test program, str-hash-test.cpp, where you can provide a string on the command line and it will hash the string and output the hash result. Below is some debug output of the w[i] values computed for several test strings using the debug r values:

./str-hash-test abc yields
w[0] = 0 w[1] = 0 w[2] = 0 w[3] = 0 w[4] = 38
h(abc) = 9953503400

./str-hash-test abc123 yields
w[0] = 0 w[1] = 0 w[2] = 0 w[3] = 0 w[4] = 1808957
h(abc123) = 473827885525100

./str-hash-test B yields
w[0] = 0 w[1] = 0 w[2] = 0 w[3] = 0 w[4] = 1
h(B) = 261934300

./str-hash-test gfedcba yields
w[0] = 0 w[1] = 0 w[2] = 0 w[3] = 6 w[4] = 309191940
h(gfedcba) = 80987980279261566

./str-hash-test antidisestablishmentarianism yields
w[0] = 17540 w[1] = 195681115 w[2] = 2203855 w[3] = 732943745 w[4] = 484346964
h(antidisestablishmentarianism) = 1137429692708383810

3. Hash table with linear and double hashing

You will create a HashTable data structure that uses open addressing (i.e. uses probing for collision resolution) and NOT chaining. Your table should support linear and double-hashing probing via functors. It should support resizing when the loading factor is above a user-defined threshold. You may use the std::hash functor provided with the C++ standard library as well as your own hash function (created in the previous problem) for strings made of letters and digits only (no punctuation). You should complete the implementation in ht.h.

template<
    typename K,
    typename V,
    typename Prober = LinearProber<K>,
    typename Hash = std::hash<K>,
    typename KEqual = std::equal_to<K> >
class HashTable
{
    /* Implementation */
};

The template types are as follows:
● K: the key type (just like a normal map)
● V: the value type (just like a normal map)
● Prober: a functor that supports an init() and next() function that the hash table can use to generate the probing sequence (e.g. linear, quadratic, double-hashing). A base Prober class has been written and a derived LinearProber is nearly complete. You will then write your own DoubleHashProber class to implement double-hash probing. The actual Prober type that is passed may contain other template arguments but must at least take the key type as a template argument. The DoubleHashProber will take the key type AND the second hash function type (see below for more details).
● Hash: the primary hash function that converts a value of type K to a std::size_t, which we have typedef’d as HASH_INDEX_T.
● KEqual: a functor that takes two K-type objects and returns true if the two objects are equal and false otherwise.

Internally, we will use a hash table of pointers to HashItems (i.e. std::vector<HashItem*>). The HashItem has the following members:

typedef std::pair<K, V> ItemType;
struct HashItem {
    ItemType item;   // key, value pair
    bool deleted;
};

You should allocate a HashItem when inserting and free them at an appropriate time based on the description below. If no location is available to insert the item, you should throw std::logic_error, though since we are not using quadratic probing, and we resize the table when the loading factor is above a certain threshold, this should not happen. (Yet we include it in case we do implement quadratic probing in the future.)
We have also provided a constant array of prime sizes that can be used, in turn, when resizing and rehashing is necessary. This sequence is: 11, 23, 47, 97, 197, 397, 797, 1597, 3203, 6421, 12853, 25717, 51437, 102877, 205759, 411527, 823117, 1646237, 3292489, 6584983, 13169977, 26339969, 52679969, 105359969, 210719881, 421439783, 842879579, 1685759167. Thus, when a HashTable is constructed it should start with a size of 11. The client will supply a loading factor, alpha, at construction, and before inserting any new element you should resize (to the next larger size in the list above) if the current loading factor is at or above the given alpha.

For removal of keys, we will use a deferred strategy of simply marking an item as “deleted” using a bool value. These values should not be considered when calls are made to find or operator[], but should continue to count toward the loading factor until the next resize/rehash. At that point, when you resize, you will only rehash the non-deleted items and (permanently) remove the deleted items.

We will not ask you to implement an iterator for the hash table, so find() will simply return a pointer to the key/value pair if it exists and nullptr if it does not.

To facilitate tracking relevant statistics we will use for performance measurements, we have provided the core probing routine: probe(key). This routine applies the hash function to get a starting index, then initializes the prober and repeatedly calls next() until it finds the desired key, reaches a hash table location that contains nullptr (where a new item may be inserted if desired), or reaches its upper limit of calls (i.e. cannot find a null location). probe() utilizes the Prober. Prober::init() is called to give the prober all relevant info it may need regardless of the probing strategy (i.e. starting index, table size, the key (which would be needed by double-hash probing), etc.). Then a sequence of calls to Prober::next() ensues. If, for example, we are using linear probing, the first call to next() would yield h(k), the subsequent call would yield h(k)+1, the third call h(k)+2, etc. Notice: the first call to next() just returns the h(k) value passed to Prober::init(). As probing progresses, we will update statistics such as the total number of probes. Some accessor and mutator functions are provided to access those statistics.

Note: When probing to insert a new key, we could stop on a location marked as deleted and simply overwrite it with the key to be inserted. However, our design uses the same probe(key) function for all three operations: insert, find, and remove, and for find and remove we would certainly NOT want to stop on a deleted item but instead keep probing to find the desired key. So we will take the more inefficient probing approach of not reusing locations marked for deletion when probing for insertion.

A debug function, reportAll, is also provided to print out each key/value pair. Use this when necessary to help you debug your code.

Probers

We have abstracted the probing sequence so various strategies may be implemented without modifying the hash table implementation. Complete the LinearProber and then write a similar DoubleHashProber that uses double hashing. Return npos after m calls to next(). A sketch of both probers follows; the double-hash step size is defined in the next section.
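Since the contract above pins down the probing arithmetic, here is a hedged, self-contained sketch of both strategies. The free-standing structs, member names, and passing the precomputed step into init() are assumptions for illustration; the real classes derive from the provided, templated Prober base.

#include <cstddef>

typedef std::size_t HASH_INDEX_T;
const HASH_INDEX_T npos = (HASH_INDEX_T)-1;

struct LinearProberSketch {
    HASH_INDEX_T start_, m_, numProbes_;
    void init(HASH_INDEX_T h, HASH_INDEX_T m) { start_ = h; m_ = m; numProbes_ = 0; }
    HASH_INDEX_T next() {
        if (numProbes_ == m_) return npos;           // all m slots have been probed
        return (start_ + numProbes_++) % m_;         // h(k), h(k)+1, h(k)+2, ...
    }
};

struct DoubleHashProberSketch {
    HASH_INDEX_T start_, m_, step_, numProbes_;
    // step = m2 - (h2(k) mod m2); the helper that picks the prime m2 < m is
    // provided with the skeleton, so here the caller computes the step.
    void init(HASH_INDEX_T h, HASH_INDEX_T m, HASH_INDEX_T step) {
        start_ = h; m_ = m; step_ = step; numProbes_ = 0;
    }
    HASH_INDEX_T next() {
        if (numProbes_ == m_) return npos;           // give up after m probes
        return (start_ + numProbes_++ * step_) % m_; // h(k), h(k)+step, h(k)+2*step, ...
    }
};

Note that in both cases the first call to next() returns the start index itself, matching the contract described above.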
Double Hash Prober

Your double-hash prober should take two template arguments:

template <typename KeyType, typename Hash2>
struct DoubleHashProber : public Prober<KeyType>
{
    /* Code */
};

The KeyType is the same as that of the hash table, while Hash2 should be a function object that implements HASH_INDEX_T operator()(const KeyType& key) const. Our double-hash prober should use a step size of m2 − (h2(k) mod m2), where m2 is some prime modulus less than the current hash table size. Since our double-hash prober needs this separate m2, we have a static array of moduli to use and provide a helper function to determine which modulus to use given the hash table size m passed to init(). The init() is complete but can be appended to, if need be.

Testing Your Hash Table

A simple test driver program, ht-test.cpp, has been provided which you may modify to test various aspects of your hash table implementation. We will not grade this file, so use it as you please. BUT PLEASE do some sanity tests with it BEFORE using any of our test suite.

4. Boggle

This part of the assignment is another problem that requires a recursive backtracking approach to solve. Boggle is a word game where players roll a randomized square grid of letters and then race to find all of the words that can be spelled left-to-right, down, or diagonally (down and to the right). The goal is to identify the most words. In our version of Boggle we create a randomized grid of letters (see boggle.cpp) using the Scrabble letter frequencies (truly random letters would form fewer words). In our version we also want to search for and return only the longest word that starts at a particular position (return EARS, not EAR). To facilitate this feature we provide a std::set<std::string> made up of all of the word prefixes found in the dictionary. Using this you can tell that EAR is not only a word in its own right but also a prefix of other words. We also provide a dictionary based on the Scrabble tournament word list (so no single-letter words). See boggle.cpp for the code that creates both the prefix set and the dictionary.

boggle-driver.cpp is the main program for this problem and you should not need to change it (if you do make changes, make sure not to delete the code that outputs the words, as this is used by the autochecker). You execute boggle-driver with the board size, seed, and dictionary file as command-line arguments.

We also give you boggle() in boggle.cpp. This function iterates over each position in the board, starting a new search at that location in the three required directions. Your task is to implement boggleHelper() in boggle.cpp to recursively search for words in the given board. The r and c arguments are the current position in the board. The dr and dc input arguments set the search direction. You will not need to change dr or dc as you search; rather, use them to set r and c as you recurse to the next position. Your solution must recursively build up a word along the direction of search and insert only the longest word found into the std::set result set. You’ll note that boggleHelper() is of type bool; you will need to use this return value to help you insert only the longest word found. Your solution must be backtracking, i.e. if adding a new letter along the search direction does not continue to make a longer word, the recursion must stop, though you may need to add a word to the result set at this point. You can use the std::set prefix input argument to see if a string is a prefix of a longer word or words. A sketch of this recursive shape follows.
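Here is a hedged sketch of one way the backtracking could be structured. The parameter list, board type, and bookkeeping are assumptions inferred from the text above; the prototype given in boggle.cpp is authoritative.

#include <set>
#include <string>
#include <vector>

bool boggleHelper(const std::set<std::string>& dict,
                  const std::set<std::string>& prefixes,
                  const std::vector<std::vector<char>>& board,
                  std::string& word,
                  std::set<std::string>& result,
                  unsigned int r, unsigned int c,
                  int dr, int dc)
{
    if (r >= board.size() || c >= board.size())   // square board: walked off the edge
        return false;
    word.push_back(board[r][c]);                  // extend along the (dr,dc) direction
    bool foundLonger = false;
    if (prefixes.count(word))                     // only recurse if a longer word is possible
        foundLonger = boggleHelper(dict, prefixes, board, word, result,
                                   r + dr, c + dc, dr, dc);
    bool found = foundLonger;
    if (!foundLonger && dict.count(word)) {       // longest dictionary word on this path
        result.insert(word);
        found = true;
    }
    word.pop_back();                              // backtrack
    return found;                                 // true iff a word was inserted at/below here
}

The return value is what enforces “EARS, not EAR”: when a deeper call reports success, the shorter word at the current depth is never inserted.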
If you don’t use a backtracking approach, it is likely that your solution will be too slow to pass the submission tests. You can treat the board like a 2D array of char, accessing it like board[x][y]. See printBoard() in boggle.cpp. See below for some example executions.

5. Boggle Examples

./boggle-driver 3 999 dict.txt
I R R
U M A
E D O
Found 5 words: DO, ED, MA, MO, UM

./boggle-driver 4 11 dict.txt
C I R N
J J W I
U A P E
G D V N
Found 6 words: AD, APE, EN, JUG, PE, WE

./boggle-driver 5 10 dict.txt
I H U N D
E I D N M
E H A U R
I R F C A
D A I T N
Found 13 words: AIT, AN, DA, EH, ER, HA, HI, HUN, ID, IT, NU, RAN, UN

./boggle-driver 7 1234 dict.txt
D T E O A I V
P E W I G A E
S E A W H I I
A A I C E E Z
O S P W U P U
G O N S F S E
S O V K L T U
Found 30 words: AA, AE, AG, AI, AS, AW, CEE, DE, EW, GAE, GI, GO, HE, HI, ICE, OI, ONS, OS, PE, PEW, PST, SAPS, SEA, SO, TE, TEE, UP, US, WE, WIG

./boggle-driver 10 104 dict.txt
I E R T A I Q K A N
S J I Z I A D C I E
L E A E L P E O V I
Z M R H H A W E E N
Q L S D E O E N E S
Q O E S A S T I S X
L N T O C I L Z H L
N I T O F R O A E I
B R I Z N C V Q F W
R O R Q Q E W Y O E
Found 72 words: AA, AD, AE, AH, AI, AIL, AN, ARSE, AS, AWEE, BO, DA, DE, DEW, DOES, EF, EH, EL, EM, EN, ER, ES, ET, EW, EWE, EX, FE, FRO, HAW, HE, HES, HI, HOT, ID, INS, IS, IT, KA, KI, LA, LEA, LI, LO, NE, NIT, OE, OES, OF, ON, ONE, OR, OS, PA, PE, RIA, SER, SETT, SHE, SIR, SO, TA, TI, TIP, TIS, TO, TONE, VEES, WE, WEEN, WET, YO, ZA

6. Boggle Testing

A checker is available to help you pass the autograder. The first few Boggle tests are from the examples above; the others are “hidden” in that we don’t tell you the output list, just how many words must be found. The autochecker will run your Boggle program with the following tests. For this first set, see the examples above for the expected result:

./boggle-driver 3 999 dict.txt
./boggle-driver 4 11 dict.txt
./boggle-driver 5 10 dict.txt
./boggle-driver 7 1234 dict.txt
./boggle-driver 10 104 dict.txt

The next 5 tests are “hidden”:

./boggle-driver 3 98765 dict.txt (must find 4 words)
./boggle-driver 5 314 dict.txt (must find 14 words)
./boggle-driver 7 8675309 dict.txt (must find 43 words)
./boggle-driver 9 4326872 dict.txt (must find 54 words)
./boggle-driver 11 88 dict.txt (must find 98 words)
In this assignment, we shall build on what we did in assignment 3 and further extend it. You shall write a dynamically loadable module that registers itself as a character device.

When registering your kernel module as a character device, you need to provide methods that allow a user-mode application to access your device via open(), read(), write() and ioctl(). You do so by implementing those methods (naming them as you wish) and providing a “file operations struct” that contains function pointers to them. This takes place during the registration of your character device. The file operations struct contains many function pointers; you should set all of them to NULL except the ones you are implementing, where you provide pointers to the actual functions you implemented. In this assignment, you shall implement the open() and the read() functions.

In your implementation of the open() method, you should obtain the caller’s user ID (the caller is a user-mode process) and save it in a variable (it should be a static global variable). In your implementation of the read() method (again, you may name it as you wish), you shall provide that user ID to the read() caller (which will of course be the same process that opened the virtual file). Hint: you should #include <linux/sched.h> in your module’s source file. Look up that file and find the struct representing the PCB (struct task_struct).

Typically, a user-mode application will open your “virtual” device, read from it (till it encounters an end-of-file character, EOF), then close it. Thus, when testing your module, you should create a simple user-mode program that opens your virtual device and reads from it. It may then compare the user ID it read from the driver against one it obtains using getuid(). (A sketch of such a tester appears at the end of this assignment.)

Some Linux Notes:
• Devices are organized as character, block or network devices.
• Devices have major and minor numbers. The major number specifies the device type, whereas the minor number specifies an instance of that type. As such, it is common that a single device driver takes care of an entire major number.
• Most major numbers are already assigned to specific device drivers, so you are not supposed to use them.
• Linux uses the concept of virtual file systems; one of those is the devfs (whose “virtual” devices appear under /dev/).
• Generally: 1) your driver registers itself with the devfs subsystem using a major number, and then 2) you create a node for it in the /dev directory (e.g. /dev/labx) using the mknod command. You must pass some parameters to the mknod command: the major number your driver used (that’s how the /dev/labx virtual file is linked to your driver, via the major number), a minor number of your choice (for the instance), and the type, “c”. User-mode applications may then treat it as a file, and file operations such as open(), fprintf(), read(), write(), etc. are routed to functions that your module handles.
• Instead of using this complex registration and node creation process, we shall use misc_register(), which handles both steps automatically. It registers us under device major number 10 (along with many other devices that use the same major number, each with a different minor number):
o You need to provide misc_register() with a minor number, the device’s name (“labx”), and a pointer to the file operations.
o In order to avoid collisions, use the MISC_DYNAMIC_MINOR macro as the minor number when you register your device.
• For information regarding misc_register(), see: http://www.linuxjournal.com/article/2920.
• You may find further information at kernel.org or lwn.net.
• You should not directly use the pointers provided to your read() function; instead you need to use:
unsigned long copy_to_user(void __user *to, const void *from, unsigned long n);

Brief (but not full) summary of what you need to do:
• Review what you did in assignment 3.
• Read the article at http://www.linuxjournal.com/article/2920 to familiarize yourself with how to create and register a misc device.
• Create an “open” function (e.g. name it misc_open() or any other name of your choosing) and a “read” function (e.g. named misc_read()).
o Please read the “hw8_24F_notes.pdf” document for further information.
• Create a struct file_operations containing the function pointers to the misc_open and misc_read you created earlier, and containing null pointers for the remainder of the struct.
o Consult the “Linux Device Drivers, 3rd Edition” book, page 50, for further info about the open and read functions.
o Further information is also in “hw8_24F_notes.pdf”.
• Create a struct miscdevice as per article 2920 above.
• Create your init and exit functions (as you did in assignment 3). In the module init function, use misc_register() to register your module as a character device, using the structs you’ve already created.
• Build your kernel module as you did in assignment 3.

What to submit: Please submit the following files individually:
1) Source file(s) with appropriate comments. The naming should be similar to “lab#_$.c” (# is replaced with the assignment number and $ with the question number within the assignment, e.g. lab4_b.c for lab 4, question b, OR lab5_1a for lab 5, question 1a).
2) A single pdf file (for images + report/answers to short-answer questions), named “lab#.pdf” (# is replaced by the assignment number), containing:
• Screen shot(s) of your terminal window showing the current directory, the command used to compile your program, the command used to run your program, and the output of your program.
3) Your Makefile, if any. This is applicable only to kernel modules.

RULES:
• You shall use kernel version 4.x.x or above. You shall not use kernel version 3.x.x.
• You may consult with other students about GENERAL concepts or methods, but copying code (or code fragments) or algorithms is NOT ALLOWED and is considered cheating (whether copied from other students, the internet, or any other source).
• If you are having trouble, please ask your teaching assistant for help.
• You must submit your assignment prior to the deadline.
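Referring back to the testing suggestion above, a minimal user-mode tester might look like the following hedged sketch (this is not the kernel module itself). The node name /dev/labx and the “uid as decimal text” format are assumptions; match whatever your read() implementation actually emits.

#include <cstdio>
#include <cstdlib>
#include <fcntl.h>
#include <unistd.h>

int main() {
    int fd = open("/dev/labx", O_RDONLY);
    if (fd < 0) { perror("open /dev/labx"); return 1; }
    char buf[64] = {0};
    ssize_t nread = read(fd, buf, sizeof(buf) - 1);   // driver copies via copy_to_user
    if (nread < 0) { perror("read"); close(fd); return 1; }
    close(fd);
    unsigned long uid_from_driver = strtoul(buf, nullptr, 10);
    printf("driver reported uid %lu, getuid() says %u -> %s\n",
           uid_from_driver, (unsigned)getuid(),
           uid_from_driver == getuid() ? "match" : "MISMATCH");
    return 0;
}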
Spring 2025

Consider the popular game (at least at the time of this writing) of Wordle. A 5-letter English word is chosen at random (though we will extend this to support n-letter words) and the user has some number of chances to guess the word. As the user guesses words, letters in their guess that match the selected word in the correct location are highlighted in green, while letters (which may include duplicates) that are part of the word but not in the correct location are highlighted in yellow. So, if the selected word was total and the user guessed treat, the first t and the a in treat would be highlighted in green to indicate they are in the correct locations, the last t would be marked in yellow to indicate that there is another t in the selected word but not at that location, and all other letters (such as r) would appear in gray to indicate they are not found in the word.

In this program, you will write a tool to aid players of a Wordle-like game (it does not have the exact semantics of Wordle, but is similar). Suppose the user would like to know what English-language words exist that meet certain criteria (so they can think of potential guesses). The user will provide you an input of how many letters the word is, what letters have already been guessed in their correct location (akin to the green letters), and a list of letters that they know MUST appear in the word (akin to the yellow letters, but without indicating where those letters CANNOT be). They will NOT provide the list of gray letters, meaning you will need to try all possible letters when trying to fill in a blank location in the word.

So your task is to write a function (and any desired helper functions) to compute all possible English-language words that exist given a string containing the already-guessed letters that are in the correct location (we will refer to these as fixed letters, since they may not change position and must stay in that location) and a string of the already-guessed letters that MUST appear somewhere in the word (we will refer to these as floating letters, since they may be in any of the remaining locations). Again, we are not actually writing code to play the game but code that can help players consider potential future guesses given some of the knowledge they’ve learned from previous guesses.

Your approach should be to generate all possible strings of lower-case alpha characters that contain the fixed characters (in their given locations) and the floating characters (in any locations), fill in any remaining letters with all possible options, and then check whether any of those strings are true English-language words. We will provide a std::set<std::string> of all possible English-language words (i.e. essentially a dictionary) that you may use for this task (which we read from a provided file: dict-eng.txt). The signature of the function you are to write must be (and cannot be changed):

std::set<std::string> wordle(
    const std::string& in,
    const std::string& floating,
    const std::set<std::string>& dict);

Note: We realize that this may be an inefficient approach to finding all the words that match the given constraints. If we were not forcing you to use this approach, it might be easier to find the words that start with a given letter and iterate through them in the sorted order that the set provides, checking whether each word matches the given constraints.
However, one can see that if the first letter or two are unknown, this may require iterating through the whole dictionary, and so, in some cases, it may be faster to generate all the possible strings, as we do here.

Input format

To indicate how many letters the selected word has AND the correct, fixed-location letters already guessed, we will use a string of length n (for an n-letter word), using the - character to indicate a blank location and characters filling in the correct, fixed locations. So the input string -i- means we want to find all 3-letter words that have an i as the middle character. We will then pass in a second string of the floating (yellow) characters, containing the characters (in any order) that we know must be part of the selected word but whose locations are unknown. So if you were given a first input string of -i-- and a second input string dn, you should output all 4-letter words that have i as the 2nd letter and contain d and n. A sample output is below. Since the results are stored in a set, they can be printed in any order. We have provided a “driver”/test program (wordle-driver.cpp) that will take in these two strings from the command line, call your function, and then print out the results from the set of strings returned.

Program execution and command line arguments:

./wordle-driver -i-- dn

Printed results:
bind dine ding dins dint find hind kind mind rind wind

Use wordle-driver to do some sanity tests of your code before moving on to any of the tests from our grading suite. Note: To discourage any attempt to hard-code or game the system, the instructor may run additional tests after submission that were not provided, though they will be similar in format and content.

Requirements and Assumptions
● As always, you may not change the signature of the primary function provided.
● You MAY define helper functions in wordle.cpp.
● You MUST use a recursive approach to find all combinations of letters to form the length-n word. Failure to do so will lead to a 0 on this part of the assignment.
○ Really you should only need 1 or 2 loops to help set the characters in any given location, and maybe 1 to 2 other loops to help with various constraint checks, etc. But to ensure you do not somehow brute-force the solutions, you may use at most 5 loops in your entire implementation in wordle.cpp.
● You may NOT use any functions from the algorithm library (nor should you really need to).
● You do NOT need to implement any kind of backtracking approach where you attempt to determine whether a string can possibly be a valid English-language word as you are forming it. Instead, just generate all possible strings and check whether each is in the dictionary once it has its full n letters.

Hints and Approach

Recall that when generating all combinations you use recursion to build up a combination one “place” at a time, with each recursive call responsible for one place/location of the combination. For that place, each call should try all the viable “options”, recursing after each one, and, upon return, undo and try the next option if a solution has not been found (or if multiple solutions are desired). Think carefully about what “options” should be tried at each location. Can any letter be used at any open place? What limitation do the floating letters provide? (One possible recursion is sketched below.)
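Here is a hedged sketch of one possible recursion: fill blanks left to right, prune when more floating letters remain than open slots, and check the dictionary only once the word is complete. The helper name and the exact floating-letter bookkeeping are illustrative choices, not the required design.

#include <set>
#include <string>

static void fill(std::string& word, std::size_t pos, std::string floating,
                 const std::set<std::string>& dict, std::set<std::string>& out) {
    std::size_t blanks = 0;                        // blanks remaining from pos onward
    for (std::size_t i = pos; i < word.size(); ++i)
        if (word[i] == '-') ++blanks;
    if (floating.size() > blanks) return;          // prune: can't place all floaters
    if (pos == word.size()) {                      // complete: dictionary check
        if (dict.count(word)) out.insert(word);
        return;
    }
    if (word[pos] != '-') { fill(word, pos + 1, floating, dict, out); return; }
    for (char c = 'a'; c <= 'z'; ++c) {            // try every letter at this blank
        word[pos] = c;
        std::size_t f = floating.find(c);
        if (f != std::string::npos) {              // this placement satisfies a floater
            std::string rest = floating;
            rest.erase(f, 1);
            fill(word, pos + 1, rest, dict, out);
        } else {
            fill(word, pos + 1, floating, dict, out);
        }
        word[pos] = '-';                           // backtrack
    }
}

std::set<std::string> wordle(const std::string& in, const std::string& floating,
                             const std::set<std::string>& dict) {
    std::set<std::string> out;
    std::string word = in;
    fill(word, 0, floating, dict, out);
    return out;
}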
By generating all combinations of n-length words that meet the restrictions given by the fixed and floating characters, it becomes simple to check whether each combination is a valid English-language word once it has its full n letters.

3. Schedwork

Suppose a company needs to schedule d (needed) out of k (possible) workers (e.g. nurses at a hospital) per day, given their availability over an n-day period. The company requires exactly d workers (nurses) per day but does not allow any individual to work more than m (maxShifts) shifts over the n-day period. Given d, m, and the k workers’ availability for each of the n days, write a backtracking search function to schedule the nurses such that exactly d work on each of the n days and no one works more than m shifts. The prototype for the function you will write is given below, along with some typedefs for the input and output data structures.

// type for the ID of a worker
typedef unsigned int Worker_T;

// n-by-k matrix of each of the k workers’ availability over an n-day period
typedef std::vector<std::vector<bool>> AvailabilityMatrix;

// n-by-d matrix with the d worker IDs who are scheduled
// to work on each of the n days
typedef std::vector<std::vector<Worker_T>> DailySchedule;

/**
 * @brief Produces a work schedule given worker availability,
 *        the number of needed workers per day, and the maximum
 *        shifts any single worker is allowed. Returns true
 *        and the valid schedule if a solution exists, and false
 *        otherwise.
 *
 * @param [in] avail n x k vector of vectors (i.e. matrix) of the availability
 *                   of the k workers for each of the n days
 * @param [in] dailyNeed Number of workers needed per day (aka d)
 * @param [in] maxShifts Maximum shifts any worker is allowed over
 *                       the n day period (aka m)
 * @param [out] sched n x d vector of vectors indicating the d workers
 *                    who are scheduled to work on each of the n days
 * @return true if a solution exists; sched contains the solution
 * @return false if no solution exists; sched is undefined (can be anything)
 */
bool schedule(
    const AvailabilityMatrix& avail,
    const size_t dailyNeed,
    const size_t maxShifts,
    DailySchedule& sched
);

Explanation

The avail matrix is an n-row by k-column matrix of booleans. Each row represents one day of the schedule and each column is one worker’s availability on each day:

            Worker
          0  1  2  3
Day 0 ->  1  1  1  1
Day 1 ->  1  0  1  0
Day 2 ->  1  1  0  1
Day 3 ->  1  0  0  1

So we see in the avail matrix above that every worker is available on day 0, while only workers 0 and 2 are available on day 1, and so on. You should produce a schedule solution that is an n-by-d matrix, where each row represents a day in the schedule and stores the d IDs of the workers scheduled to work that day (each entry must thus be a value in the range 0 to k−1). So, given the availability matrix above and inputs of m=2 and d=2, a valid schedule result would be:

1 2
0 2
1 3
0 3

The above indicates that on day 0 (top row), workers 1 and 2 will be scheduled. Then on day 1 (next row), workers 0 and 2 will be scheduled, and so on. You can verify with the availability matrix that all workers are available on their scheduled days and no worker works more than m=2 shifts.

Testing

We have provided a “driver”/test program (schedwork-driver.cpp) where you can alter an availability matrix and values for d (dailyNeed) and m (maxShifts), then call your algorithm and print the results.
Use schedwork-driver to do some sanity tests of your code before moving on to any of the tests from our grading suite.

Requirements and Assumptions
● As always, you may not change the signature of the primary function provided.
● You MAY define helper functions in schedwork.cpp.
● You MUST use a recursive approach that follows the general backtracking structure presented in class. Failure to use such a recursive approach will lead to a 0 on this part of the assignment.
● You MAY use functions from the algorithm library, such as std::find, if you desire.
● The order in which you list the worker IDs in each row of the final schedule doesn’t matter (i.e. if workers 1, 2, 3 are scheduled to work on a given day, then 3, 2, 1 is also acceptable).
● You may have up to three loops in your code: two for setup and one in your recursive search.

Hints and Approach

Recall that a backtracking search algorithm is a recursive algorithm that is similar to generating all combinations, but skips the recursion and moves on to another option if the current option violates any of the constraints. It is likely easiest to recurse over each place in the schedule (i.e. the 2D sched matrix). Each recursive call would be responsible for filling in one of the n*d schedule openings, ensuring the constraints of availability and the maximum number of shifts allowed for each worker are met. If you have already done a lab regarding backtracking search, it would likely be beneficial to look it over. (One possible recursion is sketched at the end of this problem.)

4. Submission / Grading

There are 26 tests between wordle_tests and schedwork_tests. No valgrind tests this time (you don’t need to dynamically allocate anything on this assignment).
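The following hedged sketch shows one way the hinted recursion over the n*d openings could look. The shifts[] bookkeeping and helper name are illustrative; note that it uses a small duplicate-check loop inside the search, so you may prefer std::find there to respect the stated loop budget.

#include <vector>
typedef unsigned int Worker_T;
typedef std::vector<std::vector<bool>> AvailabilityMatrix;
typedef std::vector<std::vector<Worker_T>> DailySchedule;

static bool fillSlot(const AvailabilityMatrix& avail, size_t dailyNeed,
                     size_t maxShifts, DailySchedule& sched,
                     std::vector<size_t>& shifts, size_t day, size_t slot) {
    if (day == avail.size()) return true;             // every opening filled
    size_t nextDay = day, nextSlot = slot + 1;
    if (nextSlot == dailyNeed) { nextDay = day + 1; nextSlot = 0; }
    for (size_t w = 0; w < avail[day].size(); ++w) {  // try each worker here
        bool already = false;                         // not twice on the same day
        for (size_t s = 0; s < slot; ++s)
            if (sched[day][s] == (Worker_T)w) already = true;
        if (already || !avail[day][w] || shifts[w] == maxShifts) continue;
        sched[day][slot] = (Worker_T)w;               // choose
        ++shifts[w];
        if (fillSlot(avail, dailyNeed, maxShifts, sched, shifts, nextDay, nextSlot))
            return true;
        --shifts[w];                                  // backtrack
    }
    return false;                                     // no worker fits this opening
}

bool schedule(const AvailabilityMatrix& avail, const size_t dailyNeed,
              const size_t maxShifts, DailySchedule& sched) {
    if (avail.empty()) return false;
    sched.assign(avail.size(), std::vector<Worker_T>(dailyNeed, 0));
    std::vector<size_t> shifts(avail[0].size(), 0);
    return fillSlot(avail, dailyNeed, maxShifts, sched, shifts, 0, 0);
}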
The core of this assignment is to develop, train, test and evaluate a system that takes as input an image and its region proposals and produces as output a set of detected objects, including their class labels and bounding boxes. At the heart of this is a neural network that takes a region proposal as input and predicts both a classification decision about the region and the bounding box for the region. The region proposals are generated for you. Once the predictions are made by the network for all proposals, the results must be post-processed to produce the actual detections. The classification decisions include the nothing class, which means there is no object in the region proposal; in this case, no bounding box will be extracted from your network. The possible classes are provided with the initial Python code we are giving you. Your goal is to maximize the mean Average Precision (mAP) of the detections across the images.

Importantly, you will not need to implement the network from scratch like you did for HW 5. Instead you can use a pretrained (“backbone”) network — in particular ResNet-18 — and add the classification and bounding box regression layers on top of it. Starter Python code with the Dataset subclass, for designing your network, and for the post-processing and evaluation steps is provided for you in the zip file distributed with the assignment.

Training and Validation Input

The input for training and for validation will be managed by the HW6Dataset class, a subclass of PyTorch’s Dataset. It implements the __init__, __len__ and __getitem__ methods, as discussed in the Lecture 19 notes. Each call to the latter returns a cropped candidate image region, the ground truth bounding box, and the ground truth label. Here are some details:
• All regions will be the same size, 224×224. The regions will be cropped and resized from the image before they are given to you. Multiple regions will be pulled from each image. Forcing them to be square will distort the region. Each region will be a 3x224x224 tensor.
• The ground truth bounding box will be a 1×4 tensor of floats giving the upper left and lower right corners of the bounding box in the coordinate system of the region. Importantly, these are specified as fractions of the region width and height. For example, the tensor (0.1, 0.25, 0.8, 0.95) is the bounding box (22.4, 56.0, 179.2, 212.8).
• The ground truth label is an integer, with the value 0 indicating that there is no correct label for this region. This is the “nothing” class.

Your dataset object will be passed to a DataLoader, which will return minibatches of 4-dimensional tensors of size Bx3x224x224, where B is the size of the minibatch. See Lecture 19 and the demo example.

We have provided two different subsets of the data. The first involves only four classes (plus the nothing class):

https://drive.google.com/drive/folders/1uRz6BFNPGBqzfyxJGwbBE9eiwq-bMzTK?usp=sharing

This is the dataset where you should focus your effort. The second, larger dataset involves all twenty classes:

https://drive.google.com/drive/folders/1Xgkf3QBdJAEz2vkBl4uy029–OPLoNqm?usp=sharing

You can experiment with the larger dataset for a small amount of extra credit once you have good results on the smaller one.

Your Network

Your network should consist of the following:
1. A pre-trained backbone network from which one or more of the last layers have been removed.
We provide you a starter HW6Model class which already has this pre-trained backbone using torchvision.models.resnet18. We have removed the last fully-connected layer, meaning the last layer of the backbone is a global average pooling layer. See the final comments section at the end of these instructions for more information. This backbone will be frozen, meaning you will not be training its weights. You can choose to unfreeze the backbone (fully or partially) and potentially achieve better results, but your model will take longer to train.

2. A classification layer that is fully connected to the last remaining layer of the backbone network. If there are C classes, this classification layer should have C + 1 neurons, where class index 0 corresponds to the label nothing.

3. A regression layer that predicts 4C outputs, four for each non-zero class index. This regression layer should be fully connected to the last remaining layer of the backbone network.

The input to the network is a 4-dimensional tensor of candidate regions extracted from the images. The outputs from the network are the associated activation values for each input candidate region, plus the 4C values estimated for the upper left and lower right corners of the bounding rectangles for each candidate region for each class. When you calculate your bounding box regression loss, use the ground truth labels to determine which one of the C bounding boxes will contribute to the loss. You will use a combined loss function for the classification and regression layers when training, as was discussed during Lecture 21 and is in the hand-written notes. You might consider adding a hidden layer to the classification network or to the regression network, but don’t do so right away. You are welcome to use any optimization method you wish, or you can stick with stochastic gradient descent.

Finally, note that even though the B candidate regions in a minibatch are passed through the network together, they will each be evaluated independently. This is very different from what happens with the test input after the network has been trained.

The Validation Step

Validation input will be in exactly the same form as the training input. Computing the validation loss is straightforward. You can also compute the accuracy. This requires that you count the number of correct decisions the network makes on the validation candidate regions. A decision is correct if the neuron with the highest activation corresponds to the ground truth label and, when the correct label is not 0, the IOU between the ground truth bounding box and the predicted bounding box is greater than 0.5.

You should evaluate your model on the validation data after every epoch. This will give you an idea of how well your model works on data outside your training set, and you should choose the model parameters which give you the best validation performance. One of these parameters is the number of epochs you train for; if you find that training for a certain number of epochs results in better performance, you should train for that long for your final model. After you train the best model you can based on the validation data, proceed to the final testing steps.

Input of Testing Data

Input of the test data is managed by HW6DatasetTest, and is organized by images.
The call to the __getitem__ method will return an image (a numpy array) together with all of its candidate image regions (each normalized to a 3x224x224 tensor), candidate bounding boxes, ground truth bounding boxes, and ground truth labels. Unlike during training, all bounding boxes are provided in the coordinate system of the original image. At test time, the candidate image regions will be fed into the network as a four-dimensional tensor in just the same way as a minibatch is during training. The post-processing to evaluate the results is very different, however.

Post-Processing — Only At Test Time

The post-processing proceeds one image at a time. This is not done during training and validation because those steps stop with the evaluation of each prediction independently. At test time, however, and for what would be the actual use of the network, the predictions for each region must be combined and filtered down to form the actual “detections” made by the system. As an initial step, the ground truth and predicted coordinates — at least for the detections that aren’t “nothing” — must be converted back to image coordinates.

To process the list of predicted regions for an image, the first step is to eliminate regions whose class is 0 (“nothing”); these are “non-detections”. Then, order the remaining regions by decreasing activation. Finally, apply the non-maximum suppression algorithm described in Lecture 21, using an IOU of 0.5 as the threshold to decide whether two regions overlap (in image coordinates). Importantly, two regions should only be compared for non-maximum suppression if they have the same class label. The result of the non-maximum suppression is the set of actual (“predicted”) detections.

Evaluation

For each test image, after this post-processing step the predicted detections must be compared to the ground truth detections to determine which are correct and to calculate the AP for each image and the mAP over all images. An intermediate goal is to form the binary vector b for the computation of the AP for each image. For each predicted detection d_i, in decreasing order of activation, compare d_i to the ground truth regions for the image and find all that have the same label as d_i. Among the ground truth regions having the same label as d_i, find the region having the highest IOU with d_i; call this region g. If no ground truth regions have the same label as d_i, or if IOU(d_i, g) < 0.5, then d_i is an incorrect detection, and a 0 should be appended to the binary decision vector b (see the formulation of mean Average Precision from class). Otherwise, d_i is a correct detection, and a 1 should be appended to b. Finally, if d_i is correct, then g must be removed from the list of ground truth regions so that it does not get matched more than once.

Repeat the foregoing process for each of your detected regions for a given image, entering a 0 or 1 into b for each d_i. Consider at most 10 total regions. (If there are more than 10 ground truth results, only count C = 10 in the AP calculation.) From this, compute the average precision (AP) score at 10 detections for this image. If your network produced only j < 10 regions, then you will make decisions about detections d_0 through d_{j−1}, enter them into b, and fill the remaining entries of b with 0’s. Once you have b for a given image, you can calculate the average precision (AP) score for this image and its detections, just as we did in the Lecture 20 exercise.
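For reference, here is one common AP formulation consistent with the description above; the exact convention from the Lecture 20 exercise is authoritative. With G ground truth detections for the image and b built as described:

Prec(i) = (b_1 + b_2 + … + b_i) / i
AP = (1 / min(G, 10)) * [ b_1·Prec(1) + b_2·Prec(2) + … + b_10·Prec(10) ]

For example, b = (1, 0, 1, 0, 0, 0, 0, 0, 0, 0) with G = 2 gives AP = (1/2)(1/1 + 2/3) = 5/6. The IOU used throughout (validation accuracy, non-maximum suppression, and ground-truth matching) is the standard one:

IOU(A, B) = area(A ∩ B) / (area(A) + area(B) − area(A ∩ B))

where the intersection width is max(0, min of the right edges − max of the left edges), and similarly for the height.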
Ideally, at the end of the computation, each ground truth region will be matched to a detection; those that aren’t are considered “misses” or “false negatives”. Finally, averaging the AP score over all input images gives your final mean average precision (mAP).

What You Need to Do

There are several coding parts to this assignment:
1. Forming your network based on ResNet-18, including its loss functions.
2. Writing your training and testing functions.
3. Writing the code for the test evaluation.

Starting with the last part, we have provided you with template code called evaluate.py that outlines and tests the post-processing and evaluation code you need to write. Your completed version of this file will be uploaded separately to Submitty and automatically tested. We strongly suggest that you do this first. You will then use this code to help evaluate your test results.

Output from the system should be different for training/validation and for testing. For training and validation the output should show:
• A plot of the training and validation losses as a function of the epoch.
• A confusion matrix of the classification decisions for training and validation at the end of training.
• The average IOU between the predicted and correct bounding rectangles, only for the regions where the classifier is correct.

During testing, detailed output from the network should be given for test images 1, 16, 28, 30, 39, 75, 78, 88, 93, and 146:
• The original image with the bounding boxes of the final regions drawn on top. Each bounding box should have text giving the class label. Detections that are correct (IOU with a ground truth bounding box having the same class label is at least 0.5) should be shown in green. Detections that are incorrect should be shown in red. Ground truth detections that are missed should be shown in yellow. All of the necessary information is produced by the predictions_to_detections and evaluate functions that you need to write in evaluate.py.
• The average precision for the image.

The final output should show:
• The overall mean average precision across all test images.
• For each class, the percentage of detections of that class that are true positives and the percentage of ground truth detections that are found. Again, this information can be gathered using the results of the evaluate function that you need to write in evaluate.py.

Submission

In addition to the Submitty upload of evaluate.py as described above, please submit a zip file containing all your code and a pdf gathering all your results. The pdf file should include:
• An explanation of your neural network model and any variations you tested.
• The training output described above.
• The testing output described above.
• A summary written assessment of your results.

Final Comments
• On AiMOS each training epoch should take just a few minutes, and less if you use multiple workers. If your code is much slower than this, there are significant inefficiencies. For instance, during training you should only be using for loops over the epochs and dataloaders. While it may be tempting to compute the loss with a for loop, doing so may make it impossible to train in a reasonable amount of time.
• We strongly urge you to first test your ability to simply predict the correct class label for each candidate region, without regard for the bounding boxes. If this is as far as you get, you will still receive at least 75% credit on the network part of this assignment.
• If you get good results on the full data set, you can earn up to 10 points extra credit for the assignment.
• We found similar bounding box regression results between using the global average pooling layer of ResNet and instead concatenating all of its features. You can try both approaches, but the default method would be to use the output of the global average pooling layer (512-dimensional) as input to your bounding box regression.
The focus of this assignment is scene classification. In particular you will write SVM and neural network classifiers to determine the “background” class of a scene. The five possibilities we will consider are grass, wheat field, road, ocean and red carpet. Some of these are relatively easy, but others are hard. A zip file containing the images can be found at

https://drive.google.com/file/d/1nrALsQXsU6KoPAMCz65H0S20N_mRvixZ/view?usp=sharing

The images are divided into three sets: training, validation and test. Training images are used directly to optimize the learned weights of a model. The validation set is used to tune the parameters that determine the model that is trained. These parameters are things like (a) the number of layers of a neural network, (b) its learning rate, (c) when to stop training, and (d) the parameter controlling the tradeoff between the margin and the slack values for a linear SVM. Once the parameters of the model are set and the model is trained, the model is applied to the test data just once to see how well it ultimately works.

Here’s a little more about the use of the validation data. Think of it as part of an outer loop of training where (a) the model parameters are set, (b) the model is trained, (c) the trained model is applied to the validation data to determine model accuracy, (d) if the resulting model is more accurate than any previously trained model, the model parameters and trained weights are saved, (e) new model parameters are set, and the process is repeated.

Classifiers to Build

In Problem 1 you will implement a descriptor and a linear SVM to classify the scene, while in Problem 2 you will implement two neural networks.

1. (60 points) In Lecture 14, which covered detection and SVMs, we focused on the “HoG” — histogram of oriented gradients — descriptor. After this method was published, many different types of descriptors were invented for many applications. For this problem you are going to implement a descriptor that combines location and color. It will include no gradient information. One long descriptor vector will be computed for each image, and then a series of SVM classifiers will be applied to the descriptor vector to make a decision. We strongly urge you to write two scripts to solve this problem: one to compute and save the descriptor for each image, and one to reload the descriptors, train the classifiers, and evaluate the performance.

The descriptor is formed from an image by (a) computing a 3d color histogram in each of a series of overlapping subimages, (b) unraveling each histogram into a vector (1d NumPy array), and (c) concatenating the resulting vectors into one long vector. The key parameter here will be the value of t, the number of histogram bins in each of the three color dimensions.

Figure 1 (a), (b): Color histogram bins

Fortunately, calculation of color histograms is straightforward. Here is example code to form an image of random values and compute its 3d color histogram:

import numpy as np

# Generate a small image of random values.
M, N = 20, 24
img = np.floor(256 * np.random.random(M * N * 3)).astype(np.uint8)

# Calculate and print the 3d histogram.
t = 4
pixels = img.reshape(M*N, 3)
hist, _ = np.histogramdd(pixels, (t, t, t))
print('histogram shape:', hist.shape)  # should be t, t, t
print('histogram:', hist)
print(np.sum(hist))                    # should sum to M*N

Each histogram bin covers a range of u = 256/t gray levels in each color dimension.
For example, hist[i, j, k] is the number of pixels in the random image whose red value is in the range [i*u, (i+1)*u), whose green value is in the range [j*u, (j+1)*u), and whose blue value is in the range [k*u, (k+1)*u). Figure 1 illustrates the partitioning of the RGB color cube into 4 × 4 × 4 regions.

As mentioned above, the image will be divided into overlapping subimage blocks and one color histogram of size t^3 will be computed in each block. These will be concatenated to form the final descriptor. Let bw be the number of subimage blocks across the image and let bh be the number of blocks going down the image. This produces bw · bh blocks overall, and a final descriptor vector of size t^3 · bw · bh. To compute the blocks in an image of H × W pixels, let

Δw = W / (bw + 1)  and  Δh = H / (bh + 1).

The blocks will each cover 2Δw columns and 2Δh rows of pixels. The image pixel positions of the upper left corners of the blocks will be (mΔw, nΔh), for m = 0, …, bw − 1 and n = 0, …, bh − 1. Note that some pixels will contribute to only one histogram, some will contribute to two, and others will contribute to four. (The same is true of the HoG descriptor.) Figure 2 illustrates the formation of blocks. (As a worked example, for a 480-column image with bw = 4, Δw = 96, each block covers 192 columns, and block corners fall at columns 0, 96, 192 and 288.)

Figure 2: Image block tiling for bw = 4 and bh = 2. (Left: original image with Δw- and Δh-spaced lines; right: blocks of pixels over which histograms are formed.)

After you have computed the descriptors, you will train a series of SVM classifiers, one for each class. To do so, you will be given a set of 4000 training images, {Ii}, with class labels yi ∈ {1, …, k} (for us, k = 5). To train classifier Cj, images with label yi = j are treated as yi = +1 in linear SVM training and images with label yi ≠ j are treated as yi = −1. This is repeated for each of the k classifiers. The descriptor is computed for each training image Ii to form the data vectors xi. Each resulting classifier Cj will have a weight vector wj and offset bj. The score of classifier j for a test image with descriptor vector x is

dj = (1 / ‖wj‖) (wj^T x + bj).

(Recall that the 1/‖wj‖ ensures that dj is a signed distance.) The classification for the test image I is the class associated with the value of j that gives the maximum dj score. This is used even if none of the dj scores are positive.

For each SVM classifier, output simple statistics on the validation steps you use to set the model parameters. This should likely just be the multiplier that controls the tradeoff between margin and slack values. Note that a different value of the multiplier is allowed for each of the five SVMs you train. A reasonable range for this parameter is [0.1, 10].

After you complete training, you will test your classifiers with the set of 750 test images. Each will be run as described above and the label will be compared to the known correct label. You will output the percentage correct for each category, followed by a k × k confusion matrix. The confusion matrix entry at row r and column c shows the number of times that r was the correct class label and c was the chosen class label. The confusion matrix would have all 0’s in the non-diagonal entries if the SVM classifier were operating at 100% accuracy.

Some Details

(a) It is very important that you use histogramdd or a similar function to compute the histogram. Do not iterate over the pixels in the image using for loops: you will have serious trouble handling the volume of images. You can iterate over the indices of the subimage blocks and then use NumPy calls as above to form the histograms.
Some Details
(a) It is very important that you use histogramdd or a similar function to compute the histogram. Do not iterate over the pixels in the image using for loops: you will have serious trouble handling the volume of images. You can iterate over the indices of the subimage blocks and then use NumPy calls as above to form the histograms.
(b) As mentioned above, we suggest that you write one script to compute and save the descriptors for all training and test images, and then write a second script to train your SVM classifiers and generate your test output. We suggest that you use Python's pickle module.
(c) For your SVM implementation, we suggest using sklearn.svm.LinearSVC. To use the scikit-learn (sklearn) Python module, you will need to install the package. If you are using Anaconda, as suggested earlier in the semester, you can simply run
conda install scikit-learn
(d) The confusion matrix can be made using Matplotlib or sklearn.metrics.confusion_matrix
(e) Feature extraction may still be time-consuming even with efficient NumPy use. We suggest that you develop and debug your program using a subset of the training image set before running your final version on the full training and test sets.
(f) We suggest using at a minimum t = 4 and bw = bh = 4. This will give a descriptor of size 1,024 per image.
(g) Finally, you can investigate the addition of gradient histograms to your descriptors. Doing so, with a careful analysis of the impacts, can earn you up to 5 points extra credit.
Submit your code, your output showing your validation and training experiments, your final test results, and a write-up describing design choices, your validation steps and a summary discussion of when your classifier works well, when it works poorly, and why.
Rubric:
• [8 pts]: Clear and reasonably efficient code
• [10 pts]: Descriptor formed, including choices of parameters
• [8 pts]: Training of SVM including hyperparameter
• [8 pts]: Output of confusion matrices
• [10 pts]: Reasonably good results
• [8 pts]: Clear overall explanation
• [8 pts]: Good explanation of successes and failures
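As a starting point for the training script suggested in detail (c), here is a hedged sketch of one-vs-rest training with sklearn.svm.LinearSVC, including validation over the margin/slack multiplier C. The function names and the C grid are placeholders, not requirements.
import numpy as np
from sklearn.svm import LinearSVC

def train_one_vs_rest(X_train, y_train, X_val, y_val, k=5, C_grid=(0.1, 1.0, 10.0)):
    # One LinearSVC per class; pick C for each classifier on the validation set.
    classifiers = []
    for j in range(1, k + 1):
        best = None
        for C in C_grid:
            clf = LinearSVC(C=C).fit(X_train, 2 * (y_train == j) - 1)  # +1 / -1 labels
            acc = clf.score(X_val, 2 * (y_val == j) - 1)
            if best is None or acc > best[0]:
                best = (acc, clf)
        classifiers.append(best[1])
    return classifiers

def classify(classifiers, x):
    # Maximum signed distance d_j; decision_function gives w.x + b, so divide by ||w||.
    scores = [clf.decision_function([x])[0] / np.linalg.norm(clf.coef_)
              for clf in classifiers]
    return 1 + int(np.argmax(scores))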
2. (60 points) We continue the background classification problem using neural networks. Specifically, you will use PyTorch to implement two neural networks, one using only fully-connected layers, and the other using convolutional layers in addition to fully-connected layers. The networks will each start directly with the input images so you will not need to write or use any code to do manual preprocessing or descriptor formation. Therefore, once you understand how to use PyTorch the code you actually write will in fact be quite short. Your output should be similar in style to your output from Problem 1. Make sure that your write-up includes a discussion of the design of your networks and your validation steps.
To help you get started I've provided a Jupyter notebook (on the Submitty site) that illustrates some of the main concepts of PyTorch, starting with Tensors and Variables and proceeding to networks, loss functions, and optimizers. This also includes pointers to tutorials on pytorch.org. This notebook will be discussed in class on March 29 and April 1. If you already know TensorFlow, you'll find PyTorch quite straightforward.
Two side notes:
(a) PyTorch includes some excellent tools for uploading and transforming images into Tensor objects. For this assignment, you will not need to use these since you've already written code for image input for the previous problem that gathers images into NumPy arrays and it is trivial to transform these objects into PyTorch tensors. On the other hand, we will be discussing these tools (DataLoader and Dataset) in class and you are welcome to use them.
(b) Because of the size of the images, you might find it quite computationally expensive and tedious to use a fully-connected network on the full-sized images. Therefore, you are welcome to resize the images before input to your first network. In fact, we strongly suggest that you start with significantly downsized images first and then see how far you can scale up!
Rubric:
• 12 pts: Quality of results with FC network
• 6 pts: Output of FC confusion matrix
• 6 pts: Discussion of FC design choices
• 6 pts: Overall discussion / presentation of FC results
• 12 pts: Quality of results with ConvNet network
• 6 pts: Output of ConvNet confusion matrix
• 6 pts: Discussion of ConvNet design choices
• 6 pts: Overall discussion / presentation of ConvNet results
Access to a GPU and to AIMOS
For your experiments it will be helpful to have access to a GPU. Many of your computers have them, but there are other possibilities. The simplest one is to use Google Colaboratory, as demonstrated by the Jupyter notebook distributed for lecture. As announced on Submitty, there is also the possibility of using the RPI AIMOS supercomputer: https://cci.rpi.edu/aimos
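To make the PyTorch starting point for Problem 2 concrete, here is a minimal sketch of a fully-connected classifier. The layer sizes, the assumed 32 x 32 x 3 input resolution, and the learning rate are illustrative placeholders only; the notebook mentioned above is the authoritative reference.
import torch
import torch.nn as nn

# A small fully-connected network for 5-way classification of downsized
# images; 32*32*3 = 3072 inputs is an assumption, not a requirement.
model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(3072, 256), nn.ReLU(),
    nn.Linear(256, 64), nn.ReLU(),
    nn.Linear(64, 5),
)
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

def train_step(x, y):
    # x: (B, 3, 32, 32) float tensor, y: (B,) long tensor of class indices.
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()
    return loss.item()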
Part 1 — 100 Points
Overview
Here is the basic problem statement, which is elaborated on below: Given a folder of N images as input, for each pair of images Ii and Ij you must
1. Decide if Ii and Ij show the same scene.
2. If the decision for 1 is "yes", then decide if Ii and Ij can be aligned accurately enough to form a mosaic.
3. If the decisions for both 1 and 2 are "yes", then create and output the mosaic that aligns images Ii and Ij accurately.
It is possible that more than two images in a set of input images show the same scene and may be combined into a mosaic. This is where the undergraduate and graduate versions of this assignment differ. In order to earn full credit, graduate students must produce a multi-image mosaic (in addition to the image pair mosaics). Undergraduates can earn a small amount of extra credit for multi-image mosaics. For graduate students this is the last 10 points on the assignment. For undergraduates, this is 5 points of extra credit. More on this below.
Details
Each image set you are given includes N ≥ 2 images, I1, ..., IN. Each image should be read in and processed as grayscale! For each pair of images, Ii and Ij, with 1 ≤ i < j ≤ N, your code must do the following:
1. Extract the keypoints and descriptors in each image. We strongly urge you to use SIFT keypoints and descriptors, but you may use anything you wish. Output the number of keypoints in each image.
2. Match the keypoints and descriptors between the images. You may use cv2.BFMatcher or cv2.FlannBasedMatcher to do the matching. The decision about whether or not two keypoints match may be made using the ratio test for descriptors like SIFT, or using the symmetric matching criteria for descriptors like ORB. At this point there will often be errors in your keypoint matches. Output:
(a) The number of matches and the fraction of keypoints in each image that have a match (this should be significantly less than 1).
(b) A single image showing Ii and Ij side-by-side with lines drawn between matched keypoints (see cv2.drawMatches). Make each line a different color.
3. If the previous step produced too few matches overall or too small a percentage of matches, then stop attempting to match Ii and Ij. (You will need to decide the criteria and the threshold or thresholds.) Otherwise proceed to the next step of matching. Output a message giving the decision made at this step.
4. Using the matches produced by keypoint description matching, use RANSAC to estimate the fundamental matrix Fj,i that maps pixel locations from Ii onto lines in Ij. Please review the significance of the fundamental matrix in your class notes! You may use cv2.findFundamentalMat. Do this with the method setting cv2.FM_RANSAC; you will have to explore the other parameter settings. After estimating Fj,i, you must determine which matches are "inliers" — consistent with the fundamental matrix. Specifically, if ũi (from image Ii) and ũj (from image Ij) are the homogeneous coordinate locations of a matching keypoint, then aj,i = Fj,i ũi will be the coordinates describing the line in image j along which ũj should lie if it is a correct match. (This is the "epipolar line".) While in theory ũj would be exactly on the line, in practice it may be slightly off. On the other hand, most incorrect matches will typically have ũj far from this line. Therefore you can determine which matches are inliers by measuring the distance between ũj and the line and counting the number of keypoint matches that are within a small distance of the line. This is easy to do yourself as long as you determine the threshold and are careful to normalize aj,i properly so that you can measure distances correctly. However, the mask array that cv2.findFundamentalMat returns does this for you! You are welcome to use it.
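For those computing the distances themselves rather than using the returned mask, here is a minimal sketch of the normalized point-to-epipolar-line test. The threshold name tau and its default value are placeholders of this sketch.
import numpy as np

def epipolar_inliers(F, pts_i, pts_j, tau=2.0):
    # pts_i, pts_j: N x 2 arrays of matched keypoint locations.
    ui = np.hstack([pts_i, np.ones((len(pts_i), 1))])   # homogeneous coordinates
    lines = (F @ ui.T).T                                # one line (a, b, c) per match
    norm = np.linalg.norm(lines[:, :2], axis=1)         # normalize so distance is in pixels
    uj = np.hstack([pts_j, np.ones((len(pts_j), 1))])
    dist = np.abs(np.sum(lines * uj, axis=1)) / norm    # |a*x + b*y + c| / sqrt(a^2 + b^2)
    return dist < tau                                   # boolean inlier mask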
Output the following from this step:
(a) The number and percentage of matches that are inliers.
(b) An image showing Ii and Ij side-by-side with lines drawn between the keypoints that form inlier matches (see cv2.drawMatches). Make each line a different color.
(c) An image showing the epipolar lines for the inlier matches drawn on image Ij. Make each line a different color. This one may take a bit of work so we suggest saving it until after everything else is working.
5. If the previous step produced too few matches overall or too small a percentage of matches, then stop attempting to match Ii and Ij, and move on to the next image pair. (You will need to decide the criteria and the threshold or thresholds.) Otherwise proceed to the next step of matching. At this point your code will have made the decision that tells us whether or not Ii and Ij show the same scene. Output a message giving the decision made at this step.
6. Using the inlier matches from the fundamental matrix estimation step, estimate the parameters of the homography matrix Hj,i mapping Ii onto Ij. You may use cv2.findHomography and RANSAC. Using a criterion for deciding which matches are "inliers", count the number of inliers for the homography matching between images. Output the following from this step:
(a) The number and percentage of inlier matches.
(b) An image showing Ii and Ij side-by-side with lines drawn between the keypoints that form inlier matches (see cv2.drawMatches). Make each line a different color.
7. Based on the number of inlier matches from fundamental matrix estimation and from homography estimation, decide whether or not the images can be accurately aligned. The decision should be "yes" if most of the inlier matches from the fundamental matrix estimate are also kept as inliers to the homography estimate. Output your decision and the reason for your decision.
8. If the decision after the previous step is "yes" then build and output the mosaic of the two images. Try to come up with a relatively simple blending method that yields nice results instead of looking like one image is mapped and pasted on top of the other.
Multi-Image Mosaics
Here is a bit about forming multi-image mosaics, a problem you should leave until everything else is done. First, you need to remember which pairs of images can be aligned using a homography. Think of the images as nodes in a graph and the image pairs as edges. The images that will form the mosaic are the connected components. (If for some reason there is more than one connected component, pick the largest.) Second, you will need to pick an "anchor" image that will remain fixed while the other images are mapped onto it. This should in some sense be the "center" of the set of images in the connected component. Third, you need to compute the transformations that map the images onto this anchor image. This can get tricky quite quickly, so please do something very easy using only the results of matching pairs of images.
In particular, if I0 is the anchor and Ii is successfully matched with I0, then use the homography computed between them. If Ii does not have a homography with I0, but there is another image Ij that does, then "compose" the transformations: Ii onto Ij onto I0. This is not as hard as it sounds. In particular, if Hj,i is the estimated transformation matrix from Ii onto Ij and if H0,j is the estimated transformation matrix from Ij onto I0, then H0,i = H0,j Hj,i is a good estimate of the transformation from Ii onto I0. In the data I provide there will not be any cases where you need to compose more than two transformations if you choose the anchor correctly. Note that commercial software that builds multi-image mosaics uses much more sophisticated methods to estimate H0,i.
Command Line and Output
Your program should run with the following very simple command line:
python hw4_align.py in_dir out_dir
where in_dir is the path to the directory containing input images. We will run some of your submissions to test them. The code should write all images to out_dir, which should be a different directory from in_dir. This will avoid clutter across multiple runs. Your code will need to output (via print statements) a significant amount of text as described above.
For each mosaic you create, make the file name be the composition of the names of the input file prefixes, in sorted order. For example, if the images are bar.jpg, cheese.jpg and foo.jpg, then the mosaic of the first two will be bar_foo.jpg and the mosaic of all three will be bar_cheese_foo.jpg. Use the extension from the first image (all images will be jpg or JPG). Note that for image pairs that do form mosaics, there will be four output images — the images that result from steps 2, 4, 6 and 8. For pairs that do not form mosaics, there will be fewer output images, depending on which decision (steps 3, 5 or 7) stopped the computation. There will always be an output image from step 2.
Write Up and Code
Please generate a write-up describing your algorithms, your decision criteria, your blending algorithms, and your overall results. Evaluate both strengths and weaknesses, using images (perhaps including some we did not provide) to illustrate. One suggestion is to make a table summarizing the results on all the image pairs, including the matching results, the number and percentage of inliers to F and to H (if they were estimated), and the final decision your algorithm made. This will take some time to generate but it is the type of analysis you will need to learn how to make to evaluate computer vision and machine learning algorithms. You don't have to make this beautiful, just make it clear and easy to follow. The actual text should be no more than a page or so, single-spaced, but the document should be longer because of the results table and the illustrating images. Finally, make sure your code is clean, reasonably efficient, documented, and well-structured.
Complete Submission
Your final submission for Part 1 will be a single zip file that includes the following:
1. Your .py file
2. The text output files from running your code on each of the image sets provided, plus other image sets you'd like to show. One additional suggestion is to run your algorithm on two images taken from different sets.
3. As many image results as you need to illustrate your successes (and failures), both in forming mosaics and in deciding not to do so!
4. Your final write-up.
The zip file will be limited to 60MB. This means it is unlikely that you can include all image results.
Part 2 — Comparing Descriptor Matching Methods — 25 Points
SIFT keypoint descriptor matching is based on the ratio test. ORB and other matching methods use symmetric matching. This could be used as well for SIFT, but should it? In this problem you will write a Python script to try to analyze this question.
First, here is the definition of symmetric matching. Let ui, i ∈ 1, ..., Nu be the descriptor vectors for the keypoints from image Iu and let vj, j ∈ 1, ..., Nv be the descriptor vectors from image Iv. Then a descriptor ui* from Iu and a descriptor vj* from Iv are matched if
j* = argmin_{j ∈ 1,...,Nv} D(ui*, vj) and i* = argmin_{i ∈ 1,...,Nu} D(ui, vj*),
where D(·, ·) measures the distance between two descriptors: Euclidean distance for SIFT and Hamming distance for ORB. More simply put, ui* and vj* are matched if each is the other's closest descriptor.
Your job is to implement symmetric matching for SIFT descriptors and compare it to ratio test matching, also for SIFT descriptors. You should analyze both (1) image pairs that show the same scene and therefore should match, and (2) image pairs that do not show the same scene and therefore should not match. Note that in the latter case, there are no truly correct matches. Also note that the information about whether or not the two images should be matched is provided to your code on the command line (based on what you learn from the results of Part 1).
For each pair of images, I1 and I2, let their keypoints be the sets K1 and K2. Compute the matches between the keypoint sets as in Part 1 using the ratio test and then using the symmetric matching test. Call the resulting sets of matches MR and MS, respectively. Your first set of outputs should be the number and percentage of keypoints that matched using the ratio test and using the symmetric matching test. To be specific these are |MR| and |MS| for the counts, and |MR| / min(|K1|, |K2|) and |MS| / min(|K1|, |K2|) for the percentages.
The second set of outputs should only be generated if the two images should match. In this case, use the match set MR to estimate the fundamental matrix F, as in Part 1, Step 4. Then identify the inlier matches from MR and MS, calling these sets M′R and M′S. The same F should be used in both cases, so you'll need to implement the method to count inliers discussed at the end of Part 1, Step 4, at least for MS. The output should be the sizes of the inlier sets |M′R| and |M′S| and the percentages of matches that are inliers, |M′R| / |MR| and |M′S| / |MS|.
Based on results from several pairs of images, make a recommendation about whether the ratio test or symmetric matching is better and why.
Command-Line and Output
Here is a suggested command line:
python compare.py img1 img2 should_match
where img1 and img2 are the image file names, and should_match is a boolean flag (0 or 1) indicating whether or not the images should match. Use images we provided for Part 1 and any other pictures you'd like to try.
What to Submit
Submit just two files zipped together, compare.py and a pdf write-up summarizing your results and recommendations. The write-up should be less than a page of text plus any results or pictures you'd like to include to illustrate. Try to convince us that your recommendation is correct.
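Here is a hedged sketch of symmetric (mutual nearest neighbor) matching for SIFT descriptors, using the squared-distance expansion to avoid building a three-dimensional intermediate array; cv2.BFMatcher with crossCheck=True is a library alternative. Function and variable names are placeholders.
import numpy as np

def symmetric_matches(desc_u, desc_v):
    # desc_u: Nu x 128, desc_v: Nv x 128 float arrays of SIFT descriptors.
    # Squared Euclidean distances via |u - v|^2 = |u|^2 + |v|^2 - 2 u.v (Nu x Nv).
    d2 = (np.sum(desc_u**2, axis=1)[:, None]
          + np.sum(desc_v**2, axis=1)[None, :]
          - 2.0 * desc_u @ desc_v.T)
    best_j = np.argmin(d2, axis=1)   # closest v for each u
    best_i = np.argmin(d2, axis=0)   # closest u for each v
    # Keep pairs that are each other's nearest neighbor.
    return [(i, j) for i, j in enumerate(best_j) if best_i[j] == i]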
Programming Problems
1. (50 points) Ordinarily, image resize functions, like the one in OpenCV, treat each pixel equally: everything gets reduced or increased by the same amount. In 2007, Avidan and Shamir published a paper called "Seam Carving for Content-Aware Image Resizing" in SIGGRAPH that does the resizing along contours in an image (a "seam") where there is not a lot of image content. The technique they described was the starting point for what is now a standard feature in image manipulation software such as Adobe Photoshop. Here is an example of an image with a vertical seam drawn on it in red.
A vertical seam in an image contains one pixel per row, and the pixels on the seam are 8-connected between rows, meaning that pixel locations in adjacent rows in a seam differ by at most one column. Formally, a vertical seam in an image with M rows and N columns can be described as a set of pixels
sr = {(i, c(i))} for i = 0, ..., M−1, s.t. ∀i, |c(i) − c(i−1)| ≤ 1. (1)
In reading this, think of i as the row, and c(i) is the chosen column in each row. Similarly, a horizontal seam in an image contains one row per column and is defined as a set of pixels
sc = {(r(j), j)} for j = 0, ..., N−1, s.t. ∀j, |r(j) − r(j−1)| ≤ 1. (2)
Here, think of j as the column and r(j) as the row for that column.
Once a seam is selected (suppose for now that it is a vertical seam) the pixels on the seam are removed from the image, and the pixels that are to the right of the seam are shifted to the left by one. This will create a new image that has M rows and N − 1 columns. (There are also ways to use this to add pixels to images, but we will not consider this here!) Here is an example after enough vertical seams have been removed to make the image square.
The major question is how to select a seam to remove from an image. This should be the seam that has the least "energy". Energy is defined in our case as the sum of the derivative magnitudes (not the gradient magnitude!) at each pixel:
e[i, j] = |∂I/∂x (i, j)| + |∂I/∂y (i, j)|, for i ∈ 1, ..., M−2, j ∈ 1, ..., N−2.
(Use the OpenCV Sobel function and no Gaussian smoothing to compute the partial derivatives.) The minimum vertical seam is defined as the one that minimizes
(1/M) Σ_{i=0}^{M−1} e[i, c(i)]
over all possible seams c(·). Finding this seam appears to be a hard task because there is an exponential number of potential seams. Fortunately, our old friend (from CSCI 2300) dynamic programming comes to the rescue, allowing us to find the best seam in linear time in the number of pixels. To realize this, we need to recursively compute a seam cost function W[i, j] at each pixel that represents the minimum cost seam that runs through that pixel. Recursively this is defined as
W[0, j] = e[0, j] ∀j
W[i, j] = e[i, j] + min(W[i−1, j−1], W[i−1, j], W[i−1, j+1]) ∀i > 0, ∀j
Even if you don't know dynamic programming, computing W[i, j] is pretty straightforward (except for a few NumPy tricks, see below). Once you have matrix W, you must trace back through it to find the actual seam. This is also defined recursively. The seam pixels, as defined by the function c(·) from above, are
c(M−1) = argmin_{1 ≤ j ≤ N−1} W[M−1, j]
c(i) = argmin_{j ∈ {c(i+1)−1, c(i+1), c(i+1)+1}} W[i, j] for i ∈ M−2 downto 0
In other words, in the last row, the column with the minimum weight (cost) is the end point of the seam.
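Here is a minimal sketch of both the W computation and the traceback, using the row-slicing idea described in the notes that follow. The BIG constant and the function name are placeholders of this sketch.
import numpy as np

def vertical_seam(e):
    # e: M x N energy array. Returns c, an array of seam columns, one per row.
    M, N = e.shape
    BIG = 1e12                         # absurdly large weight for border columns
    W = np.empty((M, N))
    W[0] = e[0]
    W[:, 0] = W[:, -1] = BIG           # keep the seam off columns 0 and N-1
    for i in range(1, M):              # iterate over rows; columns handled by slicing
        left, center, right = W[i-1, :-2], W[i-1, 1:-1], W[i-1, 2:]
        W[i, 1:-1] = e[i, 1:-1] + np.minimum(np.minimum(left, center), right)
    c = np.empty(M, dtype=int)
    c[M-1] = np.argmin(W[M-1])         # end point of the seam in the last row
    for i in range(M-2, -1, -1):       # choose among the three columns above
        j = c[i+1]
        c[i] = j - 1 + np.argmin(W[i, j-1 : j+2])
    return c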
From this end point we trace back up the image, one row at a time, and at each row we choose from the three possible columns that are offset by −1, 0 or +1 from the just-established seam column in the next row.
A few quick notes on this.
• You need to be careful not to allow the seam to reach the leftmost or rightmost column. The easiest way to do this is to introduce special case handling of columns 0 and N − 1 in each row, assigning an absurdly large weight.
• The trickiest part of this from the NumPy perspective is handling the computation of the minimum during the calculation of W. While you clearly must explicitly iterate over the rows (when finding the vertical seam), I don't want you iterating over the columns. Instead, use slicing in each row to create a view of the row that is shifted by +1, −1 or 0 and then take the minimum. For example, here is code that determines, at each location in an array, if the value at that location is greater than the values at both its left and right neighbors.
import numpy as np
a = np.random.randint(0, 100, 20)
print(a)
is_max = np.zeros_like(a, dtype=bool)
left = a[:-2]
right = a[2:]
center = a[1:-1]
is_max[1:-1] = (center > right) * (center > left)
is_max[0] = a[0] > a[1]
is_max[-1] = a[-1] > a[-2]
print("Indices of local maxima in a:", np.where(is_max)[0])
'''
Example output:
[93 61 57 56 49 40 51 85 5 13 28 89 31 56 11 10 60 93 26 86]
Indices of local maxima in a: [ 0 7 11 13 17 19]
'''
• Recompute the energy matrix e after each seam is removed. Don't worry about the fact that the energy of most pixels will not change.
• The seam should be removed from the color image, but the energy is computed on a gray scale image. This means you will have to convert from color to gray scale before each iteration's energy matrix computation and seam removal.
• Convert the image to float immediately after reading it (and before any derivative computation). This will ensure the greatest consistency with our results. In particular, if fname stores the name of the image file, use
img = cv2.imread(fname).astype(np.float32)
Command-line and output: Your program should take an image as input and remove enough rows or columns to make the image square. The command line should be
python p1_seam_carve.py img
For the 0th, 1st and last seams, please print the index of the seam (0, 1, ...), whether the seam is vertical or horizontal, the starting, middle and end pixel locations on the seam (e.g. if there are 11 pixels, output pixels 0, 5 and 10), and the average energy of the seam (accurate to two decimal places). Finally, output the original image with the first seam drawn on top of the image in red and output the final, resized image. If foo.png is the input image, the output images should be foo_seam.png and foo_final.png. Finally, you may not be able to reproduce the exact output of my code. Do not worry about this too much as long as your energies and seams are close. Especially important is that the final square image looks close.
2. (40 points) In class we started to implement an edge detector in the Jupyter notebook edge_demo.ipynb, including Gaussian smoothing and the derivative and gradient computations. The code is posted on Submitty. In this problem, you will implement the non-maximum suppression step and then a thresholding step, one that is simpler than the thresholding method we discussed in class.
Here are the details:
• For non-maximum suppression, a pixel should be marked as a maximum if its gradient magnitude is greater than or equal to those of its two neighbors along the gradient direction, one "ahead" of it, and one "behind" it. (Note that by saying "greater than or equal", edges that have ties will be two (or more) pixels wide: not the right solution in general, but good enough for now.) As examples, if the gradient direction at pixel location (x, y) is π/5 radians (36°) then the ahead neighbor is at pixel (x+1, y+1) and the behind neighbor is at pixel (x−1, y−1), whereas if the gradient direction is 9π/10 (162°) then the ahead neighbor is at pixel (x−1, y) and the behind neighbor is at pixel (x+1, y).
• For thresholding, start from the pixel locations that remain as possible edges after non-maximum suppression and eliminate those having a gradient magnitude lower than 1.0. Then, for the remaining pixels, compute the mean, µ, and the standard deviation, s, of the gradient magnitudes. The threshold will be the minimum of µ + 0.5s and 30/σ, the former value because in most images, most edges are noise, and the latter value to accommodate clean images with no noise. Dividing by σ is because Gaussian smoothing reduces the gradient magnitude by a factor of σ.
The command line should be
python p2_edge.py sigma in_img
where
• sigma is the value of σ used in Gaussian smoothing, and
• in_img is the input image. (I have posted an example online with sigma = 2 and in_img = disk.png)
The text output from the program will be:
• The number of pixels that remain as possible edges after the non-maximum suppression step.
• The number of pixels that remain as possible edges after the gradient threshold of 1.0 has been applied.
• µ, s and the threshold, each on a separate line and accurate to 2 decimal places.
• The number of pixels that remain after the thresholding step.
Three output images will be generated, with file names created by adding a four-character string to the file name prefix of the input image. Examples below assume that the image is named foo.png. Here are the three images:
• The gradient directions of all pixels in the image encoded with the following five colors: red (255, 0, 0) for pixels whose gradient direction is primarily east/west; green (0, 255, 0) for pixels whose gradient direction is primarily northwest/southeast; blue (0, 0, 255) for pixels whose gradient direction is primarily north/south; white (255, 255, 255) for pixels whose gradient direction is primarily northeast/southwest; and black (0, 0, 0) for any pixel on the image border (first or last row or column) and for any pixel, regardless of gradient direction, whose gradient magnitude is below 1.0. The file name should be foo_dir.png.
• The gradient magnitude before non-maximum suppression and before thresholding, with the maximum gradient mapping to the intensity 255. The file name should be foo_grd.png.
• The gradient magnitude after non-maximum suppression and after thresholding, with the maximum gradient mapping to the intensity 255. The file name should be foo_thr.png.
Notes:
• Be sure that your image is of type float32 before Gaussian smoothing.
• At first it will seem a bit challenging, or at least tedious, to convert the initial gradient direction, which is measured in radians in the range [−π, π], into a decision as to whether the gradient direction is primarily west/east, northwest/southeast, north/south, or northeast/southwest. For example, the ranges [−π, −7π/8], [−π/8, π/8], and [7π/8, π] are all east/west. You could write an extended conditional to assign these directions, or you could write one or two expressions, using NumPy's capability for floating-point modular arithmetic, to simultaneously assign 0 to locations that are west/east, 1 to locations that are northwest/southeast, etc. Think about it! A small sketch of this idea appears below.
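One possible realization of the modular-arithmetic idea (a sketch, not the required solution): rounding θ to the nearest multiple of π/4 and reducing mod 4 folds each direction together with its opposite (θ and θ ± π land in the same bin). Note the resulting bin order here (0 = east/west, 1 = northeast/southwest, 2 = north/south, 3 = northwest/southeast) differs from the enumeration in the hint and can be relabeled as needed.
import numpy as np

def direction_bins(dx, dy):
    # dx, dy: the Sobel x and y derivative arrays.
    theta = np.arctan2(dy, dx)                 # radians in [-pi, pi]
    # Round to the nearest multiple of pi/4; mod 4 folds opposite directions together.
    return np.mod(np.round(theta / (np.pi / 4)), 4).astype(int)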
• This problem is a bit harder than previous problems to solve without writing Python for loops that range over the pixels, but good solutions do exist. Full credit will be given for a solution that does not require for loops, while up to 36 of 40 for students in 4270 will be given for a solution that requires for loops. (For students in 6270 this will be 32 out of 40.) In other words, we've provided mild incentive for you to figure out how to work solely within Python (and NumPy) without for loops. Examples that have been given in class and even worked through on homework can help. You'll have to consider each direction (somewhat) separately.
• A final word of wisdom: build and test each component thoroughly before using it in a larger system. I know it's hard to force yourself to move this slowly, but I promise it will make this (and future) problems easier!
3. (20 points) Object detection and change detection are two of the most important problems in computer vision. We will discuss object detection at several points throughout the semester. Here in particular we are going to adapt the simple change detection method from the Lecture 7 Jupyter notebook to detect the presence or absence of a bird in a picture. Thanks to Olivia Lundelius for the pictures and suggestion.
Your python program will be given two images taken from the same camera, in the same pose, at nearly the same time. The first image will definitely NOT show a bird. The second image may or may not show a bird. Your output should be in two parts. The first is a single line of text containing the word YES (meaning that there is a bird in the second image), or the word NO (meaning that there is no bird in the second image). The second is an image showing the change regions of the images that indicate whether or not there is a bird present. Change regions indicating the presence of a bird should be shown in color (however you choose). Change regions that do not indicate the presence of a bird should be shown as gray level (intensity vector (100, 100, 100)). So, if there is not a bird, all change regions (there will almost always be some) should be shown as gray. Please use as much or as little of the change detection Jupyter notebook as you wish. Adjust the parameters and add decision criteria. You are welcome to add whatever decision criteria you'd like, including location, size, etc.
Written Problems
1. (15 points) Evaluate the quality of the results of seam carving on several images. In particular, find images for which the results are good, and find images for which the results are poor. Show these images before and after applying your code. What causes poor results? Please explain.
2. (10 points) Regarding your bird detector, please answer the following questions, briefly and precisely:
(a) What is your bird detection decision criteria? Include any modification to the Jupyter notebook methods and parameters.
(b) When will your algorithm succeed, when will it fail, and why? Note that your algorithm will fail at some point, so don't be shy in your answer to this. Provide examples to justify this.
Written Problems
1. (15 points) Give an algebraic proof that a straight line in the world projects onto a straight line in the image. In particular:
(a) Write the parametric equation of a line in three-space.
(b) Use the simplest form of the perspective projection camera from the start of the Lecture 5 notes to project points on the line into the image coordinate system. This will give you equations for the pixel locations x and y in terms of t. Note that t will be in the denominator.
(c) Combine the two equations to remove t and rearrange the result to show that it is in fact a line. You should get the implicit form of the line.
(d) Finally, under what circumstances is the line a point? Show this algebraically.
2. (15 points) Let A be an m × n matrix of real values, with m ≥ n. What is the relationship between the SVD of A and the eigendecomposition of A⊤A? Justify your answer. You will need to know that the eigenvalues of a matrix are unique up to a reordering of eigenvalues and vectors, so you may assume they are provided in any order you wish. By construction, the singular values are non-increasing. (The eigenvectors / singular vectors are unique up to a reordering only if the eigenvalues / singular values are unique.) Justify your answer algebraically.
3. (10 points, Grad Only) Problem 1 includes an important over-simplification: the perspective projection of a line does not extend infinitely in both directions. Instead, the projection of the line terminates at what is referred to as the "vanishing point", which may or may not appear within the bounds of the image. Using the parametric form of a line in three-space and the simple perspective projection model, find the equation of the vanishing point of the line. Then, show why this point is also the intersection of all lines that are parallel to the original line. Under what conditions is this point non-finite? Give a geometric interpretation of these conditions.
Programming Problems
1. (30 points) This problem is about constructing a camera matrix and applying it to project points onto an image plane. The command line of your program should simply be
python p1_camera.py params.txt points.txt
Here params.txt contains parameters that can be used to form the 3 × 4 camera matrix. Specifically, the following ten floating point values will appear in the file on three lines:
rx ry rz
tx ty tz
f d ic jc
Here's what these mean: Relative to the world coordinate system, the camera is rotated first by rz degrees about the z-axis, then ry degrees about the y-axis, then rx degrees about the x-axis. Then it is translated by vector (tx, ty, tz) millimeters. The focal length of the lens is f millimeters, and the pixels are square with d microns on each side. The image is 4000 × 6000 (rows and columns) and the optical axis pierces the image plane in row ic, column jc.
Use this to form the camera matrix M. In doing so, please explicitly form the three rotation matrices (see Lecture 05 notes) and compose them. (Note: the rotation about the z axis will be first and therefore it is the right-most of the rotation matrices.) Overall on this problem, be very, very careful about the meaning of each parameter and its units. The posted example results were obtained by converting length measurements to millimeters. Please output the 12 terms in the resulting matrix M, with one row per line. All values should be accurate to 2 decimal places. I have provided two examples and in my examples I've also printed R and K, but you should not do this in your final submission.
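The sketch below shows one way the pieces might fit together. It is emphatically an assumption-laden outline: the exact world-to-camera convention (and the row/column roles of ic and jc in K) must be validated against the posted examples, since different texts compose these differently.
import numpy as np

def camera_matrix(rx, ry, rz, tx, ty, tz, f, d, ic, jc):
    # Assumption: the stated rotations give the camera's orientation R and
    # (tx, ty, tz) its position, so world points map through [R^T | -R^T t].
    ax, ay, az = np.radians([rx, ry, rz])
    Rx = np.array([[1, 0, 0], [0, np.cos(ax), -np.sin(ax)], [0, np.sin(ax), np.cos(ax)]])
    Ry = np.array([[np.cos(ay), 0, np.sin(ay)], [0, 1, 0], [-np.sin(ay), 0, np.cos(ay)]])
    Rz = np.array([[np.cos(az), -np.sin(az), 0], [np.sin(az), np.cos(az), 0], [0, 0, 1]])
    R = Rx @ Ry @ Rz                    # z rotation applied first, hence right-most
    t = np.array([tx, ty, tz])
    s = f / (d * 0.001)                 # focal length in pixels: mm over (microns -> mm)
    # K below assumes the first image coordinate is the column and the second the row.
    K = np.array([[s, 0, jc], [0, s, ic], [0, 0, 1]])
    return K @ np.hstack([R.T, (-R.T @ t).reshape(3, 1)])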
Next, apply the camera matrix M to determine the image positions of the points in points.txt. Each line of this file contains three floating-point numbers giving the x, y and z values of a point in the world coordinate system. Compute the image locations of the points and determine if they are inside or outside the image coordinate system. Output six numerical values on each line: the index of the point (the first point has index 0), the x, y and z values that you input for the point, and the row, column values. Also output on each line the decision about whether the point is inside or outside. (Anything with row value in the interval [0, 4000] and column value in the interval [0, 6000] is considered inside.) For example, you might have
0: 45.1 67.1 89.1 => 3001.1 239.1 inside
1: -90.1 291.1 89.1 => -745.7 898.5 outside
All floating values should be accurate to just one decimal place.
One thing this problem does not address yet is whether the points are in front of or behind the camera, and therefore are or are not truly visible. Addressing this requires finding the center of the camera and the direction of the optical axis of the camera. Any point is considered visible if it is in front of the plane defined by the center of projection (the center of the hypothetical lens) and the axis direction. As an example to illustrate, in the simple model that we started with, the center of the camera is at (0, 0, 0) and the direction of the optical axis is the positive z-axis (direction vector (0, 0, 1)), so any point with z > 0 is visible. (Note: in this case, a point is considered "visible" even if it is not "inside" the image coordinate system.) To test that you have solved this, as a final step, print the indices of the points that are and are not visible, with one line of output for each. For example, you might output
visible: 0 3 5 6
hidden: 1 2 4
If there are no visible values (or no hidden values), the output should be empty after the word visible: (or hidden:). This will be at the end of your output.
To summarize your required output:
(a) Matrix M (one row per line, accurate to two decimal places)
(b) Index and (x, y, z) position of each input point, followed by the transformed (r, c) location and whether it's inside the 4000 × 6000 frame
(c) Visible point indices (sorted ascending)
(d) Hidden point indices (sorted ascending)
2. (25 points) Implement the RANSAC algorithm for fitting a line to a set of points. We will start our discussion with the command line:
python p2_ransac.py points.txt samples tau [seed]
where points.txt is a text file containing the x, y coordinates of one point per line, samples is a positive integer indicating the number of random pairs of two points to generate, tau is the bound on the distance from the line within which a point counts as an inlier, and seed is an optional parameter giving the seed to the random number generator. After reading the input, if the seed is provided, your first call to a NumPy function must be
np.random.seed(seed)
otherwise, do not call the seed function. Doing this will allow us to create consistent output. For each of samples iterations of your outer loop you must make the call
sample = np.random.randint(0, N, 2)
to generate two random indices into the points. If the two indices are equal, skip the rest of the loop iteration (it still counts as one of the samples though). Otherwise, generate the line and run the rest of the inner loop of RANSAC. A sketch of the line construction appears below.
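Here is a minimal sketch of constructing the normalized line from a two-point sample and counting inliers; function names are placeholders of this sketch.
import numpy as np

def line_from_points(p, q):
    # Line a*x + b*y + c = 0 through p and q, with a^2 + b^2 = 1 and c <= 0.
    (x1, y1), (x2, y2) = p, q
    a, b = y2 - y1, x1 - x2            # normal is perpendicular to the direction
    c = -(a * x1 + b * y1)
    n = np.hypot(a, b)
    a, b, c = a / n, b / n, c / n      # enforce a^2 + b^2 = 1
    if c > 0:
        a, b, c = -a, -b, -c           # enforce c <= 0 by negating all three
    return a, b, c

def count_inliers(points, a, b, c, tau):
    # points: N x 2 array; distance to a unit-normal line is |a*x + b*y + c|.
    d = np.abs(a * points[:, 0] + b * points[:, 1] + c)
    return int(np.sum(d <= tau))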
Each time you get a new best line estimate according to the RANSAC criteria, print out the following values, one per line, with a blank line afterward:
• the sample number (from 0 up to but not including samples),
• the indices into the point array (in the order provided by randint),
• the values of a, b, c for the line (ensure c ≤ 0 and a^2 + b^2 = 1), accurate to three decimal places, and
• the number of inliers.
At the end, output a few final statistics on the best fitting line; in particular output the average distances of the inliers and the outliers from the line. Keep all of your output floating point values accurate to three decimal places. In the interest of saving you some work, I've not asked you to generate any plots for this assignment, but it would not hurt for you to do so just to show yourself that things are working ok. For similar reasons, no least-squares fit is required at the end: no need to repeat the exercise from Lecture 04. Here is an example execution
python p2_ransac.py test0_in.txt 25 2.5 999 > p4_test1_out.txt
and output
Sample 0:
indices (0,28)
line (-0.983,0.184,-26.286)
inliers 13

Sample 3:
indices (27,25)
line (0.426,0.905,-4.913)
inliers 19

Sample 10:
indices (23,4)
line (0.545,0.838,-0.944)
inliers 21

avg inlier dist 0.739
avg outlier dist 8.920
3. (15 points — Students in 4270 ONLY) You are given a series of images (all in one folder) taken of the same scene, and your problem is to simply determine which image is focused the best. Since defocus blurring is similar to Gaussian smoothing and we know that Gaussian smoothing reduces the magnitude of the image's intensity gradients, our approach is simply to find the image that has the largest average squared gradient magnitude across all images. This value is closely related to what is referred to as the "energy" of the image. More specifically, this is
E(I) = (1/(MN)) Σ_{i=0}^{M−1} Σ_{j=0}^{N−1} [ (∂I/∂x (i, j))^2 + (∂I/∂y (i, j))^2 ].
Note that using the squared gradient magnitude is important here. In order to ensure consistency across our implementations, use the two OpenCV Sobel kernels to compute the x and y derivatives and then combine them into the squared gradient magnitude as in the above equation. Specifically, the calls to the Sobel function should be
im_dx = cv2.Sobel(im, cv2.CV_32F, 1, 0)
im_dy = cv2.Sobel(im, cv2.CV_32F, 0, 1)
The command line of your program will be
python p3_best_focus.py image_dir
where image_dir is the path to the directory that contains the images to test. Assume all images are JPEGs with the extension .jpg (in any combination of capital and small letters). Sort the image names using the Python list sort function. Read the images as grayscale using the built-in cv2.imread. Then output for each image the average squared gradient magnitude across all pixels. (On each line output just the name of the image and the average squared gradient magnitude, accurate to just one decimal place.) Finally output the name of the best-focused image. Here is an example:
python p3_best_focus.py evergreen
produces the output
DSC_1696.JPG: 283.9
DSC_1697.JPG: 312.7
DSC_1698.JPG: 602.4
DSC_1699.JPG: 2137.2
DSC_1700.JPG: 10224.8
DSC_1701.JPG: 18987.1
Image DSC_1701.JPG is best focused.
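A minimal sketch of the energy computation for this problem, using exactly the prescribed Sobel calls; the ranking shown in the trailing comment is an assumed usage, not a required structure.
import cv2
import numpy as np

def avg_squared_gradient(im):
    # E(I): mean of the squared gradient magnitude over all pixels.
    im_dx = cv2.Sobel(im, cv2.CV_32F, 1, 0)
    im_dy = cv2.Sobel(im, cv2.CV_32F, 0, 1)
    return float(np.mean(im_dx**2 + im_dy**2))

# Usage sketch: read each image as grayscale and keep the largest energy.
# best = max(names, key=lambda f: avg_squared_gradient(
#     cv2.imread(f, cv2.IMREAD_GRAYSCALE).astype(np.float32)))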
4. (30 points — 6270 ONLY) You are given a series of images (all the images in one folder) taken of the same scene but with different objects in focus in different images. In some images, the foreground objects are in focus, in others objects in the middle are in focus, and in still others, objects far away are in focus. (Three examples from one such series accompany the assignment.)
Your goal in this problem is to use these images to create a single composite image where everything is as well focused as possible. The key idea is to note that the blurring you see in a defocused image region is similar to Gaussian smoothing, and we know that Gaussian smoothing reduces the magnitude of the intensity gradients. Therefore, if we look at the weighted average of the intensity gradients in the neighborhood of a pixel, we can get a measure of image energy at the pixel. Higher energy implies better focus. The equation for this energy is
E(I; x, y) = [ Σ_{u=x−k}^{x+k} Σ_{v=y−k}^{y+k} w(u−x, v−y) ( (∂I/∂x (u, v))^2 + (∂I/∂y (u, v))^2 ) ] / [ Σ_{u=x−k}^{x+k} Σ_{v=y−k}^{y+k} w(u−x, v−y) ],
where w(·, ·) is a Gaussian weight function, whose standard deviation σ will be a command-line parameter. Use k = ⌊2.5σ⌋ to define the bounds on the Gaussian. More specifically, use cv2.GaussianBlur, let h = ⌊2.5σ⌋ and define ksize = (2*h+1, 2*h+1). See our class examples from the lecture on image processing.
Note that using the squared gradient magnitude, as written above, is important here. In order to ensure consistency with our implementation, use the two OpenCV Sobel kernels to compute the x and y derivatives and then combine them into the squared gradient magnitude as in the above equation. Specifically, the calls to the Sobel function should be
im_dx = cv2.Sobel(im, cv2.CV_32F, 1, 0)
im_dy = cv2.Sobel(im, cv2.CV_32F, 0, 1)
Then, after computing E(I0; x, y), ..., E(In−1; x, y) across the n images in a sequence, there are a number of choices to combine the images into a final image. Please use
I*(x, y) = [ Σ_{i=0}^{n−1} E(Ii; x, y)^p Ii(x, y) ] / [ Σ_{i=0}^{n−1} E(Ii; x, y)^p ],
where p > 0 is another command-line parameter. In other words, E(Ii; x, y)^p becomes the weight for a weighted averaging. (As p → ∞ this goes toward the maximum, and as p → 0 this becomes a simple average, with the energy measure having no effect, something we do not want.) Note that a separate averaging is done at each pixel. While this seems like a lot, OpenCV and NumPy tools make it relatively straightforward. You will have to be careful of image boundaries (see lecture discussion of boundary conditions). Also, this will have to work on color images, in the sense that the gradients are computed on gray scale images, but the final image still needs to be in color.
The command line of your program will be
python p4_composite image_dir out_img sigma p
where image_dir is the path to the directory that contains the images to test, out_img is the output image name, sigma is the value of σ (assume σ > 0) for the weighting, and p > 0 is the exponent on the energy function E in the formation of the final image.
Now for the output. The most important, of course, is the final image. Make sure you write this to the folder where the program is run (i.e. if you switch folders to get the images, switch back before output). In addition, for pixels (M//3, N//3), (M//3, 2N//3), (2M//3, N//3) and (2M//3, 2N//3), where (M, N) is the shape of the image array, output the following:
• The value of E for each image
• The final value of I* at that pixel
These should be accurate to 1 decimal place. Example results are posted with output from the command line
python p4_sharp_focus.py branches test02_p4_branches_combined.jpg 5.0 2
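Here is a hedged sketch of the energy and compositing computations. cv2.GaussianBlur with a normalized kernel computes the required weighted local average directly; the function names, the zip-based loop over pre-loaded images, and the small guard against zero weights are assumptions of this sketch.
import cv2
import numpy as np

def focus_energy(gray, sigma):
    # E(I; x, y): Gaussian-weighted local average of the squared gradient magnitude.
    dx = cv2.Sobel(gray, cv2.CV_32F, 1, 0)
    dy = cv2.Sobel(gray, cv2.CV_32F, 0, 1)
    h = int(2.5 * sigma)
    return cv2.GaussianBlur(dx**2 + dy**2, (2*h + 1, 2*h + 1), sigma)

def composite(color_imgs, gray_imgs, sigma, p):
    # Per-pixel weighted average of the (float) color images with weights E**p.
    num, den = 0.0, 0.0
    for col, gray in zip(color_imgs, gray_imgs):
        w = focus_energy(gray, sigma) ** p
        num = num + w[:, :, None] * col            # broadcast weight over color channels
        den = den + w
    return num / np.maximum(den, 1e-12)[:, :, None]  # guard against all-zero weights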
Finally, submit a pdf with a separate discussion of the results of your program, what works well, what works poorly, and why this might be. Illustrate with examples where you can. This part is worth 8 points toward your grade on this homework.
1. (20 points) Write a script that takes a single image and creates a checkerboard pattern from it. The command line will look like
python p1_checkerboard.py im out_im m n
Input image im should be cropped to make it square and resized to make it m × m. Next, it should be formed into a 2×2 grid of m×m images. The 0,0 entry of the grid should show the downsized image, and the 1,1 entry of the grid should show the image upside down. Then the 0,1 entry should show the 0,0 image with the colors of the image inverted so that each color intensity value p is replaced by 255−p, and the 1,0 entry should show the 1,1 entry with the colors inverted. Finally, replicate the 2×2 grid of images to make it 2n × 2n, generating a final image having 2nm × 2nm pixels. Save the result to out_im. Use the NumPy functions concatenate and tile to create the final image. See the discussion of np.tile below. Here is an example command line
python p1_checkerboard.py mountain3.jpg p1_mountain3_checker_out.jpg 120 4
and desired output
Image mountain3.jpg cropped at (0, 420) and (1079, 1499)
Resized from (1080, 1080, 3) to (120, 120, 3)
The checkerboard with dimensions 960 X 960 was output to p1_mountain3_checker_out.jpg
2. (20 points) Do you recognize Abraham Lincoln in this picture? If you don't, you might be able to if you squint or look from far away. Try it now. In this problem you will write a script to generate such a blocky, scaled-down image. The idea is to form the block image from the input image, which you will read as grayscale. Do this in the following steps:
(a) Compute a "downsized image" where each pixel represents the average intensity across a region of the input image.
(b) Generate the larger block image by expanding each pixel in the downsized image to a block of pixels having the same intensity.
(c) Generate a binary image version of the downsized image and make a block version of it as well.
The input to your script will be an image and three integers:
python p2_block.py img m n b
The values m and n are the number of rows and columns, respectively, in the downsized image, while b is the size of the blocks that replace each downsized pixel. The resulting image should have mb rows and nb columns.
When creating the downsized image, start by generating two scale factors, sm and sn. If the input image has M rows and N columns, then we have sm = M/m and sn = N/n. (Notice that these will be float values.) The pixel value at each location (i, j) of the downsized image will be the (float) average intensity of the region from the original gray scale image whose row values include round(i * sm) up to (but not including) round((i + 1) * sm) and whose column values include round(j * sn) up to (but not including) round((j + 1) * sn).
You will then create a second downsized image that will be a binary version of the first downsized image. The threshold for the image will be decided such that half the pixels are 0's and half the pixels are 255. More precisely, any pixel whose value (in the downsized image) is greater than or equal to the median value (NumPy has a median function) should be 255 and anything else should be 0. Note that this means the averages should be kept as floating point values before forming the binary image. Once you have created both of these downsized images, you can easily upsample them to create the block images. Before doing this, convert the average gray scale image to integer by rounding.
The gray scale block image should be output to a file whose name is the same as the input file, but with _g appended to the name just before the file extension. The binary block image should be output to a file whose name is the same as the input file, but with _b appended to the name just before the file extension. Text output should include the following:
• The size of the downsized images.
• The size of the block images.
• The average output intensity (as float values accurate to two decimals) at the following downsized pixel locations:
– (m // 4, n // 4)
– (m // 4, 3n // 4)
– (3m // 4, n // 4)
– (3m // 4, 3n // 4)
• The threshold for the binary image output, accurate to two decimals.
• The names of the output images.
Here is an example.
python p2_block.py lincoln1.jpg 25 18 15
which produces the output
Downsized images are (25, 18)
Block images are (375, 270)
Average intensity at (6, 4) is 59.21
Average intensity at (6, 13) is 55.46
Average intensity at (18, 4) is 158.30
Average intensity at (18, 13) is 35.33
Binary threshold: 134.68
Wrote image lincoln1_g.jpg
Wrote image lincoln1_b.jpg
Important Notes:
(a) To be sure you are consistent with our output, convert the input image to grayscale as you read it using cv2.imread, i.e.
im = cv2.imread(sys.argv[1], cv2.IMREAD_GRAYSCALE)
(b) You are only allowed to use for loops over the pixel indices of the downsized images (i.e. the 25×18 pixel image in the above example). In addition, avoid using for loops when converting to a binary image.
(c) Be careful with the types of the values stored in your image arrays. Internal computations should use np.float32 or np.float64 whereas output images should use np.uint8.
3. (20 points) Image manipulation software tools include methods of introducing shading in images, for example, darkening from the left or right, top or bottom, or even from the center. Examples are shown in the following figure, where the image darkens as we look from left to right in the first example and the image darkens as we look from the center to the sides or corners of the image in the second example.
The problem here is to take an input image I, create a shaded image Is, and output the input image and its shaded version (I and Is) side-by-side in a single image file. Supposing I has M rows and N columns, the central issue is to form an M × N array of multipliers with values in the range [0, 1] and multiply this by each channel of I. For example, values scaling from 0 in column 0 to 1 in column N − 1, with i/(N − 1) in column i, produce an image that is dark on the left and bright on the right (opposite the first example above). This M × N array is called an alpha mask, or mask.
Write a Python program that accomplishes this. The command line should run as
python p3_shade.py in_img out_img dir
where dir can take on one of five values: left, top, right, bottom, center. (If dir is not one of these values, do nothing. We will not test this case.) The value of dir indicates the side or corner of the image where the shading starts. In all cases the value of the multiplier should be proportional to 1 − d(r, c), where d(r, c) is the distance from pixel (r, c) to the start of the shading, normalized so that the maximum distance is 1. For example, if the image is 7 × 5 and dir == 'right' then the multipliers should be
[[ 0.  , 0.25, 0.5 , 0.75, 1.  ],
 [ 0.  , 0.25, 0.5 , 0.75, 1.  ],
 [ 0.  , 0.25, 0.5 , 0.75, 1.  ],
 [ 0.  , 0.25, 0.5 , 0.75, 1.  ],
 [ 0.  , 0.25, 0.5 , 0.75, 1.  ],
 [ 0.  , 0.25, 0.5 , 0.75, 1.  ],
 [ 0.  , 0.25, 0.5 , 0.75, 1.  ]]
whereas if the image is 5 × 7 and dir == 'center' then the multipliers should be
[[0.    0.216 0.38  0.445 0.38  0.216 0.   ]
 [0.123 0.38  0.608 0.723 0.608 0.38  0.123]
 [0.168 0.445 0.723 1.    0.723 0.445 0.168]
 [0.123 0.38  0.608 0.723 0.608 0.38  0.123]
 [0.    0.216 0.38  0.445 0.38  0.216 0.   ]]
(I used np.set_printoptions(precision=3) to generate this formatting.)
In addition to outputting the final image (the combination of original and shaded images), the program should output, accurate to three decimal places, nine values of the multiplier. These are at the Cartesian product of rows (0, M//2, M − 1) and columns (0, N//2, N − 1) (where // indicates integer division). For example, my solution's output for image mountain2.jpg with M = 1080 and N = 1920 and direction 'center' is
(0,0) 0.000
(0,960) 0.510
(0,1919) 0.001
(540,0) 0.128
(540,960) 1.000
(540,1919) 0.129
(1079,0) 0.000
(1079,960) 0.511
(1079,1919) 0.001
These values are the only printed output required from your program.
Important Notes:
(a) Start by generating a 2d array of pixel distances in the row dimension and a second 2d array of pixel distances in the column dimension, then combine these using NumPy operators and universal functions, ending with normalization so that the maximum distance is 1. The generation of distance arrays starts with np.arange to create one distance dimension and then extends it to two dimensions with np.tile. For example,
>>> import numpy as np
>>> a = np.arange(5)
>>> np.tile(a, (3,1))
array([[0, 1, 2, 3, 4],
[0, 1, 2, 3, 4],
[0, 1, 2, 3, 4]])
After you have the distance array, simply subtract the array from 1 to get the multipliers.
(b) Please do not use np.fromfunction to generate the multiplier array because it is essentially the same as nested for loops over the image with a Python call at each location.
(c) Please use (M // 2, N // 2) as the center pixel of the image.
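Following note (a), here is a minimal sketch of building the multiplier mask with np.arange and np.tile; it reproduces the two example matrices above. The function name is a placeholder, and 'center' uses Euclidean distance from (M//2, N//2) per note (c).
import numpy as np

def shade_mask(M, N, dir):
    rows = np.tile(np.arange(M).reshape(M, 1), (1, N)).astype(np.float64)
    cols = np.tile(np.arange(N), (M, 1)).astype(np.float64)
    if dir == 'right':
        d = (N - 1) - cols           # distance to the right edge
    elif dir == 'left':
        d = cols
    elif dir == 'top':
        d = rows
    elif dir == 'bottom':
        d = (M - 1) - rows
    else:  # 'center'
        d = np.sqrt((rows - M // 2)**2 + (cols - N // 2)**2)
    return 1 - d / d.max()           # normalize so the maximum distance is 1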
4. (20 points) How do you decide how similar two images are to each other? This question is at the heart of the recognition problem that pervades computer vision, and therefore it has been studied for years. Here we will consider a simple method that is a precursor to more sophisticated methods we will see later in the semester.
Your script will read in each image in a directory. It will reduce each image to a vector of length 3n^2. It will then find the distance between each pair of images. For each image, in the order produced by sort, it must find the closest image, and then it must output the two images and the distance, accurate to 3 decimal places.
To encode an image in a vector (often called a descriptor vector) we divide the image into n × n regions that are equal in size (perhaps differing by one pixel). Use the same method as you did in Problem 2 when creating the downsized image. In each region, compute the average red, green and blue intensities. Concatenate these in row-major order to form the descriptor. In other words, if ri,j, gi,j, bi,j are the average RGB values from region i, j (here i represents rows and j represents columns), then the vector should be formed as
r0,0, g0,0, b0,0, r0,1, g0,1, b0,1, ..., r0,n−1, g0,n−1, b0,n−1, r1,0, g1,0, b1,0, ...
Finally, normalize this vector (use np.linalg.norm) so that its magnitude is 1.0, and then scale all values by 100. The normalization step is intended to correct for brightness differences between images, while the scaling by 100 converts to percentages to make the values more intuitive. Call the result the RGB descriptor.
Output the final values of r0,0, g0,0, b0,0 and rn−1,n−1, gn−1,n−1, bn−1,n−1 for the first image. The command line for your program should be
python p4_closest.py img-folder n
where img-folder is the file folder containing the images (only consider files whose lower-case extension is .jpg) and n is the number of regions in the row and column dimensions. Each time the program is run, it should use both the RGB and the L*a*b descriptors, generating two sets of output. All numerical output should be accurate to 2 decimal points. Here is an example based on four images that will be distributed with the assignment.
Nearest distances
First region: 20.281 21.207 22.185
Last region: 6.762 6.497 6.520
central_park.jpg to skyline.jpg: 21.65
hop.jpg to times_square.jpg: 24.17
skyline.jpg to central_park.jpg: 21.65
times_square.jpg to hop.jpg: 24.17
In this example, there is symmetry in the closest distances. This will not always be the case.
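A minimal sketch of the RGB descriptor computation, using Problem 2's rounding rule for the region bounds. Names are placeholders; also note that cv2.imread returns channels in BGR order, so relabel the channels if you need true r, g, b ordering.
import numpy as np

def rgb_descriptor(img, n):
    # img: M x N x 3 color image; returns a 1d vector of length 3 * n * n.
    M, N = img.shape[:2]
    sm, sn = M / n, N / n
    vec = []
    for i in range(n):
        for j in range(n):
            region = img[round(i*sm):round((i+1)*sm), round(j*sn):round((j+1)*sn)]
            vec.extend(region.reshape(-1, 3).mean(axis=0))  # average of each channel
    vec = np.array(vec)
    return 100.0 * vec / np.linalg.norm(vec)  # unit length, then scale to percentages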
Implement the following 2d algorithms using HTML Canvas/JavaScript. You cannot use direct Canvas primitives for these; assume you can draw a single point, and develop the algorithms on top of that.

• DDALine
• MidpointLine (handle all slopes)
• MidpointCircle
• MidpointEllipse

There should be buttons to select the algorithm/shape. There should be text boxes to accept (x1, y1) and (x2, y2) for a line, and (x, y) and r for a circle. Handle all slopes/quadrants; the supplied code does not. Draw the same shapes using Canvas primitives and compare whether they are identical or not; analyze this and write about it in your report. Sample code may be found at http://www.cs.uml.edu/~kseethar/Spring2020/programs/p5/

Deliverables
• Source files
• Sample input/output
• 1-page report: write about issues faced, lessons learned, any remaining bugs, etc.

Extra Credit
• Any other functionality; please document it in your report and code.

Deadline and Late Submissions
• The assignment is due on the date specified above at 11:59:59 PM.
• Each day late will incur a penalty of 5% of the grade for the assignment; for example, if the assignment is 3 days late, the maximum grade will be 85 out of 100 (15 will be subtracted from whatever grade is assigned).
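Although the assignment must be written in JavaScript, the algorithms themselves are language-neutral. Here is a minimal Python sketch of the DDA line algorithm, assuming integer endpoints and a set_pixel(x, y) routine standing in for your single-point draw primitive (the names are hypothetical):

def dda_line(x1, y1, x2, y2, set_pixel):
    # Step along the major axis one pixel at a time, advancing the minor
    # axis by the slope; this handles all slopes and quadrants.
    dx, dy = x2 - x1, y2 - y1
    steps = max(abs(dx), abs(dy))
    if steps == 0:
        set_pixel(round(x1), round(y1))
        return
    x_inc, y_inc = dx / steps, dy / steps
    x, y = float(x1), float(y1)
    for _ in range(steps + 1):
        set_pixel(round(x), round(y))
        x += x_inc
        y += y_inc

The midpoint algorithms follow the same plumbing but replace the floating-point increments with integer decision variables.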
Implement the following 2d transformations using HTML Canvas/JavaScript. You may use Canvas primitives, but only to draw the shapes. The transformations should be applied using rubber banding; define appropriate event handlers to do the transformations and the rubber banding.

• Translation
• Scaling
• Rotation

Apply the transformations to the following shapes:

• Line
• Circle
• Rectangle
• Triangle
• Polygon

Deliverables
• Source files
• Sample input/output
• 1-page report: write about issues faced, lessons learned, any remaining bugs, etc.

Extra Credit
• Any other functionality; please document it in your report and code.

Deadline and Late Submissions
• The assignment is due on the date specified above at 11:59:59 PM.
• Each day late will incur a penalty of 5% of the grade for the assignment; for example, if the assignment is 3 days late, the maximum grade will be 85 out of 100 (15 will be subtracted from whatever grade is assigned).
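The event handling is JavaScript-specific, but the transformation math is not. As a point of reference, here is a brief NumPy sketch of the three transformations as 3 × 3 homogeneous-coordinate matrices (helper names are my own); the same matrices map directly onto Canvas's transform()/setTransform() arguments:

import numpy as np

def translate(tx, ty):
    return np.array([[1, 0, tx], [0, 1, ty], [0, 0, 1]], dtype=float)

def scale(sx, sy):
    return np.array([[sx, 0, 0], [0, sy, 0], [0, 0, 1]], dtype=float)

def rotate(theta):
    # Counter-clockwise rotation by theta radians about the origin.
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]], dtype=float)

# Rotate the point (120, 100) by 45 degrees about the pivot (100, 100):
# translate the pivot to the origin, rotate, then translate back.
M = translate(100, 100) @ rotate(np.pi / 4) @ translate(-100, -100)
print((M @ np.array([120.0, 100.0, 1.0]))[:2])  # about (114.14, 114.14)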
This assignment will extend assignment 1 by adding the following input controls and callback handlers:

• number of points, through a slider
• color, through either RGB text boxes or a color picker
• buttons to perform specific actions
• a status display as processing happens

Here is a mockup (figure not reproduced in this text); you are free to design your own interface.

Use the gasket1.html and gasket1.js files, which are explained in Ch. 2. Using these files as the baseline, do the following:

• move all JS code out of the html file
• draw the following in a continuous loop, say 10 times:
◦ change the color: each iteration will use a different color
◦ change the size of the image: large to small in steps, and back to the original size
◦ vary the number of points in steps of 500-5000

Files may be found in http://cs.uml.edu/~kseethar/Spring2020/programs/p1/

Deliverables
• Source files
• Sample input/output
• 1-page report: write about issues faced, lessons learned, any remaining bugs, etc.

Extra Credit
• Any other functionality; please document it in your report and code.

Deadline and Late Submissions
• The assignment is due on the date specified above at 11:59:59 PM.
• Each day late will incur a penalty of 5% of the grade for the assignment; for example, if the assignment is 3 days late, the maximum grade will be 85 out of 100 (15 will be subtracted from whatever grade is assigned).
WebGL Introduction

This assignment will use the gasket1.html and gasket1.js files, which are explained in Ch. 2. Using these files as the baseline, do the following:

• move all JS code out of the html file
• draw the following in a continuous loop, say 10 times:
◦ change the color: each iteration will use a different color
◦ change the size of the image: large to small in steps, and back to the original size
◦ vary the number of points in steps of 500-5000

Files may be found in http://cs.uml.edu/~kseethar/Spring2020/programs/p1/

Deliverables
• Source files
• Sample input/output
• 1-page report: write about issues faced, lessons learned, any remaining bugs, etc.

Extra Credit
• Any other functionality; please document it in your report and code.

Deadline and Late Submissions
• The assignment is due on the date specified above at 11:59:59 PM.
• Each day late will incur a penalty of 5% of the grade for the assignment; for example, if the assignment is 3 days late, the maximum grade will be 85 out of 100 (15 will be subtracted from whatever grade is assigned).
Spring 2025

Consider the popular game (at least at the time of this writing) of Wordle. A 5-letter English word is chosen at random (though we will extend this to support n-letter words) and the user has some number of chances to guess the word. As the user guesses words, letters in their guess that match the selected word in the correct location are highlighted in green, while letters (which may include duplicates) that are part of the word but not in the correct location are highlighted in yellow. So, if the selected word was total and the user guessed treat, the first t and the a in treat would be highlighted in green to indicate they are in the correct location, the last t would be marked in yellow to indicate that there is another t in the selected word but not at that location, and all other letters (such as r) would appear in gray to indicate they are not found in the word.

In this program, you will write a tool to aid players of a Wordle-like game (it does not have the exact semantics of Wordle, but is similar). Suppose the user would like to know what English-language words exist that meet certain criteria (so they can think of potential guesses). The user will provide you an input of how many letters the word has, what letters have already been guessed in their correct location (akin to the green letters), and a list of letters that they know MUST appear in the word (akin to the yellow letters, but without indicating where those letters CANNOT be). They will NOT provide the list of gray letters, meaning you will need to try all possible letters when trying to fill in a blank location in the word.

So your task will be to write a function (and any desired helper functions) to compute all possible English-language words that exist given a string containing the already-guessed letters that are in the correct location (we will refer to these as fixed letters, since they may not change position and must stay in that location) and a string of the already-guessed letters that MUST appear in the word somewhere (we will refer to these as floating letters, since their position may be in any of the remaining locations). Again, we are not actually writing code to play the game but code that can help players consider potential future guesses given some of the knowledge they've learned from previous guesses.

Your approach should be to generate all possible strings of lower-case, alpha characters that contain the fixed characters (in their given location) and the floating characters (in any locations), fill in any remaining letters with all possible options, and then check whether any of those strings are true English-language words. We will provide a std::set of all possible English-language words (i.e., essentially a dictionary) that you may use for this task (which we read from a provided file: dict-eng.txt). The signature of the function you are to write must be (and cannot be changed):

std::set<std::string> wordle(
    const std::string& in,
    const std::string& floating,
    const std::set<std::string>& dict);

Note: We realize that this may be an inefficient approach to finding all the words that match the given constraints. If we were not forcing you to use this approach, it might be easier to find the words that start with a given letter and iterate through them in the sorted order that the set provides, checking whether each word matches the given constraints.
However, one can see that if the first letter or two are unknown, then this may require iterating through the whole dictionary, and so, in some cases, it may be faster to generate all the possible strings, as we do here.

Input format

To indicate how many letters the selected word has AND the correct, fixed-location letters already guessed, we will use a string of length n (for an n-letter word), using the - character to indicate a blank location and characters filling in the correct, fixed locations. So the input string -i- means we want to find all 3-letter words that have an i as the middle character. We will then pass in a second string of the floating (yellow) characters that contains the characters (in any order) that we know must be part of the selected word but whose location is unknown. So if you were given a first input string of -i-- and a second input string dn, you should output all words that have i as the 2nd letter and contain d and n. A sample output is below. Since the results are stored in a set, they can be printed in any order.

We have provided a "driver"/test program (wordle-driver.cpp) that will take in these two strings from the command line, call your function, and then print out the results from the set of strings returned.

Program execution and command line arguments:

./wordle-driver -i-- dn

Printed results:

bind dine ding dins dint find hind kind mind rind wind

Use wordle-driver to do some sanity tests of your code before moving on to any of the tests from our grading suite. Note: To discourage any attempt to hard-code or game the system, the instructor may run additional tests after submission that were not provided, though they will be similar in format and content.

Requirements and Assumptions

● As always, you may not change the signature of the primary function provided.
● You MAY define helper functions in wordle.cpp.
● You MUST use a recursive approach to find all combinations of letters to form the length-n word. Failure to do so will lead to a 0 on this part of the assignment.
○ Really you should only have 1 or 2 loops to help set the characters in any given location, and maybe 1 to 2 other loops to help with various constraint checks, etc. But to ensure you do not somehow brute-force the solutions, you may use at most 5 loops in your entire implementation in wordle.cpp.
● You may NOT use any functions from the algorithm library (nor should you really need to).
● You do NOT need to implement any kind of backtracking approach where you attempt to determine whether a string can possibly be a valid English-language word as you are forming it. Instead, just generate all possible strings and check whether each word is in the dictionary once it has the full n letters.

Hints and Approach

Recall that when generating all combinations you use recursion to build up a combination 1 "place" at a time, with each recursive call responsible for 1 "place/location" of the combination. For that place, each recursive call should try all the viable "options", recursing after each one, and, upon return, undo the choice and try the next option if a solution has not been found (or if multiple solutions are desired). Think carefully about what "options" should be tried at each location. Can any letter be used at any open place? What limitation do the floating letters provide? A sketch of this recursive structure follows.
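The assignment itself must be written in C++ with the exact signature given earlier; the following Python sketch is only meant to illustrate the place-by-place recursion and the floating-letter bookkeeping (all names are illustrative):

def wordle(pattern, floating, dictionary):
    # Return all dictionary words matching the fixed pattern ('-' = blank)
    # and containing every floating letter somewhere.
    results = set()
    word = list(pattern)

    def fill(pos, remaining):
        # Prune: more floating letters left than blanks to put them in.
        if len(remaining) > word[pos:].count('-'):
            return
        if pos == len(word):
            w = ''.join(word)
            if w in dictionary:
                results.add(w)
            return
        if word[pos] != '-':              # fixed letter: skip past it
            fill(pos + 1, remaining)
            return
        for c in 'abcdefghijklmnopqrstuvwxyz':
            word[pos] = c
            # Placing a still-needed floating letter consumes one copy of it.
            rest = remaining.replace(c, '', 1) if c in remaining else remaining
            fill(pos + 1, rest)
        word[pos] = '-'                   # undo before returning to the caller

    fill(0, floating)
    return results

Called as wordle('-i--', 'dn', words), with words a set loaded from dict-eng.txt, this reproduces the driver run shown above.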
By generating all combinations of n-length words that meet the restrictions given by the fixed and floating characters, it becomes simple to check whether each combination is a valid English-language word once it has its full n letters.

3. Schedwork

Suppose a company needs to schedule d (needed) out of k (possible) workers (e.g., nurses at a hospital) per day, given their availability over an n-day period. The company requires exactly d workers (nurses) per day but does not allow any individual to work more than m (maxShifts) shifts over the n-day period. Given d, m, and the k workers' availability for each of the n days, write a backtracking search function to schedule the nurses, such that exactly d work on each of the n days and no one works more than m shifts. The prototype for the function you will write is given below, along with some typedefs for the input and output data structures.

// type for the ID of a worker
typedef unsigned int Worker_T;

// n-by-k matrix of each of the k workers' availability over an n-day period
typedef std::vector<std::vector<bool>> AvailabilityMatrix;

// n-by-d matrix with the d worker IDs who are scheduled
// to work on each of the n days
typedef std::vector<std::vector<Worker_T>> DailySchedule;

/**
 * @brief Produces a work schedule given worker availability,
 *        the number of needed workers per day, and the maximum
 *        shifts any single worker is allowed. Returns true
 *        (and the valid schedule) if a solution exists, and false
 *        otherwise.
 *
 * @param [in] avail n x k vector of vectors (i.e. matrix) of the availability
 *             of the k workers for each of the n days
 * @param [in] dailyNeed Number of workers needed per day (aka d)
 * @param [in] maxShifts Maximum shifts any worker is allowed over
 *             the n day period (aka m)
 * @param [out] sched n x d vector of vectors indicating the d workers
 *              who are scheduled to work on each of the n days
 * @return true if a solution exists; sched contains the solution
 * @return false if no solution exists; sched is undefined (can be anything)
 */
bool schedule(
    const AvailabilityMatrix& avail,
    const size_t dailyNeed,
    const size_t maxShifts,
    DailySchedule& sched
);

Explanation

The avail matrix is an n-row by k-column matrix of booleans. Each row represents one day of the schedule and each column is one worker's availability on each day:

           Worker
          0  1  2  3
Day 0 ->  1  1  1  1
Day 1 ->  1  0  1  0
Day 2 ->  1  1  0  1
Day 3 ->  1  0  0  1

So we see in the avail matrix above that every worker is available to work on day 0, while only workers 0 and 2 are available on day 1, and so on. You should produce a schedule solution that is an n by d matrix, where each row represents a day in the schedule and stores the d IDs of the workers who have been scheduled to work that day (each entry must thus be a value in the range 0 to k-1). So given the availability matrix above and inputs of m=2 and d=2, a valid schedule result would be:

1 2
0 2
1 3
0 3

The above indicates that on day 0 (top row), workers 1 and 2 will be scheduled. Then on day 1 (next row), workers 0 and 2 will be scheduled, and so on. You can verify with the availability matrix that all workers are available on their scheduled days and no worker works more than m=2 shifts.

Testing

We have provided a "driver"/test program (schedwork-driver.cpp) where you can alter an availability matrix and values for d (dailyNeed) and m (maxShifts) and then call your algorithm and print the results.
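As with the Wordle problem, the required implementation is in C++; this Python sketch is only meant to illustrate the backtracking structure described in the hints below, with one recursive call responsible for one of the n*d schedule openings (names are illustrative):

def make_schedule(avail, daily_need, max_shifts):
    # Returns an n x daily_need schedule (list of lists) or None.
    n, k = len(avail), len(avail[0])
    sched = [[] for _ in range(n)]
    shifts = [0] * k                      # shifts assigned so far, per worker

    def fill(day, slot):
        if day == n:
            return True                   # every opening has been filled
        if slot == daily_need:
            return fill(day + 1, 0)       # this day is full; go to the next
        for w in range(k):
            if (avail[day][w] and shifts[w] < max_shifts
                    and w not in sched[day]):
                sched[day].append(w)      # tentatively assign worker w
                shifts[w] += 1
                if fill(day, slot + 1):
                    return True
                sched[day].pop()          # constraint violated downstream:
                shifts[w] -= 1            # undo and try the next worker
        return False

    return sched if fill(0, 0) else None

With the 4 x 4 availability matrix above and make_schedule(avail, 2, 2), this returns a valid schedule (not necessarily the exact sample solution, since any valid assignment is acceptable).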
Use schedwork-driver to do some sanity tests of your code before moving on to any of the tests from our grading suite.

Requirements and Assumptions

● As always, you may not change the signature of the primary function provided.
● You MAY define helper functions in schedwork.cpp.
● You MUST use a recursive approach that follows the general backtracking structure presented in class. Failure to use such a recursive approach will lead to a 0 on this part of the assignment.
● You MAY use functions from the algorithm library, such as std::find, if you desire.
● The order in which you list the worker IDs in each row of the final schedule doesn't matter (i.e., if workers 1, 2, 3 are scheduled to work on a given day, then 3, 2, 1 is also acceptable).
● You may have up to three loops in your code: two for setup and one in your recursive search.

Hints and Approach

Recall that a backtracking search algorithm is a recursive algorithm that is similar to generating all combinations, but skipping the recursion and moving on to another option if the current option violates any of the constraints. It is likely easiest to recurse over each place in the schedule (i.e., the 2D sched matrix). Each recursive call would be responsible for filling in one of the n*d schedule openings, ensuring the constraints of availability and the maximum number of shifts allowed for each worker are met. If you have already done a lab regarding backtracking search, it would likely be beneficial to look it over.

4. Submission and Grading

There are 26 tests between wordle_tests and schedwork_tests. No valgrind tests this time (you don't need to dynamically allocate anything on this assignment).
The core of this assignment is to develop, train, test and evaluate a system that takes as input an image and its region proposals and produces as output a set of detected objects, including their class labels and bounding boxes. At the heart of this is a neural network that takes a region proposal as input and predicts both a classification decision about the region and the bounding box for the region. The region proposals are generated for you. Once the predictions are made by the network for all proposals, the results must be post-processed to produce the actual detections. The classification decisions include the "nothing" class, which means there is no object in the region proposal; in this case, no bounding box will be extracted from your network. The possible classes are provided with the initial Python code we are giving you. Your goal is to maximize the mean Average Precision of the detections across the images.

Importantly, you will not need to implement the network from scratch like you did for HW 5. Instead you can use a pretrained ("backbone") network, in particular ResNet-18, and add the classification and bounding box regression layers on top of it. Starter Python code with the Dataset subclass, for designing your network, and for the post-processing and evaluation steps is provided for you in the zip file distributed with the assignment.

Training and Validation Input

The input for training and for validation will be managed by the HW6Dataset class, a subclass of PyTorch's Dataset. It implements the __init__, __len__ and __getitem__ methods, as discussed in the Lecture 19 notes. Each call to the latter returns a cropped candidate image region, the ground truth bounding box, and the ground truth label. Here are some details:

• All regions will be the same size, 224×224. The regions will be cropped and resized from the image before they are given to you. Multiple regions will be pulled from each image. Forcing them to be square will distort the region. Each region will be a 3x224x224 tensor.
• The ground truth bounding box will be a 1×4 tensor of floats giving the upper left and lower right corners of the bounding box in the coordinate system of the rectangle. Importantly, these will be specified as fractions of the region width and height. For example, the tensor (0.1, 0.25, 0.8, 0.95) is the bounding box (22.4, 56., 179.2, 212.8).
• The ground truth label is an integer, with the value 0 indicating that there is no correct label for this region. This is the "nothing" class.

Your data set object will be passed to a Dataloader, which will return minibatches of 4-dimensional tensors of size Bx3x224x224, where B is the size of the minibatch. See Lecture 19 and the demo example.

We have provided you with two different subsets of the data sets. The first involves only four classes (plus the nothing class), here:

https://drive.google.com/drive/folders/1uRz6BFNPGBqzfyxJGwbBE9eiwq-bMzTK?usp=sharing

This is the dataset where you should focus your effort. The second, larger data set involves all twenty classes, here:

https://drive.google.com/drive/folders/1Xgkf3QBdJAEz2vkBl4uy029–OPLoNqm?usp=sharing

You can experiment with the larger dataset for a small amount of extra credit once you have good results on the smaller one.

Your Network

Your network should consist of the following:

1. A pre-trained backbone network from which one or more of the last layers have been removed.
We provide you a starter HW6Model class which already has this pre-trained backbone, using torchvision.models.resnet18. We have removed the last fully-connected layer, meaning the last layer of the backbone is a global average pooling layer. See the final comments section at the end of these instructions for more information on this. This backbone will be frozen, meaning you will not be training its weights. You can choose to unfreeze the backbone (fully or partially) and potentially achieve better results, but your model will take longer to train.

2. A classification layer that is fully connected to the last remaining layer of the backbone network. If there are C classes, this classification layer should have C + 1 neurons, where class index 0 corresponds to the label "nothing".

3. A regression layer that predicts 4C outputs, four for each non-zero class index. This regression layer should be fully connected to the last remaining layer of the backbone network.

The input to the network is a 4-dimensional tensor of candidate regions extracted from the images. The outputs from the network are the associated activation values for each input candidate region, plus the 4C values estimated for the upper left and lower right corners of the bounding rectangles for each candidate region for each class. When you calculate your bounding box regression loss, use the ground truth labels to determine which one of the C bounding boxes will contribute to the loss. You will use a combined loss function for the classification and regression layers when training, as was discussed during Lecture 21 and is in the hand-written notes. You might consider adding a hidden layer to the classification network or to the regression network, but don't do so right away. You are welcome to use any optimization method you wish, or you can stick with stochastic gradient descent.

Finally, note that even though the B candidate regions in a minibatch are passed through the network together, they will each be evaluated independently. This is very different from what happens with the test input after the network has been trained.

The Validation Step

Validation input will be in exactly the same form as the training input. Computing the validation loss is straightforward. You can also compute the accuracy. This will require that you count the number of correct decisions the network makes on the validation candidate regions. A decision is correct if the neuron with the highest activation corresponds to the ground truth label and, when the correct label is not 0, the IOU between the ground truth bounding box and the predicted bounding box is greater than 0.5.

You should evaluate your model on the validation data after every epoch. This will give you an idea of how well your model works on data outside your training set, and you should choose the model parameters which give you the best validation performance. One of these parameters is the number of epochs you train for, and if you find that training for a certain number of epochs results in better performance, you should train for that long for your final model. After you train the best model you can based on the validation data, then proceed to the final testing steps.

Input of Testing Data

Input of the test data is managed by HW6DatasetTest, and is organized by images.
The call to the __getitem__ method will return an image (a numpy array) together with all of its candidate image regions (each normalized to a 3x224x224 tensor), candidate bounding boxes, ground truth bounding boxes, and ground truth labels. Unlike during training, all bounding boxes are provided in the coordinate system of the original image. At test time, the candidate image regions will be fed into the network as a four-dimensional tensor in just the same way as a minibatch is during training. The post-processing to evaluate the results is very different, however.

Post-Processing (Only at Test Time)

The post-processing proceeds one image at a time. This is not done during training and validation because those steps stop with the evaluation of each prediction independently. At test time, however, and for what would be the actual use of the network, the predictions for each region must be combined and filtered down to form the actual "detections" made by the system. As an initial step for doing this, the ground truth and predicted coordinates (at least for the detections that aren't "nothing") must be converted back to image coordinates.

To process the list of predicted regions for an image, the first step is to eliminate regions whose class is 0 ("nothing"); these are "non-detections". Then, order the remaining regions by decreasing activation. Finally, apply the non-maximum suppression algorithm described in Lecture 21, using an IOU of 0.5 as the threshold to decide whether two regions overlap (in image coordinates). Importantly, two regions should only be compared for non-maximum suppression if they have the same class label. The result of the non-maximum suppression is the set of actual ("predicted") detections.

Evaluation

For each test image, after this post-processing step the predicted detections must be compared to the ground truth detections to determine which are correct and to calculate the AP for each image and the mAP over all images. An intermediate goal here is to form the binary vector b for the computation of the AP for each image. For each predicted detection d_i, in decreasing order of activation, compare d_i to the ground truth regions for the image and find all that have the same label as d_i. Among the ground truth regions having the same label as d_i, find the region having the highest IOU with d_i; call this region g. If no ground truth regions have the same label as d_i, or if IOU(d_i, g) < 0.5, then d_i is an incorrect detection, and a 0 should be appended to the binary decision vector b (see the formulation of mean Average Precision from class). Otherwise, d_i is a correct detection, and a 1 should be appended to b. Finally, if d_i is correct, then g must be removed from the list of ground truth regions so that it does not get matched more than once.

Repeat the foregoing process for each of your detected regions for a given image, entering a 0 or 1 into b for each d_i. Consider up to at most 10 total regions. (If there are more than 10 ground truth results, only count C = 10 in the AP calculation.) From this, compute the average precision (AP) score at 10 detections for this image. If your network produced only j < 10 detections, then you will make decisions about detections d_0 through d_{j−1} and enter them into b, filling the remaining entries of b with 0's. Once you have b for a given image, you can calculate the average precision (AP) score for this image and its detections, just as we did in the Lecture 20 exercise.
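Here is a minimal sketch of the per-image post-processing just described: an IOU helper and greedy same-class non-maximum suppression. The tuple layout and function names are my own assumptions, not the starter code's API:

def iou(a, b):
    # Intersection over union of boxes (x1, y1, x2, y2) in image coordinates.
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def non_max_suppression(preds, thresh=0.5):
    # preds: (activation, label, box) tuples with the "nothing" class
    # (label 0) already removed. Scan in decreasing activation order,
    # keeping a region only if no already-kept region of the SAME class
    # overlaps it with IOU > thresh.
    kept = []
    for act, label, box in sorted(preds, key=lambda p: -p[0]):
        if all(lbl != label or iou(box, kb) <= thresh for _, lbl, kb in kept):
            kept.append((act, label, box))
    return kept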
Ideally, at the end of the computation, each ground truth region will be matched to a detection; those that aren't are considered "misses" or "false negatives". Finally, averaging the AP score over all input images gives your final mean average precision (mAP).

What You Need to Do

There are several coding parts to this assignment:

1. Forming your network based on ResNet18, including its loss functions.
2. Writing your training and testing functions.
3. Writing the code for the test evaluation.

Starting with the last part, we have provided you with a template code called evaluation.py that outlines and tests the post-processing and evaluation code you need to write. Your completed version of this file will be uploaded separately to Submitty and automatically tested. We strongly suggest that you do this first. You will then use this code to help evaluate your test results.

Output from the system should be different for training/validation and for testing. For training and validation the output should show:

• A plot of the training and validation losses as a function of the epoch.
• A confusion matrix on the classification decisions for training and validation at the end of training.
• The average IOU between the predicted and correct bounding rectangles, only for the regions where the classifier is correct.

During testing, detailed output from the network should be given for test images 1, 16, 28, 30, 39, 75, 78, 88, 93, and 146:

• The original image with the bounding boxes of the final regions drawn on top. Each bounding box should have text giving the class label. Detections that are correct (IOU with a ground truth bounding box having the same class label is at least 0.5) should be shown in green. Detections that are incorrect should be shown in red. Ground truth detections that are missed should be shown in yellow. All of the necessary information is produced by the predictions_to_detections and evaluate functions that you need to write in evaluate.py.
• The average precision for the image.

The final output should show:

• The overall mean average precision across all test images.
• For each class, the percentage of detections of that class that are true positives and the percentage of ground truth detections that are found. Again, this information can be gathered using the results of the evaluate function that you need to write in evaluate.py.

Submission

In addition to the Submitty upload of evaluate.py as described above, please submit a zip file containing all your code and a pdf gathering all your results. The pdf file should include:

• An explanation of your neural network model and any variations you tested.
• The training output described above.
• The testing output described above.
• A summary written assessment of your results.

Final Comments

• On AiMOS each training epoch should take just a few minutes, and faster if you use multiple workers. If your code is much slower than this, there are significant inefficiencies. For instance, during training you should only be using for loops over the epochs and dataloaders. While it may be tempting to compute the loss with a for loop, doing so may make it impossible to train in a reasonable amount of time.
• We strongly urge you to first test your ability to simply predict the correct class label for each candidate region, without regard for the bounding boxes. If this is as far as you get, you will still get at least 75% credit on the network part of this assignment.
• If you get good results on the full data set, you can earn up to 10 points extra credit for the assignment.
• We found similar bounding box regression results between using the global average pooling layer of ResNet and instead concatenating all of its features. You can try both approaches, but the default method would be to use the output of the global average pooling layer (512-dim) as input to your bounding box regression.
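To make the default architecture concrete, here is a minimal PyTorch sketch of a frozen ResNet-18 backbone (through its global average pooling layer) with the two heads described above. This is not the provided HW6Model class, just an illustration; the weights argument assumes a recent torchvision:

import torch.nn as nn
import torchvision

class DetectorSketch(nn.Module):
    # Frozen ResNet-18 backbone (through global average pooling, 512-dim)
    # plus a (C+1)-way classification head and a 4C-output regression head.
    def __init__(self, num_classes):          # num_classes = C (no "nothing")
        super().__init__()
        backbone = torchvision.models.resnet18(weights="DEFAULT")
        # Drop the final fully-connected layer; keep conv stages + avg pool.
        self.backbone = nn.Sequential(*list(backbone.children())[:-1])
        for p in self.backbone.parameters():   # freeze backbone weights
            p.requires_grad = False
        self.classify = nn.Linear(512, num_classes + 1)  # index 0 = "nothing"
        self.regress = nn.Linear(512, 4 * num_classes)   # one box per class

    def forward(self, x):                      # x: B x 3 x 224 x 224
        feats = self.backbone(x).flatten(1)    # B x 512
        return self.classify(feats), self.regress(feats)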
The focus of this assignment is scene classification. In particular, you will write SVM and neural network classifiers to determine the "background" class of a scene. The five possibilities we will consider are grass, wheat field, road, ocean and red carpet. Some of these are relatively easy, but others are hard. A zip file containing the images can be found at

https://drive.google.com/file/d/1nrALsQXsU6KoPAMCz65H0S20N_mRvixZ/view?usp=sharing

The images are divided into three sets: training, validation and test. Training images are used directly to optimize the learned weights of a model. The validation set is used to tune the parameters that determine the model that is trained. These parameters are things like (a) the number of layers of a neural network, (b) its learning rate, (c) when to stop training, and (d) the parameter controlling the tradeoff between the margin and the slack values for a linear SVM. Once the parameters of the model are set and the model is trained, the model is applied to the test data just once to see how well it ultimately works.

Here's a little more about the use of the validation data. Think of it as part of an outer loop of training where (a) the model parameters are set, (b) the model is trained, (c) the trained model is applied to the validation data to determine model accuracy, (d) if the resulting model is more accurate than any previously trained model, the model parameters and trained weights are saved, and (e) new model parameters are set, and the process is repeated.

Classifiers to Build

In Problem 1 you will implement a descriptor and a linear SVM to classify the scene, while in Problem 2 you will implement two neural networks.

1. (60 points) In Lecture 14, which covered detection and SVMs, we focused on the "HoG" (histogram of oriented gradients) descriptor. After this method was published, many different types of descriptors were invented for many applications. For this problem you are going to implement a descriptor that combines location and color. It will include no gradient information. One long descriptor vector will be computed for each image, and then a series of SVM classifiers will be applied to the descriptor vector to make a decision. We strongly urge you to write two scripts to solve this problem: one to compute and save the descriptor for each image, and one to reload the descriptors, train the classifiers, and evaluate the performance.

The descriptor is formed from an image by (a) computing a 3d color histogram in each of a series of overlapping subimages, (b) unraveling each histogram into a vector (1d NumPy array), and (c) concatenating the resulting vectors into one long vector. The key parameter here will be the value of t, the number of histogram bins in each of the three color dimensions.

Figure 1 (a), (b): Color histogram bins. [figure not reproduced]

Fortunately, calculation of color histograms is straightforward. Here is example code to form an image of random values and compute its 3d color histogram:

import numpy as np

# Generate a small image of random values.
M, N = 20, 24
img = np.floor(256 * np.random.random(M * N * 3)).astype(np.uint8)

# Calculate and print the 3d histogram.
t = 4
pixels = img.reshape(M*N, 3)
hist, _ = np.histogramdd(pixels, (t, t, t))
print('histogram shape:', hist.shape)  # should be t, t, t
print('histogram:', hist)
print(np.sum(hist))                    # should sum to M*N

Each histogram bin covers a range of u = 256/t gray levels in each color dimension.
For example, hist[i, j, k] is the number of pixels in the random image whose red value is in the range [i·u, (i+1)·u), whose green value is in the range [j·u, (j+1)·u), and whose blue value is in the range [k·u, (k+1)·u). Figure 1 illustrates the partitioning of the RGB color cube into 4 × 4 × 4 regions.

As mentioned above, the image will be divided into overlapping subimage blocks and one color histogram of size t^3 will be computed in each block. These will be concatenated to form the final descriptor. Let bw be the number of subimage blocks across the image and let bh be the number of blocks going down the image. This will produce bw · bh blocks overall, and a final descriptor vector of size t^3 · bw · bh. To compute the blocks in an image of H × W pixels, let

Δw = W / (bw + 1)  and  Δh = H / (bh + 1).

The blocks will each cover 2Δw columns and 2Δh rows of pixels. The image pixel positions of the upper left corners of the blocks will be (mΔw, nΔh), for m = 0, ..., bw − 1 and n = 0, ..., bh − 1. Note that some pixels will contribute to only one histogram, some will contribute to two, and others will contribute to four. (The same is true of the HoG descriptor.)

Figure 2: Image block tiling for bw = 4 and bh = 2. (Left: original image with Δw- and Δh-spaced lines. Right: blocks of pixels over which histograms are formed.)

After you have computed the descriptors, you will train a series of SVM classifiers, one for each class. To do so, you will be given a set of 4000 training images, {I_i}, with class labels y_i ∈ {1, ..., k} (for us, k = 5). To train classifier C_j, images with label y_i = j are treated as y_i = +1 in linear SVM training and images with label y_i ≠ j are treated as y_i = −1. This is repeated for each of the k classifiers. The descriptor is computed for each training image I_i to form the data vectors x_i. Each resulting classifier C_j will have a weight vector w_j and offset b_j. The score for classifier j for a test image with descriptor vector x is

d_j = (w_j^T x + b_j) / ||w_j||.

(Recall that the 1/||w_j|| ensures that d_j is a signed distance.) The classification for the test image I is the class associated with the value of j that gives the maximum d_j score. This is used even if none of the d_j scores are positive.

For each SVM classifier, output simple statistics on the validation steps you use to set the model parameters. This should likely just be the multiplier that controls the tradeoff between margin and slack values. Note that a different value of the multiplier is allowed for each of the five SVMs you train. A reasonable range for this parameter is [0.1, 10].

After you complete training, you will test your classifiers with the set of 750 test images. Each will be run as described above and the label will be compared to the known correct label. You will output the percentage correct for each category, followed by a k × k confusion matrix. The confusion matrix entry at row r and column c shows the number of times that r was the correct class label and c was the chosen class label. The confusion matrix would have all 0's in the non-diagonal entries when the SVM classifier is operating at 100% accuracy.

Some Details

(a) It is very important that you use histogramdd or a similar function to compute the histogram. Do not iterate over the pixels in the image using for loops: you will have serious trouble handling the volume of images. You can iterate over the indices of the subimage blocks and then use NumPy calls as above to form the histograms. A short sketch of this block-histogram computation appears below.
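Here is that sketch: a minimal NumPy version of the block descriptor, assuming integer Δw and Δh for simplicity (the function and variable names are my own, not part of the assignment):

import numpy as np

def block_color_descriptor(img, t, bw, bh):
    # Concatenated t x t x t color histograms over bw x bh overlapping
    # blocks. img is an H x W x 3 uint8 array; the result has length
    # t**3 * bw * bh.
    H, W, _ = img.shape
    dw, dh = W // (bw + 1), H // (bh + 1)   # block step sizes
    pieces = []
    for n in range(bh):
        for m in range(bw):
            block = img[n * dh : (n + 2) * dh, m * dw : (m + 2) * dw]
            hist, _ = np.histogramdd(block.reshape(-1, 3), (t, t, t),
                                     range=((0, 256),) * 3)
            pieces.append(hist.ravel())
    return np.concatenate(pieces)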
(b) As mentioned above, we suggest that you write one script to compute and save the descriptors for all training and test images, and then write a second script to train your SVM classifiers and generate your test output. We suggest that you use Python's pickle module.

(c) For your SVM implementation, we suggest using sklearn.svm.LinearSVC. To use the scikit-learn (sklearn) Python module, you will need to install the package. If you are using Anaconda, as suggested earlier in the semester, you can simply run

conda install scikit-learn

(d) The confusion matrix can be made using Matplotlib or sklearn.metrics.confusion_matrix.

(e) The computation of feature extraction might still be time consuming even with efficient NumPy use. We suggest that you develop and debug your program using a subset of the training image set before running your final version on the full training and test sets.

(f) We suggest using at a minimum t = 4 and bw = bh = 4. This will give a descriptor of size 1,024 per image.

(g) Finally, you can investigate the addition of gradient histograms to your descriptors. Doing so, with a careful analysis of the impacts, can earn you up to 5 points extra credit.

Submit your code, your output showing your validation and training experiments, your final test results, and a write-up describing design choices, your validation steps, and a summary discussion of when your classifier works well, when it works poorly, and why.

Rubric:
• [8 pts]: Clear and reasonably efficient code
• [10 pts]: Descriptor formed, including choices of parameters
• [8 pts]: Training of SVM, including hyperparameter
• [8 pts]: Output of confusion matrices
• [10 pts]: Reasonably good results
• [8 pts]: Clear overall explanation
• [8 pts]: Good explanation of successes and failures

2. (60 points) We continue the background classification problem using neural networks. Specifically, you will use pytorch to implement two neural networks, one using only fully-connected layers, and the other using convolutional layers in addition to fully-connected layers. The networks will each start directly with the input images, so you will not need to write or use any code to do manual preprocessing or descriptor formation. Therefore, once you understand how to use pytorch, the code you actually write will in fact be quite short. Your output should be similar in style to your output from Problem 1. Make sure that your write-up includes a discussion of the design of your networks and your validation steps.

To help you get started I've provided a Jupyter notebook (on the Submitty site) that illustrates some of the main concepts of pytorch, starting with Tensors and Variables and proceeding to networks, loss functions, and optimizers. This also includes pointers to tutorials on pytorch.org. This notebook will be discussed in class on March 29 and April 1. If you already know TensorFlow, you'll find pytorch quite straightforward.

Two side notes:

(a) PyTorch includes some excellent tools for uploading and transforming images into Tensor objects. For this assignment, you will not need to use these, since you've already written code for image input for the previous problem that gathers images into numpy arrays, and it is trivial to transform these objects into pytorch tensors. On the other hand, we will be discussing these tools (DataLoader and Dataset) in class and you are welcome to use them.
(b) Because of the size of the images, you might find it quite computationally expensive and tedious to use a fully-connected network on the full-sized images. Therefore, you are welcome to resize the images before input to your first network. In fact, we strongly suggest that you start with significantly downsized images first and then see how far you can scale up!

Rubric:
• 12 pts: Quality of results with FC network
• 6 pts: Output of FC confusion matrix
• 6 pts: Discussion of FC design choices
• 6 pts: Overall discussion / presentation of FC results
• 12 pts: Quality of results with ConvNet network
• 6 pts: Output of ConvNet confusion matrix
• 6 pts: Discussion of ConvNet design choices
• 6 pts: Overall discussion / presentation of ConvNet results

Access to a GPU and to AIMOS

For your experiments it will be helpful to have access to a GPU. Many of your computers have them, but there are other possibilities. The simplest one is to use Google Colaboratory, as demonstrated by the Jupyter notebook distributed for lecture. As announced on Submitty, there is also the possibility of using the RPI AIMOS supercomputer: https://cci.rpi.edu/aimos