Program Goals

- Deepen understanding of state space generation
- Practice implementation of search algorithms

Summary

We've introduced the 8-queens problem in class: place 8 queens on a chess board such that no queen is attacking any other queen.

A solution for an 8x8 board with state [2, 7, 3, 6, 0, 5, 1, 4]

Recall that in the game of chess, queens attack any piece they can "see" (i.e., there is nothing between the queen and the attacked piece) in the same row, column, or diagonal. In this assignment, you will generalize this problem to a board of arbitrary (square) size with an equivalent number of queens, and add a "boulder" to the board. This boulder has the following properties:

- It blocks the space: no queen may occupy the same space as the boulder.
- It blocks queen attacks: two queens on opposite sides of the boulder are not considered to be attacking each other.

You are given the size of the board (N > 0) and the location of the boulder (both coordinates between 0 and N-1).

Generate Successors

The successors of a state are all states reachable by moving a single queen to another row within its own column, skipping the square occupied by the boulder. For example, on a 3x3 board with the boulder at (0,0):

>>> succ([1,1,2],0,0)
=> [[2, 1, 2], [1, 0, 2], [1, 2, 2], [1, 1, 0], [1, 1, 1]]

Evaluate a State

As in class, we'll define our f() to be the number of queens which are being attacked. Remember to account for the boulder in this calculation! Queens cannot attack through the boulder. Given n=3 and a boulder at (1,1), here are some example f() values:

[1, 2, 2] - f=3
[2, 2, 2] - f=3
[0, 0, 2] - f=2

Choose the Next State

Choose the successor with the lowest f() value, or return None if no successor improves on the current state:

>>> choose_next([1,1,2],0,0)
=> [1, 0, 2]
>>> choose_next([0,0,2],1,1)
=> None

N-Queens

Run the hill climbing algorithm as specified in class on a given initial state until convergence: that is, until your algorithm finds a local minimum and gets stuck (or solves the problem).
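As a minimal sketch of the succ(), f(), and choose_next() helpers described above (assuming state[c] holds the row of the queen in column c, the boulder arguments are (column, row), and ties in choose_next break by ascending state order; none of these conventions are fixed by the text, so treat them as assumptions):

```python
def succ(state, b_col, b_row):
    # All states reachable by moving one queen to another row in its
    # own column, skipping the boulder's square and the current state.
    out = []
    for col in range(len(state)):
        for row in range(len(state)):
            if row == state[col] or (col, row) == (b_col, b_row):
                continue
            s = list(state)
            s[col] = row
            out.append(s)
    return out

def f(state, b_col, b_row):
    # Number of attacked queens; other queens and the boulder both
    # block the line of sight between two queens.
    n = len(state)
    occupied = {(c, r) for c, r in enumerate(state)} | {(b_col, b_row)}
    attacked = set()
    for i in range(n):
        for j in range(i + 1, n):
            dr, dc = state[j] - state[i], j - i
            if dr != 0 and abs(dr) != dc:
                continue  # not on a shared row or diagonal
            step = (dr > 0) - (dr < 0)
            blocked = any((i + k, state[i] + k * step) in occupied
                          for k in range(1, dc))
            if not blocked:
                attacked.update((i, j))
    return len(attacked)

def choose_next(state, b_col, b_row):
    # Lowest-f successor (ties broken by sorted order, an assumption),
    # or None if no successor strictly improves on the current state.
    best = min(sorted(succ(state, b_col, b_row)),
               key=lambda s: f(s, b_col, b_row))
    if f(best, b_col, b_row) < f(state, b_col, b_row):
        return best
    return None
```

This sketch reproduces the example values above, e.g. f([0, 0, 2], 1, 1) = 2 because the diagonal attack from (0,0) to (2,2) is blocked by the boulder at (1,1).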
Your printed output for this function should look like the black text below, followed by the returned minimum state in green:

>>> nqueens([0,1,2,3,5,5,6,7], 4, 4)
[0, 1, 2, 3, 5, 5, 6, 7] - f=8
[0, 1, 2, 3, 5, 0, 6, 7] - f=7
[0, 1, 2, 3, 5, 0, 4, 7] - f=5
[0, 1, 6, 3, 5, 0, 4, 7] - f=4
[0, 1, 6, 3, 5, 2, 4, 7] - f=3
[0, 4, 6, 3, 5, 2, 4, 7] - f=2
[0, 4, 1, 3, 5, 2, 4, 7] - f=2
=> [0, 4, 1, 3, 5, 2, 4, 7]

>>> nqueens([0,2,2,3,4,5,6,7], 1, 1)
[0, 2, 2, 3, 4, 5, 6, 7] - f=7
[0, 2, 2, 3, 1, 5, 6, 7] - f=6
[0, 2, 2, 3, 1, 4, 6, 7] - f=5
[0, 2, 5, 3, 1, 4, 6, 7] - f=3
[0, 2, 5, 3, 1, 4, 6, 3] - f=3
[0, 2, 5, 1, 1, 4, 6, 3] - f=2
=> [0, 2, 5, 1, 1, 4, 6, 3]

Each step of the hill-climbing function should print its current state along with that state's f() value, and ultimately return the best state you find.

N-Queens with Random Restarts

Generate a random (valid!) initial state for your n x n board, and use your nqueens() function on that state. If the convergent state does not solve the problem, generate a new random (valid!) initial state and try again, up to k times. If you find a solution before you reach k, print the solution and terminate. If you reach k before finding a solution, print the best solution(s) in sorted order.

Python's random module will be helpful in generating random initial states. Each column's coordinate should be uniformly distributed over the possible coordinates. random.randint(a, b) generates a single uniform integer between a (inclusive) and b (inclusive), so for an n x n board, where valid row coordinates run from 0 to n-1:

import random
row = random.randint(0, n - 1)

Be sure to account for the location of the boulder! If you happen to randomly generate a state where a queen is occupying the same space as the boulder, abandon that state and generate a new one.
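That generate-and-retry loop can be sketched as follows (random_state is a hypothetical helper name; it redraws the entire state whenever a queen lands on the boulder, as the text requires):

```python
import random

def random_state(n, b_col, b_row):
    # Each column's row coordinate is uniform over 0..n-1; if the drawn
    # state puts a queen on the boulder, abandon it and redraw.
    while True:
        state = [random.randint(0, n - 1) for _ in range(n)]
        if state[b_col] != b_row:
            return state
```

Redrawing the whole state (rather than just the offending column) keeps the distribution over valid states uniform.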
Program Goals

- Deepen understanding of state space generation
- Practice implementation of an efficient search algorithm

Summary

As we hinted at in class, this assignment will concern the 8-tile puzzle, with a twist. Recall the standard successors associated with a state in the typical 8-tile puzzle. In addition to these standard moves, we will allow tiles to wrap: tiles can slide out of the grid vertically or horizontally, and back into the puzzle on the opposite side. You can think of the rows and columns of the puzzle as rings, where a tile that goes out of one end automatically re-enters on the other end. In topology, we call a ring-shaped surface a "torus" (hence the name of the assignment).

Given these rules for the puzzle, you will generate a state space and solve this puzzle using the A* search algorithm.

Program Specification

The code for this program should be written in Python, in a file called torus_puzzle.py. You may represent your states internally however you wish; a two-dimensional list or numpy matrix may be useful. We will provide states as a one-dimensional list of integers, with the empty space represented as 0. We will represent the initial state above in our input as [2,5,8,4,3,6,7,1,0], and the successors pictured are, in order, [2,5,8,4,3,0,7,1,6], [2,5,8,4,3,6,7,0,1], [2,5,0,4,3,6,7,1,8], and [2,5,8,4,3,6,0,1,7].

You are not required to use any particular libraries or packages for this assignment, nor are any libraries or packages prohibited. Any code that you do not personally write should be cited as you would cite a quotation in an essay; that is, with its author and source in as complete a format as possible. Code used without citation will be considered plagiarism and violates our policy of appropriate academic conduct. For example:

'''
Original author: gottani
Source: https://gottani.tumblr.com/post/181973941339
The following function was translated from the original Javascript for the current program
'''
def heuristic(state):

Even if you modify the code, you should still cite your original inspiration (and any assisting TAs or peer mentors!).

Goal State

The goal state of the puzzle is [1, 2, 3, 4, 5, 6, 7, 8, 0].

Heuristic

Since we are using the A* search algorithm, we should agree on a heuristic. For simplicity, we will use the count of tiles which are not in their goal spaces. In the images provided in the summary, the h() value for the second and third states is 5 (tiles 4, 6, and 7 are in their goal locations). The first standard example and the horizontal wrapping example each have an h() value of 6, since we moved tiles out of their goal locations.

Functions

For this program you should write at least two (2) Python functions:

- print_succ(state) — given a state of the puzzle, represented as a single list of integers with a 0 in the empty space, print to the console all of the possible successor states
- solve(state) — given a state of the puzzle, perform the A* search algorithm and print the path from the current state to the goal state

You may, of course, add any other functions you see fit, but these two functions must be present and work as described here. You must also use (and complete) the provided PriorityQueue class. We will use the methods provided there to test the intermediate steps of your implementation.

Generate Successors

The print_succ(state) function should print out the four successor states of the given state, as well as their heuristic values according to the function described above.
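As a sketch, the misplaced-tiles heuristic and a torus-aware successor helper might look like this (the helper names are illustrative, not required by the spec):

```python
def h(state):
    # Count tiles (ignoring the blank, 0) that are not in their goal
    # position; the goal is [1, 2, 3, 4, 5, 6, 7, 8, 0].
    return sum(1 for i, tile in enumerate(state)
               if tile != 0 and tile != i + 1)

def successors(state):
    # Slide each of the blank's four torus neighbors into the blank;
    # row and column indices wrap modulo 3. Results come back sorted,
    # matching the required print order.
    z = state.index(0)
    zr, zc = divmod(z, 3)
    succs = []
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        i = ((zr + dr) % 3) * 3 + (zc + dc) % 3
        s = list(state)
        s[z], s[i] = s[i], s[z]
        succs.append(s)
    return sorted(succs)
```

Because of wrapping, every blank position has exactly four distinct neighbors, so every state has exactly four successors.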
>>> print_succ([1,2,3,4,5,0,6,7,8])
[1, 2, 0, 4, 5, 3, 6, 7, 8] h=4
[1, 2, 3, 0, 5, 4, 6, 7, 8] h=4
[1, 2, 3, 4, 0, 5, 6, 7, 8] h=4
[1, 2, 3, 4, 5, 8, 6, 7, 0] h=3

We do require that these be printed in a specific order: if you consider each state to be a nine-digit integer, the states should be sorted in ascending order. Conveniently, Python's default sort for one-dimensional lists already adheres to this order; don't do more work than you have to:

>>> lists = [[1,2,3,4,5,8,6,7,0],
...          [1,2,3,0,5,4,6,7,8],
...          [1,2,3,4,0,5,6,7,8],
...          [1,2,0,4,5,3,6,7,8]]
>>> sorted(lists)
=> [[1, 2, 0, 4, 5, 3, 6, 7, 8], [1, 2, 3, 0, 5, 4, 6, 7, 8], [1, 2, 3, 4, 0, 5, 6, 7, 8], [1, 2, 3, 4, 5, 8, 6, 7, 0]]

We strongly recommend that you create a separate helper function that builds the list of successors according to your internal representation, for use in the puzzle solver.

Priority Queue

Now is a good time to add the provided priority queue skeleton to your code and complete the enqueue method.
Everything we have provided so far should allow you to create a priority queue, add a dictionary to the queue, and see the contents of the queue:

>>> pq = PriorityQueue()
>>> pq.enqueue({'state': [1, 2, 3, 4, 5, 0, 6, 7, 8], 'h': 3, 'g': 0, 'parent': None, 'f': 3})
>>> print(pq)
{'state': [1, 2, 3, 4, 5, 0, 6, 7, 8], 'h': 3, 'g': 0, 'parent': None, 'f': 3}

If I pop from this priority queue, the following should happen:

>>> pq.pop()
>>> pq.is_empty()
=> True

And if I generate the successors of that state and add them to the priority queue, the following should happen (output will appear all on one line; I've added line breaks here for readability):

>>> # assume I have generated and enqueued the successors
>>> print(pq)
{'state': [1, 2, 0, 4, 5, 3, 6, 7, 8], 'h': 4, 'g': 1, 'parent': {'state': [1, 2, 3, 4, 5, 0, 6, 7, 8], 'h': 3, 'g': 0, 'parent': None, 'f': 3}, 'f': 5}
{'state': [1, 2, 3, 0, 5, 4, 6, 7, 8], 'h': 4, 'g': 1, 'parent': {'state': [1, 2, 3, 4, 5, 0, 6, 7, 8], 'h': 3, 'g': 0, 'parent': None, 'f': 3}, 'f': 5}
{'state': [1, 2, 3, 4, 0, 5, 6, 7, 8], 'h': 4, 'g': 1, 'parent': {'state': [1, 2, 3, 4, 5, 0, 6, 7, 8], 'h': 3, 'g': 0, 'parent': None, 'f': 3}, 'f': 5}
{'state': [1, 2, 3, 4, 5, 8, 6, 7, 0], 'h': 3, 'g': 1, 'parent': {'state': [1, 2, 3, 4, 5, 0, 6, 7, 8], 'h': 3, 'g': 0, 'parent': None, 'f': 3}, 'f': 4}

What you will need to do is complete the enqueue() method such that adding a state that is already present in the queue will not increase the size of the queue, but rather update the cost (in our example and the lecture slides, g) to the lower of the two options.
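One way to sketch that dedup behavior (a minimal stand-in for the provided skeleton; the attribute names queue and max_len are assumptions based on the description, not the skeleton's actual internals):

```python
class PriorityQueue:
    # Illustrative model only; complete the provided skeleton instead.
    def __init__(self):
        self.queue = []
        self.max_len = 0

    def enqueue(self, item):
        # If this state is already queued, keep whichever entry has the
        # lower g (and its correspondingly lower f) instead of growing.
        for existing in self.queue:
            if existing['state'] == item['state']:
                if item['g'] < existing['g']:
                    existing.update(item)
                return
        self.queue.append(item)
        self.max_len = max(self.max_len, len(self.queue))

    def pop(self):
        # Remove and return the entry with the lowest f value.
        best = min(self.queue, key=lambda d: d['f'])
        self.queue.remove(best)
        return best

    def is_empty(self):
        return len(self.queue) == 0
```

A linear scan is fine at this scale; the essential behavior is that a duplicate state never increases the queue size.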
>>> pq.enqueue(''' the first child, but with a g of 0 and an f of 4 ''')
>>> print(pq)
{'state': [1, 2, 0, 4, 5, 3, 6, 7, 8], 'h': 4, 'g': 0, 'parent': {'state': [1, 2, 3, 4, 5, 0, 6, 7, 8], 'h': 3, 'g': 0, 'parent': None, 'f': 3}, 'f': 4}
{'state': [1, 2, 3, 0, 5, 4, 6, 7, 8], 'h': 4, 'g': 1, 'parent': {'state': [1, 2, 3, 4, 5, 0, 6, 7, 8], 'h': 3, 'g': 0, 'parent': None, 'f': 3}, 'f': 5}
{'state': [1, 2, 3, 4, 0, 5, 6, 7, 8], 'h': 4, 'g': 1, 'parent': {'state': [1, 2, 3, 4, 5, 0, 6, 7, 8], 'h': 3, 'g': 0, 'parent': None, 'f': 3}, 'f': 5}
{'state': [1, 2, 3, 4, 5, 8, 6, 7, 0], 'h': 3, 'g': 1, 'parent': {'state': [1, 2, 3, 4, 5, 0, 6, 7, 8], 'h': 3, 'g': 0, 'parent': None, 'f': 3}, 'f': 4}

Solve the Puzzle

This function should print the solution path from the provided initial state to the goal state, along with the heuristic value of each intermediate state and the total moves taken to reach that state. Recall that our cost function g(n) is the total number of moves so far, and every valid successor has an additional cost of 1.

>>> solve([4,3,8,5,1,6,7,2,0])
[4, 3, 8, 5, 1, 6, 7, 2, 0] h=6 moves: 0
[4, 3, 0, 5, 1, 6, 7, 2, 8] h=6 moves: 1
[4, 0, 3, 5, 1, 6, 7, 2, 8] h=5 moves: 2
[4, 1, 3, 5, 0, 6, 7, 2, 8] h=5 moves: 3
[4, 1, 3, 0, 5, 6, 7, 2, 8] h=4 moves: 4
[0, 1, 3, 4, 5, 6, 7, 2, 8] h=3 moves: 5
[1, 0, 3, 4, 5, 6, 7, 2, 8] h=2 moves: 6
[1, 2, 3, 4, 5, 6, 7, 0, 8] h=1 moves: 7
[1, 2, 3, 4, 5, 6, 7, 8, 0] h=0 moves: 8
Max queue length: 46

The max queue length is tracked by our priority queue implementation (as the variable max_len); displaying it may be useful for debugging, but it is not required output.

Note: please do not use exit() or quit() to end your function when you find a path, as this will cause Python to close entirely.
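For intuition, the overall A* search described in this section can be modeled as below. This sketch uses Python's heapq and tuple states for brevity; your submission must use the provided PriorityQueue class and dictionary states instead:

```python
import heapq
from itertools import count

def solve_model(state):
    # Illustrative A* model: h = misplaced tiles, g = moves so far,
    # torus successors, returns the start-to-goal path as a list.
    goal = (1, 2, 3, 4, 5, 6, 7, 8, 0)

    def h(s):
        return sum(1 for i, t in enumerate(s) if t and t != i + 1)

    def succs(s):
        z = s.index(0)
        zr, zc = divmod(z, 3)
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            i = ((zr + dr) % 3) * 3 + (zc + dc) % 3
            t = list(s)
            t[z], t[i] = t[i], t[z]
            yield tuple(t)

    start = tuple(state)
    tie = count()  # tie-breaker so the heap never compares states
    open_heap = [(h(start), next(tie), start, 0)]
    came_from, best_g = {start: None}, {start: 0}
    while open_heap:
        _, _, s, g = heapq.heappop(open_heap)
        if g > best_g[s]:
            continue  # stale queue entry; a cheaper path was found
        if s == goal:
            path = []
            while s is not None:
                path.append(list(s))
                s = came_from[s]
            return path[::-1]  # start-to-goal order
        for t in succs(s):
            if t not in best_g or g + 1 < best_g[t]:
                best_g[t] = g + 1
                came_from[t] = s
                heapq.heappush(open_heap, (g + 1 + h(t), next(tie), t, g + 1))
    return None
```

Because the misplaced-tiles heuristic is admissible, the returned path is optimal in move count, matching the move counts in the example transcripts.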
Additional Examples

For your successor function:

>>> print_succ([8,7,6,5,4,3,2,1,0])
[8, 7, 0, 5, 4, 3, 2, 1, 6] h=8
[8, 7, 6, 5, 4, 0, 2, 1, 3] h=8
[8, 7, 6, 5, 4, 3, 0, 1, 2] h=8
[8, 7, 6, 5, 4, 3, 2, 0, 1] h=8

For your solve function (be aware that the last one may take a while):

>>> solve([1,2,3,4,5,6,7,0,8])
[1, 2, 3, 4, 5, 6, 7, 0, 8] h=1 moves: 0
[1, 2, 3, 4, 5, 6, 7, 8, 0] h=0 moves: 1
Max queue length: 4

>>> solve([8,7,6,5,4,3,2,1,0])
[8, 7, 6, 5, 4, 3, 2, 1, 0] h=8 moves: 0
[8, 7, 0, 5, 4, 3, 2, 1, 6] h=8 moves: 1
[8, 7, 3, 5, 4, 0, 2, 1, 6] h=7 moves: 2
[8, 7, 3, 0, 4, 5, 2, 1, 6] h=7 moves: 3
[8, 7, 3, 4, 0, 5, 2, 1, 6] h=6 moves: 4
[8, 0, 3, 4, 7, 5, 2, 1, 6] h=6 moves: 5
[8, 1, 3, 4, 7, 5, 2, 0, 6] h=6 moves: 6
[8, 1, 3, 4, 7, 5, 0, 2, 6] h=6 moves: 7
[0, 1, 3, 4, 7, 5, 8, 2, 6] h=6 moves: 8
[1, 0, 3, 4, 7, 5, 8, 2, 6] h=5 moves: 9
[1, 2, 3, 4, 7, 5, 8, 0, 6] h=4 moves: 10
[1, 2, 3, 4, 0, 5, 8, 7, 6] h=4 moves: 11
[1, 2, 3, 4, 5, 0, 8, 7, 6] h=3 moves: 12
[1, 2, 3, 4, 5, 6, 8, 7, 0] h=2 moves: 13
[1, 2, 3, 4, 5, 6, 0, 7, 8] h=2 moves: 14
[1, 2, 3, 4, 5, 6, 7, 0, 8] h=1 moves: 15
[1, 2, 3, 4, 5, 6, 7, 8, 0] h=0 moves: 16
Max queue length: 27944
In 2004, engineers at Google introduced a new paradigm for large-scale parallel data processing known as MapReduce (the original paper is here; as a bonus, see the authors of the citations at the end). MapReduce lets developers efficiently and easily program large-scale clusters, especially for tasks with a lot of data processing. With MapReduce, the developer can focus on writing their task as a set of Map and Reduce functions, and let the underlying infrastructure handle parallelism/concurrency, machine crashes, and other complexities common within clusters of machines.

In this project, you'll be building a simplified version of MapReduce for just a single machine. While it may seem simple to build MapReduce for a single machine, you will still face numerous challenges, mostly in building the correct concurrency support. You'll have to think about how to design your MapReduce implementation, and then build it to work efficiently and correctly.

There are three specific objectives to this assignment:

- To gain an understanding of the fundamentals of the MapReduce paradigm.
- To implement a correct and efficient MapReduce framework using threads and related functions.
- To gain experience designing and developing concurrent code.

Deliverables:

- Two files named mapreduce.c and mapreduce.h
- The files will be compiled with some test files. Your code should compile successfully with the -Wall and -Werror flags.

Administrivia:

- This project may be performed in groups of 2.
- Due Date: Aug 9th at 11:59PM (the slip day policy will apply to both people in the group)
- The final submission will be tested on the lab machines, so make sure to test your code on these machines.

Background

To understand how to make progress on any project that involves concurrency, you should understand the basics of thread creation, mutual exclusion (with locks), and signaling/waiting (with condition variables).
These are described in the following book chapters:

- Intro to Threads
- Threads API
- Locks
- Using Locks
- Condition Variables

Read these chapters carefully in order to prepare yourself for this project.

General Idea

Let's now get into the exact code you'll have to build. The MapReduce library/infrastructure you will build should support the execution of user-defined Map() and Reduce() functions. From the original paper:

"Map(), written by the user, takes an input pair and produces a set of intermediate key/value pairs. The MapReduce library groups together all intermediate values associated with the same intermediate key K and passes them to the Reduce() function."

"The Reduce() function, also written by the user, accepts an intermediate key K and a set of values for that key. It merges together these values to form a possibly smaller set of values; typically just zero or one output value is produced per Reduce() invocation. The intermediate values are supplied to the user's reduce function via an iterator."

A classic example, written here in pseudocode (note: your implementation will have different interfaces), shows how to count the number of occurrences of each word in a set of documents:

map(String key, String value):
    // key: document name
    // value: document contents
    for each word w in value:
        EmitIntermediateValues(w, "1");

reduce(String key, Iterator values):
    // key: a word
    // values: a list of counts
    int result = 0;
    for each v in values:
        result += ParseIntermediate(v);
    print key, result;

The image at https://dzone.com/articles/word-count-hello-word-program-in-mapreduce might help you understand the behavior of the MapReduce framework for this sample wordcount program. What's fascinating about MapReduce is that so many different kinds of relevant computations can be mapped onto this framework.
The original paper lists many examples, including word counting (as above), a distributed grep, a URL access frequency counter, a reverse web-link graph application, a per-host term-vector analysis, and others. What's also quite interesting is how easy it is to parallelize: many mappers can run at the same time, and later, many reducers can run at the same time. Users don't have to worry about how to parallelize their application; rather, they just write Map() and Reduce() functions and the infrastructure does the rest.

Code Overview

We give you here the mapreduce.h header file that specifies exactly what you must build in your MapReduce library:

#ifndef __mapreduce_h__
#define __mapreduce_h__

// Different function pointer types used by MR
typedef char *(*Getter)(char *key, int partition_number);
typedef unsigned long (*Partitioner)(char *key, int num_partitions);
typedef void (*Mapper)(char *file_name);
typedef void (*Reducer)(char *key, Getter get_func, int partition_number);

unsigned long MR_DefaultHashPartition(char *key, int num_partitions);

// External functions: these are what *you must implement*
void MR_Emit(char *key, char *value);

void MR_Run(int argc, char *argv[],
            Mapper map, int num_mappers,
            Reducer reduce, int num_reducers,
            Partitioner partition, int num_partitions);

#endif // __mapreduce_h__

You must implement the MR_Run and MR_Emit functions. In addition, you will have to implement a get_next function that is described in the next section.

MR_Run: The most important function is MR_Run, which takes the command line parameters of a given program: a pointer to a Map function (type Mapper, called map), the number of mapper threads your library should create (num_mappers), a pointer to a Reduce function (type Reducer, called reduce), the number of reducers (num_reducers), a pointer to a Partition function (partition, described below), and the number of partitions.
Thus, when a user is writing a MapReduce computation with your library, they will implement a Map function, implement a Reduce function, possibly implement a Partition function, and then call MR_Run(). The infrastructure will then create threads as appropriate and run the computation.

One basic assumption is that the library will create num_mappers threads (in a thread pool) that perform the map tasks. Another is that your library will create num_reducers threads (in a thread pool) to perform the reduce tasks. Finally, your library will create some kind of internal data structure to pass keys and values from mappers to reducers; more on this below.

MR_Emit: The MR_Emit function is another key part of your library; it needs to take key/value pairs from the many different mappers and store them in a partition such that later reducers can access them, given the constraints described below. Designing and implementing this data structure is thus a central challenge of the project.

Simple Example: Wordcount

Here is a simple (but functional) wordcount program written to use the MapReduce library:

#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "mapreduce.h"

void Map(char *file_name) {
    FILE *fp = fopen(file_name, "r");
    assert(fp != NULL);

    char *line = NULL;
    size_t size = 0;
    while (getline(&line, &size, fp) != -1) {
        char *token, *dummy = line;
        while ((token = strsep(&dummy, " \t\n\r")) != NULL) {
            MR_Emit(token, "1");
        }
    }
    free(line);
    fclose(fp);
}

void Reduce(char *key, Getter get_next, int partition_number) {
    int count = 0;
    char *value;
    while ((value = get_next(key, partition_number)) != NULL)
        count++;
    printf("%s %d\n", key, count);
}

int main(int argc, char *argv[]) {
    MR_Run(argc, argv, Map, 10, Reduce, 10, MR_DefaultHashPartition, 10);
}

Let's walk through this code, in order to see what it is doing and describe the requirements for this project. First, notice that Map() is called with a file name.
In general, we assume that this type of computation is being run over many files; each invocation of Map() is thus handed one file name and is expected to process that file in its entirety. In this example, the code above just reads through the file, one line at a time, and uses strsep() to chop each line into tokens. Each token is then emitted using the MR_Emit() function, which takes two strings as input: a key and a value. The key here is the word itself, and the value is just a count, in this case 1 (as a string). The Map function then closes the file.

After the mappers are finished, your library should have stored the key/value pairs (passed to MR_Emit) in partitions such that the reducers can access them later. Your implementation must ensure that partition i is sent to a reducer before partition i+1.

Reduce() is invoked once per key, and is passed the key along with a function that enables iteration over all of the values that produced that same key. To iterate, the code just calls get_next() repeatedly until a NULL value is returned; get_next returns a pointer to the value passed in by the MR_Emit() function above, or NULL when the key's values have been processed. The output, in the example, is just a count of how many times a given word has appeared, printed to standard output.

All of this computation is started off by a call to MR_Run() in the main() routine of the user program. This function is passed the argv array, and assumes that argv[1] ... argv[n-1] (with argc equal to n) all contain file names that will be passed to the mappers.

One interesting function that you also need to pass to MR_Run() is the partitioning function. For simplicity, all the test programs will use the default function (MR_DefaultHashPartition), which is defined below.
Copy it into your file:

unsigned long MR_DefaultHashPartition(char *key, int num_partitions) {
    unsigned long hash = 5381;
    int c;
    while ((c = *key++) != '\0')
        hash = hash * 33 + c;
    return hash % num_partitions;
}

The function's role is to take a given key and map it to a number, from 0 to num_partitions - 1. Its use is internal to the MapReduce library, but critical. Specifically, your MR library should use this function to decide which partition (and hence, which reducer thread) gets a particular key/list of values to process. Note that the number of partitions can be greater (or smaller) than the number of reducers; thus, some reducers may need to handle multiple (or no) partitions.

For some applications, which reducer thread processes a particular key is not important (and thus the default function above should be passed in to MR_Run()); for others, it is, and this is why the user can pass in their own partitioning function as need be.

One last requirement: for each partition, keys (and the value list associated with each key) should be sorted in ascending key order; thus, when a particular reducer thread (and its associated partition) are working, the Reduce() function should be called on each key in order for that partition. For this sort within a partition, the sorted order should be the lexical order given by strcmp() over the original keys.

Considerations

Here are a few things to consider in your implementation:

Thread Management: This part is fairly straightforward. You should create num_mappers mapping threads, and assign a file to each Map() invocation in some manner you think is best (e.g., round robin, shortest-file-first, etc.). Think about which algorithm might lead to the best performance. You should also create num_reducers reducer threads at some point (either at the beginning or after the mappers finish), to work on the emitted key/value pairs.
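For quick experimentation, the behavior of MR_DefaultHashPartition can be modeled in Python (the mask emulates C's unsigned long wraparound, assuming a 64-bit unsigned long as on typical lab machines):

```python
def default_hash_partition(key, num_partitions):
    # Python model of MR_DefaultHashPartition: djb2-style hash of the
    # key's bytes, reduced modulo the number of partitions.
    h = 5381
    for c in key.encode():
        h = (h * 33 + c) & 0xFFFFFFFFFFFFFFFF  # wrap like 64-bit unsigned long
    return h % num_partitions
```

The essential property is determinism: the same key always lands in the same partition, which is what guarantees that all values for a key meet in a single reducer.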
Partitioning and Sorting: Your central data structure should be concurrent, allowing mappers to put values into different partitions correctly and efficiently. Once the mappers have completed, a sorting phase should order the key/value lists. Then, finally, each reducer thread should start calling the user-defined Reduce() function on the keys in sorted order, per partition. You should think about what type of locking is needed throughout this process for correctness.

Memory Management: One last concern is memory management. The MR_Emit() function is passed a key/value pair; it is the responsibility of the MR library to make copies of each of these. However, avoid making too many copies of them, since the goal is to design an efficient concurrent data structure. Then, when the entire mapping and reduction is complete, it is the responsibility of the MR library to free everything.

Testing and Handin Instructions

Tests will be provided at ~cs537-1/tests/p6.

Handing it in: Copy your files to ~cs537-1/handin/login/p6, where login is your CS login. Do NOT use this handin directory for your work space. You should keep a separate copy of your project files in your own home directory, and simply copy the relevant files to this handin directory when you are done. The permissions on this handin directory will be turned off promptly when the deadline passes, and you will no longer be able to modify files in that directory.

Each project partner should turn in their joint code to each of their handin directories. Each person should place a file named partners.txt in their handin/p6 directory, so that we can tell who worked together on this project. The format of partners.txt should be exactly as follows:

cslogin1 wiscNetid1 Lastname1 Firstname1
cslogin2 wiscNetid2 Lastname2 Firstname2

It does not matter who is 1 and who is 2. If you worked alone, your partners.txt file should have only one line.
There should be no spaces within your first or last name; use spaces only to separate fields. To repeat, both project partners should turn in their code, and both should have a handin/p6/partners.txt file.
Overview

In this assignment, you're going to implement memory mapping in xv6. Memory mapping is the process of mapping memory from a process's virtual address space to some physical memory. You will add support for memory mapping in xv6 by implementing two system calls, wmap() and wunmap(), which are similar to the mmap() and munmap() system calls in Linux. You will implement two additional system calls, getwmapinfo() and getpgdirinfo(), for debugging and testing purposes.

Learning objectives:

- Understand how memory mapping works in modern systems.
- Learn about the xv6 memory layout.
- Understand the relation between virtual and physical addresses.
- Understand the use of the page fault handler for memory management.

Administrivia:

- This project can be performed in groups of 2.
- Due Date: July 29th at 11:59PM (the slip day policy will apply to both people in the group)
- The final submission will be tested on the lab machines, so make sure to test your code on these machines.

Details

1) Background

Memory mapping, or mmap, is a technique used in operating systems to create new mappings in the virtual address space of the calling process. It has two modes of operation: anonymous and file-backed. Anonymous memory mapping is used to allocate memory without file backing, similar to malloc() and free(), but is more powerful. In fact, in most systems today, malloc() and free() invoke memory mapping system calls such as mmap() and munmap(). File-backed memory mapping is commonly used to map a file, or a portion of a file, into memory; this is an efficient way to access large files.

In this assignment, you will implement support for anonymous memory mapping in xv6. User processes can request the OS to create/delete memory mappings in their address space using the system calls wmap() and wunmap(). Anonymous memory mappings created this way offer more control to the user than malloc(), through the use of flags. Following are some details about this.
With malloc, when you allocate some memory, it returns the virtual address of the allocated memory. Through mmap, however, you can specify the virtual address where the memory should be, using the MAP_FIXED flag.

mmap supports demand paging. This means that when you allocate n bytes of memory, a virtual memory region of n bytes is reserved, but the actual physical memory pages are not allocated until they are accessed. Later, when the process tries to access that memory for the first time, a page fault occurs. The OS then allocates a physical page of memory and maps it to the corresponding virtual address. This allows efficient use of memory by allocating pages only as they are needed, reducing the initial memory footprint and avoiding unnecessary memory allocation. In this assignment, demand paging should be enabled by default and can be disabled using the MAP_POPULATE flag.

When a process with a memory mapping creates child processes, those child processes have the same mappings as the parent process. If there is any data in those mappings at the time of fork, the child should be able to read the same data after the fork through the mappings. Using the MAP_SHARED flag, these memory regions can be shared, which means parent and child processes all have the same virtual mappings, and those mappings map the same physical addresses. This allows processes to communicate and share data seamlessly, as any changes made to the memory are visible to all processes. Alternatively, with the MAP_PRIVATE flag, parent and child will still have the same virtual mappings, but each process gets its own physical copy of the memory. At the time of the fork, the parent and child process will see the same data in the mappings, but later they can modify that data independently, so changes are not visible to each other.
In the original mmap(), an optimization strategy called copy-on-write (CoW) is used, wherein the actual copying of physical pages is deferred until a write operation occurs. In this assignment, you don't need to implement CoW.

2) Setup

Define a macro MAPBASE inside the memlayout.h file:

#define MAPBASE 0x60000000 // First mmap virtual address

Notice that memlayout.h already has another macro, KERNBASE (0x80000000), that is used to separate the user space from the kernel space. The lower part of the virtual address space (0x00000000 to KERNBASE - 1) is used for user space (where user code, data, heap, and stack reside), and the upper part (KERNBASE to 0xFFFFFFFF) is used for kernel space (where the OS code and data reside).

We will use a predefined starting address (MAPBASE) for mmap regions. This helps to organize and separate the mmap regions from other parts of the process's memory. By placing MAPBASE near the top of the user address space, the rest of the user space (from 0x00000000 to 0x5FFFFFFF) remains available for other allocations such as the stack, heap, and code segments. Here's a simplified view of how the address space will look (Figure 2-2 in the xv6 book gives a detailed view):

+--------------------------+ 0xFFFFFFFF (Top of virtual address space)
|       Kernel Space       |
|                          |
|      (Kernel code,       |
|       data, etc.)        |
+--------------------------+ 0x80000000 (KERNBASE)
|        User Space        |
|                          |
|      (Memory mapped      |
|    regions start here)   |
+--------------------------+ 0x60000000 (MAPBASE)
|                          |
|    (User heap, code,     |
|      stack, etc.)        |
|                          |
+--------------------------+ 0x00000000 (Start of user address space)

allocuvm() grows a process from oldsz to newsz. Do not let a process grow beyond MAPBASE.

Create a file named wmap.h with the following contents inside your xv6 folder.
// Flags for wmap
#define MAP_PRIVATE 0x0001
#define MAP_SHARED 0x0002
#define MAP_POPULATE 0x0004
#define MAP_FIXED 0x0008

// for `getpgdirinfo`
#define MAX_UPAGE_INFO 32

struct pgdirinfo {
    uint n_upages;           // the number of allocated physical pages in the process's user address space
    uint va[MAX_UPAGE_INFO]; // the virtual addresses of the allocated physical pages in the process's user address space
    uint pa[MAX_UPAGE_INFO]; // the physical addresses of the allocated physical pages in the process's user address space
};

// for `getwmapinfo`
#define MAX_WMMAP_INFO 16

struct wmapinfo {
    int total_mmaps;                    // Total number of wmap regions
    int addr[MAX_WMMAP_INFO];           // Starting address of mapping
    int length[MAX_WMMAP_INFO];         // Size of mapping
    int n_loaded_pages[MAX_WMMAP_INFO]; // Number of pages physically loaded into memory
};

3) System Calls

You have to implement four system calls for this assignment.

3.1. wmap

int wmap(uint addr, int length, int flags)

wmap is similar to mmap(). It creates a new mapping in the virtual address space of the calling process. Creating a memory mapping involves three steps: 1) allocating a region in the virtual address space, 2) allocating physical memory, and 3) creating mappings (page table entries) between the allocated virtual address region and the allocated physical pages.

wmap allocates length bytes of memory and returns the virtual address of the allocated memory. The returned address must be page-aligned, and you must use virtual addresses only between MAPBASE and KERNBASE - 1. If the provided length is not page-aligned, it should be rounded up to the next multiple of the page size (4096 bytes).

Your implementation of wmap should have demand paging enabled by default, which means that physical memory should not be allocated during the system call. Instead, physical memory should be allocated during page faults. Also, when forking, the child process should have the same mappings as the parent process.
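The page-granularity arithmetic just described can be modeled with plain Python (this is a model of the rules, not xv6 code):

```python
PGSIZE = 4096
MAPBASE = 0x60000000
KERNBASE = 0x80000000

def round_up_length(length):
    # Round a requested length up to the next multiple of the page size.
    return (length + PGSIZE - 1) // PGSIZE * PGSIZE

def valid_fixed_addr(addr):
    # A MAP_FIXED address must be page-aligned and within [MAPBASE, KERNBASE).
    return addr % PGSIZE == 0 and MAPBASE <= addr < KERNBASE
```

For example, a request of 1 byte reserves one full 4096-byte page, and a request of 4097 bytes reserves two pages.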
You should return -1 for any type of error (e.g., the provided addr or length is not valid).

length: The length of the mapping in bytes. It must be greater than 0.

addr: The virtual address that wmap must use for the mapping, if the MAP_FIXED flag is on. Otherwise, ignore this value. This address should be page-aligned and between MAPBASE and KERNBASE-1.

flags: Enables or disables some features of wmap. Flags can be ORed together (e.g., MAP_PRIVATE | MAP_FIXED | MAP_POPULATE). These flags should be defined as constants in the wmap.h header file. You have to implement four flags, explained below:

MAP_FIXED: If this is set, then the mapping must be placed at exactly addr. If this flag is not used, you can ignore addr. A valid addr is a multiple of the page size and between MAPBASE (0x60000000) and KERNBASE-1 (0x80000000-0x1).

MAP_POPULATE: Demand paging is enabled by default. If this flag is set, disable demand paging, which means that the physical memory will be allocated immediately during the system call.

MAP_SHARED: This flag tells wmap that the mapping is shared. Memory mappings are copied from the parent to the child across the fork system call. If the mapping is MAP_SHARED, then changes made by the child will be visible to the parent and vice versa.

MAP_PRIVATE: If this flag is set, then the mapping is not shared. You still need to copy the mappings from parent to child, but these mappings should use different physical pages. In other words, the same virtual addresses are mapped in the child, but to a different set of physical pages.

Note that exactly one of MAP_SHARED and MAP_PRIVATE must be specified in flags; these two flags cannot be used together.

Enabling demand paging means that you don't actually allocate any physical pages when wmap is called. You just track the allocated virtual address region using some structure. Later, when the process tries to access that memory, a page fault is generated, which is handled by the kernel.
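Before any paging work happens, wmap has to reject bad arguments. The checks described above can be sketched in a self-contained way; the function name and structure below are mine, not a required interface:

```c
#include <assert.h>

typedef unsigned int uint;

#define MAP_PRIVATE  0x0001
#define MAP_SHARED   0x0002
#define MAP_POPULATE 0x0004
#define MAP_FIXED    0x0008

#define PGSIZE   4096
#define MAPBASE  0x60000000
#define KERNBASE 0x80000000

// Hypothetical argument check for wmap: returns 0 if the arguments
// look valid, -1 otherwise (matching wmap's error convention).
static int wmap_args_ok(uint addr, int length, int flags) {
    int shared = (flags & MAP_SHARED) != 0;
    int private = (flags & MAP_PRIVATE) != 0;
    if (shared == private)         // exactly one of the two must be set
        return -1;
    if (length <= 0)               // length must be greater than 0
        return -1;
    if (flags & MAP_FIXED) {
        // addr must be page-aligned and inside [MAPBASE, KERNBASE)
        if (addr % PGSIZE != 0 || addr < MAPBASE || addr >= KERNBASE)
            return -1;
    }
    return 0;
}
```

When MAP_FIXED is absent, addr is simply ignored and the kernel picks the region itself.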
In the kernel, you can now allocate a physical page and let the user resume execution. To set up the page fault handler, you'll add something like this to the trap() function in trap.c:

case T_PGFLT: // T_PGFLT = 14
    if page fault addr is part of a mapping:
        // demand paging
        // handle it
    else:
        cprintf("Segmentation Fault\n");
        // kill the process

3.2. wunmap

int wunmap(uint addr);

wunmap removes the memory-mapped region that starts at addr in the process virtual address space. addr must be page-aligned and the start address of some existing wmap. It returns 0 if the memory map is removed successfully, otherwise -1 (e.g., addr is not the start of any memory-mapped region). While removing a memory map, if it is MAP_SHARED and it was inherited from the parent, be careful not to free the physical pages, because the parent process might still be using them. Note that when a child process calls exit(), the wait() called by the parent process frees the memory (from address 0x0 to KERNBASE) used by the child process by calling freevm(p->pgdir), which will free the shared mmap physical pages. You need to modify the code a little to avoid this.

3.3. getpgdirinfo

int getpgdirinfo(struct pgdirinfo *pdinfo);

getpgdirinfo retrieves information about the process address space by populating struct pgdirinfo. It returns 0 on successful retrieval of information, otherwise -1. This system call should calculate how many physical pages are currently allocated to the current process and store the count in n_upages. It should also populate va[MAX_UPAGE_INFO] and pa[MAX_UPAGE_INFO] with the first 32 virtual page addresses and corresponding physical addresses, ordered by virtual address. (Hint: You should only gather information, either for calculating n_upages or for returning va/pa pairs, on pages with PTE_U set (i.e., user pages). The only way to do that is to directly consult the page table for the process.)

3.4.
getwmapinfo

int getwmapinfo(struct wmapinfo *wminfo);

getwmapinfo retrieves information about the process address space by populating struct wmapinfo. This system call should calculate the current number of memory-mapped regions in the process's address space and store the result in total_mmaps. It should also populate addr[MAX_WMMAP_INFO] and length[MAX_WMMAP_INFO] with the address and length of each memory map. As mentioned earlier, if an mmap's provided length is not page-aligned, the length should be rounded up to the next multiple of the page size (4096 bytes). You can assume that the number of memory maps for any process will not exceed MAX_WMMAP_INFO. The n_loaded_pages[MAX_WMMAP_INFO] array should store how many pages have been physically allocated for each wmap. This field should reflect demand paging. For example, the length of a memory-mapped region could be 8192, but the number of physical pages allocated for this region (n_loaded_pages) might be 0 because of demand paging.

3.5. Some simplifying assumptions

You can make the following simplifying assumptions:

- All mapped memory is readable/writable.
- The maximum number of memory maps is 16.
- The parent process will always exit after the child process.
- Because of demand paging, there might be some pages that are not yet physically allocated when calling fork(). We'll make sure to use the MAP_POPULATE flag for all mmaps before calling fork(), so you need not worry about this case.

4) A simple example of wmap and wunmap

Here, we walk you through a simple end-to-end example of a call to wmap. You may want to take a look at the helper functions referenced in this example – you'll most likely need to use most of them in your implementation.
Suppose that we make the following call to `wmap`:

uint address = wmap(0x60000000, 8192, MAP_FIXED | MAP_SHARED | MAP_POPULATE);

This is the simplest combination of flags – we're requesting a shared (MAP_SHARED) mapping having a length of two pages and starting at virtual address 0x60000000 (MAP_FIXED). In xv6, 1 page is 4096 bytes and 4096 = 0x1000. Let's assume no memory mappings exist in the virtual address range 0x60000000 to 0x60002000 (8192 = 0x2000). So we can create the requested memory mapping at virtual address 0x60000000, extending until 0x60002000. You should keep track of these mappings per process so you know what ranges of virtual addresses are free. You can create a struct that stores information relevant to a memory map, and store an array/list of such structs in the Process Control Block (struct proc). You might want to keep this array sorted by virtual address, which will make it easy to verify whether another memory mapping already exists in an address range. So far we have only talked about virtual addresses – there also need to be corresponding physical pages allocated for this mapped region. Since we use the MAP_POPULATE flag, physical memory needs to be allocated immediately. Physical memory is managed at the granularity of a page, which is 4096 bytes long in xv6. You'll just need to call existing functions (i.e., kalloc() and kfree() in kalloc.c) in the kernel to get free physical pages. kalloc() and kfree() work at the granularity of a page (4096 bytes). So if you need 8192 bytes of physical memory, you need to call kalloc() twice; each time it will give you the starting address of a physical page that is 4096 bytes long. Needless to say, the physical addresses returned by the kernel are not necessarily contiguous in the physical address space. So in our implementation, we would have to call

char *mem = kalloc();

where mem is the starting address of a free page we can use.
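As suggested above, keeping the per-process mapping array sorted by virtual address turns the free-region search (needed when MAP_FIXED is absent) into a single linear scan over the gaps. A self-contained sketch under that assumption; the struct and function names are mine, not a required interface:

```c
#include <assert.h>

typedef unsigned int uint;

#define MAPBASE  0x60000000u
#define KERNBASE 0x80000000u

struct mapping { uint addr; uint length; };  // kept sorted by addr

// Hypothetical search for a free gap of `length` bytes between MAPBASE
// and KERNBASE, given `n` existing non-overlapping mappings sorted by
// address. Returns the start address of a gap that fits, or 0 if none.
static uint find_free_region(struct mapping *maps, int n, uint length) {
    uint candidate = MAPBASE;
    for (int i = 0; i < n; i++) {
        if (maps[i].addr - candidate >= length)
            return candidate;               // gap before this mapping fits
        candidate = maps[i].addr + maps[i].length;
    }
    if (KERNBASE - candidate >= length)
        return candidate;                   // room after the last mapping
    return 0;
}
```

With mappings at [0x60000000, +0x2000) and [0x60004000, +0x1000), a two-page request lands in the gap at 0x60002000.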
Note that we haven't yet created mappings (page table entries) between virtual addresses and physical addresses. We have just allocated virtual and physical regions/pages. To create such mappings, we need to create and place entries in the page table. The mappages function does exactly that. In this case, the call to mappages would be something like the following:

mappages(page_directory, 0x60000000, 4096, V2P(mem), PTE_W | PTE_U);

The first argument is the page table of the process to place the mapping in. Each process has a page table, as defined in struct proc in proc.h. The last argument is the protection bits for this page. PTE_U means the page is user-accessible and PTE_W means it's writable. There's no flag for readable, because pages are always at least readable. For this project, you can always pass PTE_W | PTE_U as the last argument. You should always apply the V2P macro (which converts a kernel virtual address to a physical address) to the address that you get from kalloc. This is because of the way xv6 manages physical memory: the address returned by kalloc is not an actual physical address; it's the virtual address that the kernel uses to access each physical page. Yes, all physical pages are also mapped in the kernel. For more details, refer to the second chapter of the xv6 manual on page tables. The above mappages() call will create a page table mapping between virtual address 0x60000000 and the physical page returned by kalloc(). So, after the call to mappages, did we successfully map 8192 bytes of memory at 0x60000000? No! It maps only a single page. In xv6, we have to map one page at a time, each time requesting a new physical page from kalloc. So we may complete our mapping with a second call:

mem = kalloc();
mappages(page_directory, 0x60001000, 4096, V2P(mem), PTE_W | PTE_U);

Notice how we advanced the virtual address by one page (0x60001000 = 0x60000000 + 0x1000).
Now you can quickly build on this example and create mappings in a loop, getting as many pages as necessary from kalloc. Now let's see how to remove a mapping. Suppose the user accesses the pages we allocated at 0x60000000 for a while, and then calls `wunmap`:

wunmap(0x60000000);

First we need to find the metadata that we maintain in the Process Control Block for the mmap starting at 0x60000000. Before removing the metadata of the mmap, we need to note the length of the mapped region. This will tell us how many page table entries need to be deleted and how many physical pages need to be freed. Next, the page table must be modified so that the user can no longer access those pages. The walkpgdir function can be used for that purpose. A typical call to walkpgdir may look like this:

pte_t *pte = walkpgdir(page_directory, 0x60000000, 0);

It returns the page table entry that maps 0x60000000. We'll eventually need to do *pte = 0, which will cause any future reference to that virtual address to fail. Before that, we need to free the physical page we received from kalloc. Each pte stores a set of flags (e.g., R/W) and a physical page address, called the Page Frame Number (PFN) for short. Using a simple mask operation, you can extract the physical address from the pte. Look at the macros defined in mmu.h. The final piece of code will look like this:

physical_address = PTE_ADDR(*pte);
kfree(P2V(physical_address));
*pte = 0;

We need to apply P2V (which converts a physical address to a kernel virtual address) because kfree (and kalloc, as explained before) only work with kernel virtual addresses. Only one page has been freed so far. You'll need to make the exact same calls, but this time passing a virtual address of 0x60001000, to free the second page:

pte_t *pte = walkpgdir(page_directory, 0x60001000, 0);
physical_address = PTE_ADDR(*pte);
kfree(P2V(physical_address));
*pte = 0;

Memory mappings should be inherited by the child.
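The PFN extraction used above can be checked in isolation. The two macros below restate the definitions from xv6's mmu.h: the top 20 bits of a PTE hold the physical page address, and the low 12 bits hold the flags:

```c
#include <assert.h>

typedef unsigned int uint;

// As defined in xv6's mmu.h: mask off the low 12 flag bits to get the
// physical page address (PFN), or keep only them to get the flags.
#define PTE_ADDR(pte)  ((uint)(pte) & ~0xFFF)
#define PTE_FLAGS(pte) ((uint)(pte) &  0xFFF)
```

For instance, a PTE of 0x0DFBC007 splits into the physical page 0x0DFBC000 and the flags 0x007 (present, writable, user).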
To do this, you'll need to modify the fork() system call or the internal logic of copyuvm() to copy the mappings from the parent process to the child across fork().

Hints

Like prior projects, start by making small additions to xv6. Make sure you provide support for a basic allocation, then add more complex functionality. At each step, make sure you have not broken your code that was previously working — if so, stop and take time to see how the introduced changes caused the problem. The best way to achieve this is to use a version control system like git. This way, you can later refer to previous versions of your code if you break something. Following these steps should help you with your implementation:

- First of all, make sure you understand how xv6 does memory management. The second chapter of the xv6 book gives good insight into the memory layout in xv6. Furthermore, it references some related source files. Looking at those sources should help you understand how mapping happens. You'll need to use those routines while implementing wmap. You will appreciate the time you spent on this step later.
- Try to implement a basic wmap that supports MAP_FIXED | MAP_SHARED | MAP_POPULATE. It should just check whether the particular region asked for by the user is available. If it is, you should map the pages in that range.
- Implement wunmap. For now, just remove the mappings.
- Implement getwmapinfo and getpgdirinfo. Most of the tests depend on these two system calls.
- Implement demand paging.
- Modify your wmap so that it's able to search for an available region in the process address space. This should make your wmap work without MAP_FIXED.
- Copy mappings from parent to child across the fork() system call.
- Implement MAP_PRIVATE. You'll need to change fork() to behave differently if the mapping is private.

Testing and handin instructions

Test cases will be released later. The handin directory will be ~cs537-1/handin/LOGIN/p5, where LOGIN is your CS login.
Similar to P2 and P4, please create a subdirectory called 'src': ~cs537-1/handin/LOGIN/p5/src. Copy all of your xv6 source files (but not .o files, please, or binaries!) into this directory. Each project partner should turn in their joint code to each of their handin directories. Each person should place a file named partners.txt in their handin/p5 directory, so that we can tell who worked together on this project. The format of partners.txt should be exactly as follows:

cslogin1 wiscNetid1 Lastname1 Firstname1
cslogin2 wiscNetid2 Lastname2 Firstname2

It does not matter who is 1 and who is 2. If you worked alone, your partners.txt file should have only one line. There should be no spaces within your first or last name; just use spaces to separate fields. To repeat, both project partners should turn in their code and both should have a handin/p5/partners.txt file.
Overview: In this assignment, you will be implementing a compensating-lottery scheduler and supporting system calls in xv6. Along with that, you will be updating the process sleep and wakeup mechanisms. Unlike prior projects, this project can be done in pairs, so find yourself a partner you would like to work on this assignment with.

Learning objectives:
- Understand how a lottery scheduler works.
- Learn how to add new features (such as a new scheduler) to existing code bases.
- Gain experience working with collaborators. Expect to spend a lot of time communicating with each other and working together.

Administrivia:
- This project can be performed in groups of 2.
- Due Date: July 15th, at 11:59PM (the slip day policy will apply to both people in the group)
- The final submission will be tested on the lab machines, so make sure to test your code on those machines.

Details:

1) Background

The current xv6 scheduler implements a very simple Round Robin (RR) policy. For example, if there are three processes A, B and C, then the xv6 round-robin scheduler will run the jobs in the order A B C A B C …, where each letter represents a process. The xv6 scheduler runs each process for at most one timer tick (10 ms); after each timer tick, the scheduler moves the previous job to the end of the ready queue and dispatches the next job in the list. The xv6 scheduler does not do anything special when a process sleeps or blocks (and is unable to be scheduled); the blocked job is simply skipped until it is ready and it is again its turn in the RR order. Since RR treats all jobs equally, it is simple and fairly effective. However, there are instances where this "ideal" property of RR is detrimental: if compute-intensive background jobs (e.g., anti-virus checks) are given equal priority to latency-sensitive operations, operations like music streaming (latency-sensitive) suffer. Lottery Scheduling
is a proportional share policy which fixes this by asking users to grant tickets to processes. The scheduler holds a lottery every tick to determine the next process to run. Since important processes are given more tickets, they end up running more often on the CPU over a period of time.

2) Scheduler details

1.1) Basic lottery scheduler

In your scheduler, each process runs for an entire tick until interrupted by the xv6 timer. At each tick, the scheduler holds a lottery between RUNNABLE processes and schedules the winner. When a new process arrives, it should have the same number of tickets as its parent. The first process of xv6 should start with 1 ticket. You will also create a system call settickets(int pid, int tickets) to change the number of tickets held by the specified process. As an example, if there is a process A with 1 ticket and a process B with 4 tickets, this is what the timeline might look like:

timer tick           1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20
scheduled process    B  B  A  B  B  B  B  B  B  A  A  B  B  B  A  B  B  B  B  B

In this case, A was scheduled for 4 ticks, and B for 16. Given a sufficiently large number of ticks, we can expect the ratio of CPU time per process to be approximately the ratio of their tickets. Scheduling is relatively easy when processes are just using the CPU; scheduling gets more interesting when jobs are arriving and exiting, or performing I/O. There are three events in xv6 that may cause a new process to be scheduled:

- whenever the xv6 10 ms timer tick occurs
- when a process exits
- when a process sleeps

For simplicity, your scheduler should not trigger a new scheduling event when a new process arrives or wakes; you should simply mark the process as RUNNABLE. The next scheduling decision will be made the next time a timer tick occurs.
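A single lottery round can be sketched in isolation: sum the tickets of the runnable processes, draw a number below the total, and walk the list until the running sum passes the draw. The function below is a self-contained illustration (the names are mine); in the kernel you would iterate over ptable entries in the RUNNABLE state instead of a plain array:

```c
#include <assert.h>

// One lottery round over `n` runnable processes: sum the tickets,
// reduce the draw `r` into [0, total), and walk the list until the
// running sum exceeds the draw. Returns the winner's index, or -1
// if there are no tickets at all.
static int lottery_winner(const int *tickets, int n, unsigned int r) {
    int total = 0;
    for (int i = 0; i < n; i++)
        total += tickets[i];
    if (total == 0)
        return -1;
    r = r % (unsigned int)total;
    int counter = 0;
    for (int i = 0; i < n; i++) {
        counter += tickets[i];
        if (r < (unsigned int)counter)
            return i;          // this process's ticket range covers the draw
    }
    return -1;                 // unreachable when total > 0
}
```

With tickets {1, 4}, draws 0 pick process 0 and draws 1 through 4 pick process 1, giving the 1:4 long-run ratio.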
1.2) Compensating lottery scheduler

As we discussed with MLFQ, many schedulers contain some type of incentive for processes that sleep (or block) and voluntarily relinquish the CPU so another process can run; this improves overall system throughput since the process isn't wastefully holding on to the CPU doing nothing. We will add such an incentive to the lottery scheduler. Your scheduler will track the number of ticks a process was sleeping/blocked. Once the process is runnable again, its tickets will be boosted (doubled) for the same number of lotteries. For instance, if a process slept for 5 ticks, its tickets would be artificially doubled for the next 5 lotteries it participates in. Example: a process A has 1 ticket and B has 4 tickets. Process A sleeps for 2 ticks, and is therefore given double tickets for the next 2 lotteries after it wakes. This allows the scheduler to keep the CPU usage ratio close to the ratio of tickets, even when process A is blocked. The timeline could look like the following in this case (description below the timeline):

timer tick           1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20
scheduled            B  A  B  B  B  A  A  B  B  B  B  B  B  A  B  B  B  B  B  B
sleeping/blocked           A  A                                A  A
A lottery tickets    1  1  –  –  2  2  1  1  1  1  1  1  1  1  –  –  2  2  1  1
B lottery tickets    4  4  4  4  4  4  4  4  4  4  4  4  4  4  4  4  4  4  4  4

Note the following:

- At each tick, we consider only the tickets of runnable processes. Ticks 3, 4, 15 and 16 have only 1 runnable process, so only B participates in those lotteries.
- Once A is no longer blocked, it starts participating in lotteries again. If it slept for x ticks, it will participate in the next x lotteries with double the tickets.
- Make sure that when a process sleeps, it is given compensation ticks for the amount of time it was actually sleeping, not the amount of time it wasn't scheduled. For example, process A becomes RUNNABLE at tick 17. You should boost its tickets only for the next 2 lotteries, regardless of whether A actually wins the lottery or not.
1.3) Carrying forward remaining boost

Assume A sleeps for 3 ticks, so its tickets are boosted for the next 3 lotteries. What happens if it sleeps again while it still has boosts remaining? In this case, its tickets should be boosted for the next (3 + remaining) rounds after it wakes. This is important for maintaining the lottery scheduler's proportional-CPU-share property. Given that A couldn't even participate in the lottery while it was blocked, we need to ensure it has a higher chance of winning in the corresponding number of lotteries. Below is a potential timeline for the described situation. Notice that after tick 5, A was supposed to be boosted for 3 ticks. Since it got blocked again before using them all, we ensure that it doesn't lose its remaining favourable/boosted rounds once it wakes.

interval             1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20
scheduled            B  A  B  B  B  A  B  B  B  A  A  B  B  B  B  B  A  B  B  B
sleeping                   A  A  A     A  A  A
A tickets            1  1  –  –  –  2  –  –  –  2  2  2  2  2  1  1  1  1  1  1
B tickets            4  4  4  4  4  4  4  4  4  4  4  4  4  4  4  4  4  4  4  4

3) Enhancing the sleep and wakeup mechanism in xv6

Finally, you need to change the existing implementation of the sleep() syscall. The sleep syscall allows processes to sleep for a specified number of ticks. Unfortunately, the current xv6 implementation of the sleep() syscall forces the sleeping process to wake on every timer tick to check whether it has slept for the requested number of timer ticks. This has two drawbacks. First, it is inefficient to schedule every sleeping process and force it to perform this check. Second, and more importantly for this project, if the process was scheduled on a timer tick (even just to perform this check), it will lose at least 1 favourable round, since it ended up participating in the lottery. You are required to fix this problem of extra wakeups by changing wakeup1() in proc.c. You will likely need to add additional condition checks to avoid falsely waking up the sleeping process until it is the right time.
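The boost bookkeeping from sections 1.2 and 1.3 can be sketched with two small helpers. This is a self-contained illustration under my own naming; in xv6 these would be fields in struct proc and logic in the scheduler and wakeup paths:

```c
#include <assert.h>

struct sched_state {
    int tickets;     // base tickets
    int boostsleft;  // remaining boosted lotteries
};

// On wakeup, add the ticks actually slept to any boost that is still
// pending, so remaining boosted rounds carry forward (section 1.3).
static void on_wakeup(struct sched_state *p, int slept_ticks) {
    p->boostsleft += slept_ticks;
}

// Tickets this process enters the current lottery with. A boost is
// consumed only when the process actually participates in a lottery.
static int lottery_tickets(struct sched_state *p) {
    if (p->boostsleft > 0) {
        p->boostsleft--;
        return 2 * p->tickets;
    }
    return p->tickets;
}
```

For example, a process that wakes with 3 pending boosts, participates once, and then sleeps 3 more ticks ends up with 2 + 3 = 5 boosted lotteries remaining, matching the carry-forward rule above.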
You may want to add more fields to struct proc to help wakeup1() decide whether it is time to wake a process. You might also use this mechanism to count how many ticks a process was sleeping for, so it can be boosted accordingly. You may find the section "sleep and wakeup" in the xv6 book (starting from page 65) helpful.

4) Some implementation details

In order to hold a lottery, we first need a way to generate a random number. Here's the code to do so. Copy the following at the top of proc.c, after the headers:

#define RAND_MAX 0x7fffffff
uint rseed = 0;

// https://rosettacode.org/wiki/Linear_congruential_generator
uint rand() {
  return rseed = (rseed * 1103515245 + 12345) & RAND_MAX;
}

5) New system calls

You'll need to create 3 new system calls for this project.

- int settickets(int pid, int n_tickets): This syscall is used to set the number of tickets allotted to a process. It returns 0 if the pid value is valid and n_tickets is positive. Otherwise, it returns -1.
- void srand(uint seed): This sets the rseed variable you defined in proc.c. (Hint: in order to mutate that variable in your sys_srand function, you will need to declare extern uint rseed. Check https://stackoverflow.com/a/1433226.)
- int getpinfo(struct pstat *): Because your scheduler is all in the kernel, you need to extract useful information for each process by creating this system call, so as to better test whether your implementation works as expected. This system call returns 0 on success and -1 on failure (e.g., the argument is not a valid pointer). If successful, some basic information about each running process will be returned:

struct pstat {
  int inuse[NPROC];   // whether this slot of the process table is in use (1 or 0)
  int pid[NPROC];     // PID of each process
  int tickets[NPROC]; // how many tickets does this process have?
  int runticks[NPROC];   // total number of timer ticks this process has been scheduled
  int boostsleft[NPROC]; // how many more ticks will this process be boosted?
};

You can decide whether you want to update your pstat statistics whenever a change occurs, or keep an equivalent copy of these statistics in ptable and copy the information out of ptable when getpinfo() is called. The file should be copied from ~/cs537-1/public/p4_files/pstat.h. Do not change the names of the fields in pstat.h. You may want to add a header guard to pstat.h and param.h to avoid redefinition.

6) User-level applications for testing

To demonstrate that your scheduler is doing at least some of the right things, we provide 2 such applications: loop and schedtest. The code for these programs is available in ~/cs537-1/public/p4_files. Copy these into your src folder and have xv6 compile them as user programs by modifying the UPROGS variable in the Makefile. loop is a dummy job that just does some work. It takes exactly 1 argument, sleepT: it sleeps for sleepT ticks, then runs a huge loop. schedtest runs two copies of loop as child processes, controlling the number of tickets and sleep time of each child. The schedtest application takes exactly 5 arguments in the following order:

prompt> schedtest ticketsA sleepA ticketsB sleepB sleepParent

schedtest spawns two child processes, each running the loop application. One child A is given ticketsA tickets and execs loop sleepA; the other child B is given ticketsB tickets and runs loop sleepB. Specifically, the parent process calls fork(), settickets() and exec() for the two child loop processes, A before B.
The parent schedtest process then sleeps for sleepParent ticks by calling sleep(sleepParent). After sleeping, the parent calls getpinfo() and prints one line with the two runtick counts and their ratio:

printf(1, "%d %d: %d\n", runticksA, runticksB, runticksA / runticksB);

where runticksA is the runticks of process A in the pstat structure, and similarly for B. The parent then waits for the two loop processes by calling wait() twice, and exits. You will be able to run schedtest to test the basic compensation behaviour of your scheduler. For example:

prompt> schedtest 10 0 2 0 120
99 19 5

# Since it's randomized, it's difficult to completely predict the output of your program.
# However, the key thing to note is that the ratio runticksA/runticksB is close to the ratio of tickets, 10/2.
# You should play around with the number of tickets you grant each process and see the CPU-time ratio match.
# As long as one of the children hasn't terminated yet, increasing the sleepParent value should lead to the CPU-use ratio converging to 5.
# You can also modify the code for loop so that it runs longer if needed.

Thus, you might need to write your own user-level programs to validate your scheduler. As an example, in order to test the boost algorithm, you can copy the loop program into another program, loop-print-info. loop-print-info can print boostsleft using getpinfo() right after it finishes sleeping. Make sure you sleep for >= 10 ticks so that your process has a reasonable chance of getting scheduled during its boosted phase. Thus you can expect the value of boostsleft to be between 0 and 10. You could also print boostsleft just before the loop-print-info program terminates to ensure that boostsleft has counted down successfully. We will not test your implementation of any other userspace programs: their purpose is primarily to help you test your scheduler. You are welcome to write more user applications and play with different values to test other aspects of the scheduler.
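Because lottery outcomes are randomized, most scheduler tests are statistical; one thing you can pin down exactly is the generator from section 4. Seeding it through srand() makes a test run reproducible, as this self-contained restatement shows (renamed here only to avoid clashing with the host's rand() when compiled outside the kernel):

```c
#include <assert.h>

typedef unsigned int uint;

#define LCG_MAX 0x7fffffff  // same mask as RAND_MAX in section 4

// The linear congruential generator from section 4.
static uint rseed = 0;
static uint lcg_rand(void) {
    return rseed = (rseed * 1103515245 + 12345) & LCG_MAX;
}
static void lcg_srand(uint seed) { rseed = seed; }
```

Seeding with a fixed value before a test run reproduces the same sequence of draws, and therefore the same sequence of lottery winners, on every run.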
7) Other details

- You should run xv6 on only a single CPU. Make sure CPUS := 1 in your Makefile.
- During a boost, you should not double the number of tickets you return in getpinfo. Instead, return the original number of tickets.
- Child processes will inherit the same number of tickets as the parent process, irrespective of the parent's boostsleft. Boosts don't carry over to the child process.
- All blocked processes should be boosted. In xv6, a blocked process is in the SLEEPING state. A process could be in the SLEEPING state even if it didn't use the sleep syscall; such processes should be boosted as well. Remember that different processes could be blocked for varying numbers of ticks.

Hints

Please refer to this document for hints and suggestions regarding this project.

Testing and Handing in

Test cases have been released and can be accessed at ~cs537-1/tests/p4. The handin directory will be ~cs537-1/handin/LOGIN/p4, where LOGIN is your CS login. Similar to P2, please create a subdirectory called 'src': ~cs537-1/handin/LOGIN/p4/src. Copy all of your xv6 source files (but not .o files, please, or binaries!) into this directory. Each project partner should turn in their joint code to each of their handin directories. Each person should place a file named partners.txt in their handin/p4 directory, so that we can tell who worked together on this project. The format of partners.txt should be exactly as follows:

cslogin1 wiscNetid1 Lastname1 Firstname1
cslogin2 wiscNetid2 Lastname2 Firstname2

It does not matter who is 1 and who is 2. If you worked alone, your partners.txt file should have only one line. There should be no spaces within your first or last name; just use spaces to separate fields. To repeat, both project partners should turn in their code and both should have a handin/p4/partners.txt file.
Overview: In this project, you will implement a simple UNIX shell similar to bash and sh. The shell you implement will be similar to, but much simpler than, the one you run every day in Unix. You can find out which shell you are running by typing echo $SHELL at a prompt. You may then wish to look at the man pages for sh or the shell you are running to learn more about all of the functionality that can be present. For this project, you do not need to implement as much functionality as is in most shells. Your shell will implement the following features:

- Different modes of execution: interactive mode and batch mode.
- Aliases: aliases are mainly used for abbreviating commands, or for adding default arguments to a regularly used command.
- Output redirection: when specified, the output should be written to a file instead of stdout.
- Environment variables: environment variables are special shell variables that are used by applications executed by the shell.

Last, your shell will implement some built-in commands, such as alias, export and unset, to support the above features.

Learning objectives:
- Gain more familiarity with programming in C and in Linux
- Learn how processes are created, managed and terminated
- Gain exposure to some of the functionality in shells

Deliverables:
- A .c file named wisc-shell.c
- The file should compile successfully when compiled with the -Wall and -Werror flags.
- The shell should pass the tests we supply, as well as tests that meet our specification that we do not supply.
- Include a README.md describing your implementation.

Administrivia:
- This project is to be performed alone.
- Due Date: June 24th, at 11:59pm (4 slip days throughout the course, use wisely)
- Similar to P1, a small portion of the credit is allocated for good programming style and memory management. Read the details.
This project is to be done on the lab machines, so you can learn more about programming in C on a typical UNIX-based platform (Linux).

Details:

In this assignment, you will implement a UNIX shell called wish (WIsconsin SHell). The shell should operate in this basic way: when you type in a command (in response to its prompt), the shell creates a child process that executes the command you entered. After the child process finishes, the shell prompts for more user input. More specifically, shells are typically implemented as a simple loop that waits for input and fork()s a new child process to execute the command; the child process then exec()s the specified command while the parent process wait()s for the child to finish before continuing with the next iteration of the loop.

1. Different modes of execution

Your shell can be run in two modes: interactive and batch. The mode is determined when your shell is started. If your shell is started with no arguments (i.e., ./wish), it will run in interactive mode. In interactive mode, you will display a prompt (the string "wish> ", note the space AFTER the > character, to stdout) and the user will type in a command at the prompt. If your shell is given the name of a file (e.g., ./wish batch-file), it runs in batch mode. In batch mode, your shell is started by specifying a batch file on its command line; the batch file contains the list of commands (each on its own line) that should be executed. In batch mode, you should not display a prompt. In batch mode, you should echo each line you read from the batch file back to the user (stdout) before executing it; this will help you when you debug your shell (and us when we test your shell — most of our testing will be in batch mode). If the line is empty or composed only of whitespace, you should still echo it. In both interactive and batch mode, your shell terminates when it sees the exit command on a line or reaches the end of the input batch file.
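The fork/exec/wait loop described above can be sketched as follows. This is a minimal POSIX illustration of a single iteration, not the full wish shell; the helper name is mine:

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Minimal sketch of the shell's core: fork a child, exec the command
// in the child, and wait in the parent. Returns the child's exit
// status, or -1 if the child could not be created or died abnormally.
static int run_command(char *argv[]) {
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {                              // child: become the command
        execvp(argv[0], argv);
        fprintf(stderr, "Error: command not found\n");
        exit(127);                               // exec only returns on failure
    }
    int status;                                  // parent: wait for the child
    waitpid(pid, &status, 0);
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

A real shell wraps this in a read-parse-execute loop and checks for the built-in commands (exit, alias, export, unset) before forking at all, since built-ins must run in the shell process itself.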
Assume that the maximum length of any command, including its arguments, is 512 bytes. If any of the input commands is invalid, print Error: command not found to stderr but continue processing the next set of commands. If there was any error while opening the batch file, print Error: could not open batchfile to stderr.

2. Aliases

Most shells contain functionality for aliases. To see the aliases that are currently active in your Linux shell, you can type alias. Basically, an alias is a short-cut so that users can type in something simple and have something more complex (or more safe) be executed. For example, you could set up:

wish> alias ll /bin/ls -al

so that within this shell session, the user can simply type ll and ls -al will be run. There are three ways that the alias command can be invoked in your shell.

Creating new aliases: To create a new alias, the user types the word alias, followed by a single word (the name of the alias), followed by a replacement string(s). You should set up an alias between the alias-name and the value (e.g. alias ll ls -la). If the alias-name was already being used, just replace the old value with the new value.

Displaying existing aliases: If the user types the single word alias (no other following words), your shell should print all the aliases that have been set up so far, one per line. The format is alias name=replacement string. Print the aliases in the order they were created. Example:

wish> alias ll ls -al
wish> alias grep grep --color=auto
wish> alias # Print all aliases
ll=ls -al
grep=grep --color=auto

Displaying a specific alias: If the user types alias followed by a word and the word matches a current alias-name, print the alias-name, followed by '=' and the corresponding replacement string within single quotes (' '); if the word does not match a current alias-name, print Error: alias not found and continue.
Example: wish> alias ll # Display ll alias ll='ls -al' To use an alias after it is created, the user types the alias just as they would type any other command: wish> alias ll /bin/ls -l # Create the alias wish> ll # Run the alias command You should be able to handle an arbitrary number of aliases. You do not need to worry about aliases to other aliases, aliases that involve redirection, or redirection of aliases. Running an alias with additional arguments (e.g. ll -a where ll is an alias-name) is undefined behavior. We will not test this. All alias calls will consist of only the alias-name. If an alias with an invalid command is executed, then print Error: command not found . 3. Environment variables Environment variables are user-definable values that are used by applications. Environment variables are part of the environment in which a process runs. In this case, the environment is that of your shell. In Linux, every process has its set of environment variables. These variables are stored in the environ extern variable. Some important things about environment are the following: When fork is called, the child process gets a copy of the environ variable. When a system call from the exec family of calls is used, the new process is either given the environ variable as its environment or a user specified environment depending on the exact system call used. See man 3 exec. There are functions such as getenv and setenv that allow you to view and change the environment of the current process. See the man environ for more details. Your shell will support the setting, unsetting and displaying environment variables through built-in commands: Setting environment variables: To add environment variables, users will use the export command. Example: export MYENVVARNAME=somevalue. After this command is executed, the MYENVVARNAME variable will be present in the environment of any child processes spawned by the shell. 
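The export behavior just described maps naturally onto the C library's setenv() routine. Here is a minimal sketch, under the assumption (stated in the spec below) that the argument is a single NAME=value token; do_export is a hypothetical helper name:

```c
#include <stdlib.h>
#include <string.h>

/* Sketch of the export built-in: split "NAME=value" at the first '='
 * and install it with setenv(). Children created later by fork()
 * inherit the updated environment automatically. */
int do_export(char *arg) {                /* arg looks like "NAME=value" */
    char *eq = strchr(arg, '=');
    if (eq == NULL || eq == arg)
        return -1;                        /* no '=' or empty name */
    *eq = '\0';                           /* split name/value in place */
    return setenv(arg, eq + 1, 1);        /* 1 = overwrite existing value */
}
```

The matching unset built-in is essentially a loop over the arguments calling unsetenv(), printing the required error for names that are not present.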
Unsetting environment variables: Environment variables can be removed using the unset command. Example: unset MYENVVARNAME. After this command is executed, the MYENVVARNAME variable is removed from the environment of all future child processes spawned by the shell.

Variable substitution: Whenever the $ sign is used at the start of a token, it is always followed by a variable name. Variable values should be directly substituted for their names when the shell interprets the command. Tokens in our shell are always separated by whitespace, and variable names and values are guaranteed to each be a single token. For example, given the command mv $ab $cd, you would need to replace variables ab and cd. Please note that, for a token to be considered a variable name, the token must start with the $ sign, which means that in a command like cd ab$ef, the token ab$ef is a normal token, not a variable name; our shell must not attempt to substitute $ef.

Displaying environment variables: The env utility program (not a shell built-in) can be used to print the environment variables. You don't have to implement any command to print environment variables.

You can assume the following about environment variable handling for your shell:
There will be at most one variable assignment per line. Lines containing variable assignments will not include any other commands or arguments.
The entire value of the variable will be present on the same line, following the = operator. There will not be multi-line values; you do not need to worry about quotation marks surrounding the value.
Variable names and values will not contain spaces or = characters.
There is no limit on the number of variables you should be able to assign.
There won't be recursive variable definitions of the form local anothervar=$previousvar.

If a non-existent environment variable name is passed to unset, print unset: environment variable not present. unset can be called with multiple arguments.
For example, unset ENVVAR1 ENVVAR2. Your shell should unset all the environment variables passed to unset. If one or more of the variables does not exist, unset the rest and print unset: environment variable not present.

4. Output redirection

Many times, a shell user prefers to send the output of a command to a file rather than to the screen. Usually, a shell redirects standard output to a file with the > character; your shell should include this feature. For example, if a user types /bin/ls -la > ls.out into your shell, nothing should be printed on the screen. Instead, the standard output of the ls program should be rerouted to the file ls.out. Note that stderr output should not be changed; it should continue to print to the screen. If the output file exists before you run your program, you should simply overwrite it (after truncating it, which sets the file's size to zero bytes).

The exact format of redirection is: a command (along with its arguments, if present), followed by any number of white spaces, the redirection symbol >, again any number of white spaces, followed by a filename. Multiple redirection operators (e.g. /bin/ls > > file.txt), starting with a redirection sign (e.g. > file.txt), multiple files to the right of the redirection sign (e.g. /bin/ls > file1.txt file2.txt), or not specifying an output file (e.g. /bin/ls >) are all errors. Print Redirection error to stderr. If the output file cannot be opened for some reason (e.g., the user doesn't have write permission or the name is an existing directory), your shell should print exactly Redirection error. In these cases, do not execute the command and continue to the next line. Do not worry about redirection for built-in commands (alias, unalias, export, unset and exit); we will not test these cases.
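The redirection error rules above can be checked before the command is ever run. The sketch below validates a line and extracts the output filename; parse_redirect is a hypothetical helper name, and after it succeeds the child process would open() the file with O_WRONLY|O_CREAT|O_TRUNC and dup2() it onto STDOUT_FILENO:

```c
#include <stddef.h>
#include <string.h>

/* Scan a command line for '>' and validate the redirection per the rules
 * above. Returns the filename token on success, NULL on any
 * "Redirection error" case. *present reports whether '>' appeared at all.
 * Note: modifies `line` (via strtok). */
const char *parse_redirect(char *line, int *present) {
    char *gt = strchr(line, '>');
    *present = (gt != NULL);
    if (gt == NULL)
        return NULL;                         /* no redirection: fine */
    if (strchr(gt + 1, '>') != NULL)
        return NULL;                         /* multiple '>' signs */
    int before = 0;                          /* a command must precede '>' */
    for (char *p = line; p < gt; p++)
        if (*p != ' ' && *p != '\t')
            before = 1;
    if (!before)
        return NULL;                         /* line starts with '>' */
    char *tok = strtok(gt + 1, " \t\n");     /* exactly one filename allowed */
    if (tok == NULL || strtok(NULL, " \t\n") != NULL)
        return NULL;
    return tok;
}
```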
Built-in commands: A built-in command means that the shell interprets this command directly; the shell does not exec() the built-in command and run it as a separate process; instead, the built-in command impacts how the shell itself runs. You need to write code that handles these commands. You will be implementing the following commands which have been described above: alias: Used to create new aliases or display existing aliases. export: Used to assign environment variables. unset: Used to delete environment variables. exit: Used to end the shell. Hints This project is not as hard as it may seem at first reading; in fact, the code you write will be much, much smaller than this specification. Writing your shell in a simple manner is a matter of finding the relevant library routines and calling them properly. Your shell is basically a loop: it repeatedly prints a prompt (if in interactive mode), parses the input, executes the command specified on that line of input, and waits for the command to finish, if it is in the foreground. This is repeated until the user types “exit” or ends their input. You should structure your shell such that it creates a new process for each new command. There are two advantages of creating a new process. First, it protects the main shell process from any errors that occur in the new command. Second, it allows easy concurrency; that is, multiple commands can be started and allowed to execute simultaneously. Below, we suggest some routines that you could use in your implementation: Parsing input commands: For reading lines of input, you may want to look at fgets(). You may find the strtok() routine useful for parsing the command line (i.e., for extracting the arguments within a command separated by spaces). A warning about strtok(): It modifies the input string, and the char pointers it returns point to various indices within the input string. 
You may find strdup() handy if you want to preserve the input string, but be careful about memory leaks.

Writing to stdout or file: Make sure you use the write() system call for all printing (including prompts, error messages, and job status), whether to stdout or to stderr. Why should you use write() instead of fprintf() or printf()? The main difference between the two is that write() performs its output immediately whereas fprintf() buffers the output temporarily in memory before flushing it. As a result, if you use fprintf() you will probably see output from your shell intermingled in unexpected ways with output from the jobs you fork(); you will fail our tests if your output is intermingled. If you decide to use fprintf(), make sure you ALWAYS call fflush() immediately after the call to fprintf().

Executing commands: Look into fork(), execvp(), and waitpid(). The fork system call creates a new process. After this point, two processes will be executing within your code. You will be able to differentiate the child from the parent by looking at the return value of fork: the child sees 0, the parent sees the pid of the child. You will note that there are a variety of commands in the exec family; for this project, you must use execvp. Remember that if execvp is successful, it will not return; if it does return, there was an error (e.g., the command does not exist). The most challenging part is getting the arguments correctly specified. The first argument specifies the program that should be executed, with the full path specified; this is straightforward. The second argument, char *argv[], matches those that the program sees in its function prototype: int main(int argc, char *argv[]); Note that this argument is an array of strings, or an array of pointers to characters. For example, if you invoke a program with: /bin/foo 205 535 then argv[0] = "/bin/foo", argv[1] = "205" and argv[2] = "535" (where each string is NULL-terminated).
Note the list of arguments must also be terminated with a NULL pointer; that is, argv[3] = NULL. We strongly recommend that you carefully check that you are constructing this array correctly! The waitpid system call allows the parent process to wait for one of its children.

Setting/unsetting environment variables: You could use the setenv() and unsetenv() routines to manage environment variables. Refer to their man pages for more information. Similarly, for retrieving the value of an environment variable, you can use the getenv() routine.

Testing and Handing in your code

Tests will be provided at ~cs537-1/tests/p3. You can read more about the tests in the README in that directory.

Handing it in: Copy your file to ~cs537-1/handin/login/p3 where login is your CS login. Do NOT use this handin directory for your work space. You should keep a separate copy of your project files in your own home directory and then simply copy the relevant files to this handin directory when you are done. The permissions to this handin directory will be turned off promptly when the deadline passes and you will no longer be able to modify files in that directory.
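The argv-construction advice above can be sketched with strtok(). This is a minimal sketch with a hypothetical helper name (build_argv) and an assumed cap on the argument count; remember that strtok() modifies the buffer it is given:

```c
#include <stddef.h>
#include <string.h>

#define MAX_ARGS 64

/* Split a command line into a NULL-terminated argv array, in place.
 * Returns argc. The pointers returned point into `line` itself. */
int build_argv(char *line, char *argv[]) {
    int argc = 0;
    char *tok = strtok(line, " \t\n");
    while (tok != NULL && argc < MAX_ARGS - 1) {
        argv[argc++] = tok;
        tok = strtok(NULL, " \t\n");
    }
    argv[argc] = NULL;      /* execvp requires the NULL terminator */
    return argc;
}
```

For the /bin/foo 205 535 example, this yields argv[0] = "/bin/foo", argv[1] = "205", argv[2] = "535", argv[3] = NULL, exactly the shape execvp expects.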
Overview: In this assignment, you will add new system calls to a toy operating system called xv6. You will implement 2 system calls:

fsetoff(int fd, int pos): This syscall changes the position indicator (offset) of the file as per the user input.
fgetoff(int fd): This syscall returns the current value of the position indicator (offset) of the specified file.

Additionally, you will learn (and demonstrate) using the gdb debugger with xv6.

Learning objectives:
Gain experience using more substantial code bases written by others, in which you do not need to understand every line
Familiarize yourself with the xv6 code base in particular
Learn how to add a system call to xv6
Add a user-level application that can be used within xv6
Become familiar with a few of the data structures in xv6 (e.g., process table and file)
Use the gdb debugger on xv6

Administrivia:
This project is to be performed alone.
Due Date: June 17th, at 11:59pm (4 slip days throughout the course, use wisely)
This project is to be done on the lab machines, so you can learn more about programming in C on a typical UNIX-based platform (Linux).

Details:

System call: Here are the details about the system calls you will be adding:

int fsetoff(int fd, int pos)
fsetoff() will take 2 arguments: an open file descriptor and the position value to which the file offset should be changed. If fd and pos are valid, fsetoff will update the offset position parameter of the file corresponding to the input file descriptor. fsetoff() will return 0 on success, -1 if the file is not open, and -2 if the pos parameter is invalid; the position parameter can be invalid if it is not a number, less than 0, or greater than the size of the file.

int fgetoff(int fd)
fgetoff() will take 1 argument: an open file descriptor. fgetoff() will return -1 if the file is not open. If the file is open, fgetoff() will return the current position indicator (offset) value corresponding to the file descriptor.
Debugging with GDB

Your second task is to demonstrate that you can use gdb to debug xv6 code. You should show the integer value that fdalloc returns the first time it is called after a process has been completely initialized. To do this, you can follow these steps: In one window, start up qemu-nox-gdb. In another window on the same machine, start up gdb (it will attach to the qemu process) and continue it until xv6 finishes its bootup process and gives you a prompt. Now, interrupt gdb and set a breakpoint in the fdalloc() routine. Continue gdb. Then, run the stressfs user application at the xv6 prompt. Your gdb process should now stop in fdalloc. Now, step (or, probably, next) through the C code until gdb reaches the point just before fdalloc() returns, and print the value that will be returned (i.e., the value of fd). Now immediately quit gdb and run whoami to display your login name.

To sanity check your results, you should think about the value you expect fdalloc() to return in these circumstances to make sure you are looking at the right information. What is the first fd number returned after stdin, stdout, and stderr have been set up?

If gdb gives you the error message that fd has been optimized out and cannot be displayed, make sure that your Makefile uses the flag "-Og" instead of "-O2". Debugging is also a lot easier with a single CPU, so if you didn't do this already: in your Makefile find where the number of CPUS is set and change this to be 1.

Take a screenshot showing your gdb session with the returned value of fd printed and your login name displayed. Submit this screenshot to Canvas.

Hints

This project is very similar to a prior OSTEP project. You can refer to some of the hints and documentation included there.

Getting started with xv6

The source code for xv6 (and the associated README) can be found in ~cs537-1/public/xv6.tgz. Everything you need to build, run, and even debug the kernel is in there.
The way to make your own version is to do the following:

prompt> cd $PROJ
prompt> cp ~cs537-1/public/xv6.tgz .
prompt> tar xvzf ~cs537-1/public/xv6.tgz

This will create an xv6-public directory in the $PROJ directory. You can then cd into that and begin working. If, for development and testing, you would like to run xv6 in an environment other than the CSL instructional Linux cluster, you may need to set up additional software; you can read the instructions for the macOS build environment. Note that we will run all of our tests and do our grading on the instructional Linux cluster, so you should always ensure that the final code you hand in works on those machines.

After you have obtained the source files, you can run make qemu-nox to compile all the code and run it using the QEMU emulator. Test out the unmodified code by running a few of the existing user-level applications, like ls and forktest. With ls inside the emulator, you'll be able to see a few other applications that are available (as well as files that have been created within the xv6 environment). To quit the emulator, type Ctrl-a x.

You will want to be familiar with the Makefile and comfortable modifying it. In particular, see the list of existing UPROGS. See the different ways of running the environment through make (e.g., qemu-nox or qemu-nox-gdb). Find where the number of CPUS is set and change this to be 1.

For additional information about xv6, we strongly encourage you to look through the code while reading the book by the xv6 authors. Or, if you prefer to watch videos, the tutorial on xv6 by Prof. Remzi describes some of the relevant files. Note that the code and project in the videos do not exactly match what you are doing. We always recommend that you look at the actual code yourself while either reading or watching (perhaps pausing the video as needed).
Implementing the syscalls

The primary files you will want to examine in detail include syscall.c, sysproc.c, sysfile.c, and file.c. You might also want to take a look at usys.S, which (unfortunately) is written in assembly. To add a system call, find some other very simple system call that also sets a value through a pointer-based parameter, like sys_fstat, copy it in all the ways you think are needed, and modify it so it doesn't do anything and has the new name. Compile the code to see if you found everything you need to copy and change. You probably won't find everything the first time you try.

A file descriptor is a simple number that a computer's operating system uses to keep track of open files. When a program opens a file, the operating system assigns it a unique file descriptor. This number is then used by the program to read from, write to, or manipulate the file in other ways. Think of it like a ticket you get at a coat check: it doesn't contain all the information about your coat, but as long as you have that ticket, you can use it to retrieve or modify your coat. Similarly, programs use file descriptors as references to perform operations on the actual files.

Operating systems maintain a position offset variable per open file to keep track of the current position within the file for read and write operations. This position offset is initialized to 0 when a file is opened. When user processes issue read or write system calls, the operation commences at the file offset, and the file offset is incremented by the number of bytes read/written. In this project, you will be implementing system calls to update and read this file offset. Consider the modifications needed to make your system calls behave as expected: take a look at sys_read() to understand how to access the file structure using the input file descriptor. You will have to implement similar code to obtain the file struct from the input fd.
Notice how fileread() uses locks (we will learn more about this later in this course) to ensure mutual exclusion while updating the offset. Consider using locks while reading or writing to the position offset. Compare the input pos value to the file size (see filestat() to learn how to obtain the file size).

Testing your syscalls

We suggest you write an example user application (submission not required) that tests your system calls. Again, we suggest copying one of the straightforward utilities that already exist, such as cat.c. Modify the application to open a file and invoke your newly implemented syscalls. To compile the user application, you will have to make changes to the Makefile: look for UPROGS to add your test user application. Similar to your first project, some tests will be provided at ~cs537-1/tests/p2. Details will be provided once the tests are released.

Handing It In

Remember there are two handin steps. First, for your xv6 code, your handin directory is ~cs537-1/handin/LOGIN/p2 where LOGIN is your CS login. Please create a subdirectory called 'src': ~cs537-1/handin/LOGIN/p2/src. Copy all of your source files (but not .o files, please, or binaries!) into this directory. A simple way to do this is to copy everything into the destination directory, then type make to make sure it builds, and then type make clean to remove unneeded files.

shell% mkdir ~cs537-1/handin/LOGIN/p2/src
shell% cd ~cs537-1/handin/LOGIN/p2/src
shell% make
shell% make clean

Second, for the screenshot of your debugging session, upload your image in Canvas.
In this assignment, you will build a set of small utilities that are variants of commonly-used commands in UNIX systems. You will be building 2 utilities:

wisc-sed: This utility will be a variation of the sed tool, which is used for searching, editing, replacing and deleting patterns in files.
wisc-tar: This utility is similar to tar, a commonly used UNIX utility to combine/compress a collection of files into one file. This functionality is useful in a number of scenarios, e.g., offering a single file to download for software. (If you've heard the phrase tarball, that comes from using tar!)

Learning objectives:
Re-familiarize yourself with the C programming language
Re-familiarize yourself with a shell / terminal / command-line of UNIX
Learn how to compile C files and execute binaries on the command-line
Learn a little about how UNIX utilities are implemented

Deliverables:
Set of .c files, one for each utility: wisc-sed.c, wisc-tar.c
Each file should compile successfully when compiled with the -Wall and -Werror flags.
Each utility should pass tests we supply as well as tests that meet our specification that we do not supply.
Include a single README.md for all the files describing your implementation.

Administrivia:
This project is to be performed alone.
Due Date: June 10th, at 11:59pm (4 slip days throughout the course, use wisely)
A small portion of the credit is allocated for good programming style and memory management. Read the details.
This project is to be done on the lab machines, so you can learn more about programming in C on a typical UNIX-based platform (Linux).

wisc-sed

sed stands for Stream Editor; it is used to perform basic text transformations on an input stream such as a file. You will build wisc-sed, which implements just one piece of functionality: string substitution.

Input

wisc-sed will take 3 mandatory inputs: a search string, a replacement string, and an input filename. It will also take some optional flags.
Here's an example which substitutes the string 'mascot' with 'bucky' in file 'a.txt' while ignoring the case of the strings during comparison:

prompt> ./wisc-sed -c -s mascot -r bucky -f a.txt

There are 6 flags that your utility needs to handle. Some are mandatory while some are optional.

-s <search string>: (Mandatory) This is the string to be searched and replaced.
-r <replacement string>: (Mandatory) This is the replacement string.
-f <filename>: (Mandatory) This is the input filename.
-o <filename>: (Optional) This flag is used to specify an output file. If this isn't specified, then all the output should go to stdout.
-n <line number>: (Optional) If this flag is passed, then the strings should be replaced only on the given line. If this flag is not passed, then the string replacement should be applied globally, i.e., all lines of the file. For this assignment, line numbers start from 1 (not 0).
-c: (Optional) This flag means the string comparison is not case-sensitive. If this flag is not passed (default), then the string comparison is case-sensitive.

Output

The output of your utility is the modified file contents with the search string replaced by the replacement string. If the -o flag is passed, then the output should be written to a new file with the specified name; otherwise it should be printed to stdout.

Details

Here are a few details and assumptions you should keep in mind during implementation:
All error messages are printed to stdout.
If the 3 mandatory inputs are not passed to your utility, then you should print "usage: wisc-sed [optional flags] -s -r -f " (followed by a newline) and exit with status 1.
If the input file is not found, you should print "wisc-sed: cannot open file" (followed by a newline) and exit with status 1.
You can assume that the files provided as input exist in the directory where the program is run from (no need to handle ways to store pesky path names).
If an output file of the same name already exists, you can overwrite it with the new contents specified.
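The substitution core can be sketched with strstr(), as suggested in the hints later in this document. This is a minimal sketch with a hypothetical helper name (substitute); it assumes `out` is large enough, and for the -c flag you would swap strstr() for strcasestr():

```c
#include <string.h>

/* Replace every occurrence of `search` in `line` with `repl`,
 * writing the result into `out`. Case-sensitive. */
void substitute(const char *line, const char *search,
                const char *repl, char *out) {
    size_t slen = strlen(search);
    out[0] = '\0';
    if (slen == 0) {                 /* empty search: nothing to replace */
        strcat(out, line);
        return;
    }
    const char *p = line;
    for (;;) {
        const char *hit = strstr(p, search);
        if (hit == NULL) {           /* no more matches: copy the tail */
            strcat(out, p);
            return;
        }
        strncat(out, p, (size_t)(hit - p));   /* text before the match */
        strcat(out, repl);                    /* the replacement */
        p = hit + slen;                       /* continue after the match */
    }
}
```

Your real utility would call something like this once per input line (or only on the -n line), writing the result to stdout or the -o file.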
If the line number passed to the -n flag exceeds the number of lines in the file, then ignore the flag and don't replace anything.
If an unknown flag is passed, then you can ignore the flag.
For simplicity, assume that the search string and replacement string don't contain spaces, i.e., they are one word each.

wisc-tar

The second utility you will build is wisc-tar, whose functionality is similar to the actual tar utility but much simpler. tar is a UNIX tool, similar to zip, used to combine a collection of files into one file. tar stands for Tape Archive. This tool is useful in a number of scenarios, e.g., offering a single file to download for software.

Input

wisc-tar will take 2 or more inputs: the name of the tar file to create, followed by the name(s) of the input file(s). For example:

prompt> echo abcd > a.txt # creates file a.txt
prompt> echo efgh > b.txt # creates file b.txt
prompt> ./wisc-tar test.tar a.txt b.txt # combines a.txt and b.txt into test.tar

Output

The output will be a tar file. In the actual tar implementation, a tar file consists of a series of file objects. Each file object consists of a file header (file name, size, checksum, ...) and the file contents. For the purpose of this assignment, we will use a simpler file format for our wisc-tar file objects. We will use a 128-byte header which contains only the file name and file size, followed by the file contents padded with NULL ('\0') to a multiple of 512 bytes; e.g., if a file only has 1000 bytes of data, 24 bytes of NULL ('\0') are padded at the end. Here's an example format of a wisc-tar file:

file1 name [120 bytes in ASCII]
file1 size [8 bytes as binary]
contents of file1 [in ASCII, padded to multiple of 512 bytes]
file2 name [120 bytes]
file2 size [8 bytes]
contents of file2 [in ASCII, padded to multiple of 512 bytes]
...

Details

Here are a few details and assumptions you should keep in mind during implementation:
All error messages are printed to stdout.
You can assume that the files provided as input exist in the directory where the program is run from (no need to handle ways to store pesky path names). You can also assume the files provided as inputs only contain ASCII characters.
If fewer than two arguments are supplied to your program, then you should print "wisc-tar: tar-file file1 […]" (followed by a newline) and exit with status 1.
If any of the input files that should be a part of the tar file are not found, you should print "wisc-tar: cannot open " (followed by a newline) and exit with status 1.
If a tar-file of the same name already exists, you can overwrite it with the new contents specified.
If any of the input file names are longer than 120 characters, use the first 120 characters as the filename to store in the output tar file.
If any of the input file names are shorter than 120 characters, you should pad the filename such that it uses 120 bytes. For example, if the file name is "a.txt", which only has 5 characters, you should append 115 NULL ('\0') characters to make the name use 120 bytes in the tar file. (Remember '\0' is not the same as '0'. The first one represents NULL while the second one represents the character 0!)

Example

Let's look at a complete example to make sure we understand the format and how to read the contents of a valid tar file. In the following example we first create a text file which contains the string "hey", so its size is 3 (remember this!). Next we run wisc-tar to create a.tar as shown below. Finally we print the contents of a.tar using hexdump, a utility to print the contents of a binary file. The comments to the right explain the output of hexdump. Remember that the bytes are represented in hexadecimal format, so an ASCII table will help you look up the ASCII values of the characters. Try to see if you can decode the contents based on the comments on the right! The below example was captured on a machine with little endian byte order.
prompt> echo -n "hey" > a.txt # '-n' does not append a newline or linefeed at the end of the string
prompt> ./wisc-tar a.tar a.txt
prompt> hexdump -v a.tar
0000000 2e61 7874 0074 0000 0000 0000 0000 0000 --> The first five bytes here contain the file name a.txt
0000010 0000 0000 0000 0000 0000 0000 0000 0000
0000020 0000 0000 0000 0000 0000 0000 0000 0000
0000030 0000 0000 0000 0000 0000 0000 0000 0000
0000040 0000 0000 0000 0000 0000 0000 0000 0000
0000050 0000 0000 0000 0000 0000 0000 0000 0000
0000060 0000 0000 0000 0000 0000 0000 0000 0000 --> We have padded the filename using '\0' till we hit 120 bytes
0000070 0000 0000 0000 0000 0003 0000 0000 0000 --> The byte containing 03 indicates the file size is 3. Note that this is not in ASCII!
0000080 6568 0079 0000 0000 0000 0000 0000 0000 --> The first three bytes contain the string "hey", the contents of the file, followed by 509 bytes of '\0'.
0000090 0000 0000 0000 0000 0000 0000 0000 0000 ...

Hints

Read the lab tutorial; it has some useful tips for programming in the C environment. You'll need to learn how to use a few library routines from the C standard library (often called libc) to implement the source code for this assignment. All C code is automatically linked with the C library, which is full of useful functions you can call to implement your program. Learn more about the C library from its online references. On UNIX systems, the best way to read about such functions is to use what are called the man pages (short for manual). In our HTML/web-driven world, the man pages feel a bit antiquated, but they are useful and informative and generally quite easy to use. To access the man page for fopen(), for example, just type the following at your UNIX shell prompt:

prompt> man fopen

Suggested routines

Once a file is open, there are many different ways to read from it.
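The header layout shown in the hexdump above can be produced with a few fwrite() calls. Here is a hedged sketch (write_entry is a hypothetical helper name) that writes one file object: a 120-byte '\0'-padded name, the size as 8 raw bytes in host byte order (little-endian on the lab machines, matching the example), and the '\0'-padded contents:

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define NAME_BYTES 120
#define BLOCK 512

/* Write one wisc-tar file object to `fp`. Long names are truncated
 * to 120 bytes; short names are padded with '\0'. */
void write_entry(FILE *fp, const char *name,
                 const char *data, uint64_t size) {
    char header[NAME_BYTES] = {0};            /* zero-filled name field */
    size_t n = strlen(name);
    if (n > NAME_BYTES)
        n = NAME_BYTES;                       /* keep first 120 chars */
    memcpy(header, name, n);
    fwrite(header, 1, NAME_BYTES, fp);
    fwrite(&size, 1, sizeof(uint64_t), fp);   /* 8-byte binary size field */
    fwrite(data, 1, size, fp);                /* file contents */
    size_t pad = (BLOCK - (size % BLOCK)) % BLOCK;
    static const char zeros[BLOCK] = {0};
    fwrite(zeros, 1, pad, fp);                /* '\0' padding to 512 bytes */
}
```

For the 3-byte "hey" example, this emits 120 + 8 + 3 + 509 = 640 bytes, the same layout the hexdump shows.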
There are two functions that might be useful for you in the context of this assignment.

Opening/closing files: For this assignment, we recommend using the following routines to open and close files: fopen() and fclose(). There are other system calls, open() and close(), you could also use, but reading lines might be more difficult.

Reading/writing a line: There are two useful functions to read lines: fgets(), which can read a fixed number of characters but stops when it reaches the end of a line, and getline(), which can read entire lines into a buffer. You can learn more about these functions in, you guessed it, the man pages! To write a string to a file you can similarly use fputs() or the more powerful formatting capabilities in fprintf(). We recommend using getline(). You will need to be able to handle lines that are of arbitrary length; this means that you cannot simply call fgets() once for each line of the file, since you will not know the size of the buffer needed to fit the line. You can either use a library routine such as getline() or call fgets() multiple times for each line.

String search/comparison: There are many functions available for finding and comparing strings. We recommend using strstr() and strcasestr() to find the position of the matching string.

Finding file size: To find the size of a file, you can use either stat() or fstat(). Again, refer to the man pages for more information.

Good practices

Understand the problem: Start by understanding the problem description and requirements. Read through the man pages and use external resources to understand any unfamiliar concepts or functions.
Familiarize yourself with the functions/APIs: Test the new functions with toy examples just to understand how they work.
Start small: Don't try to implement everything at once. For example, begin by opening a file, reading lines from it, and closing it.
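A minimal starting point along the lines just suggested, using the recommended fopen()/getline()/fclose() routines (count_lines is a hypothetical helper name; your utility would process each line instead of just counting):

```c
#define _POSIX_C_SOURCE 200809L  /* expose getline() */
#include <stdio.h>
#include <stdlib.h>

/* Open a file, read every line with getline() (which grows the buffer
 * as needed, so arbitrary-length lines are handled), and count them.
 * Returns -1 if the file cannot be opened. */
long count_lines(const char *path) {
    FILE *fp = fopen(path, "r");
    if (fp == NULL)
        return -1;               /* the "wisc-sed: cannot open file" case */
    char *line = NULL;
    size_t cap = 0;
    long count = 0;
    while (getline(&line, &cap, fp) != -1)
        count++;                 /* process the line here */
    free(line);                  /* getline() allocates; the caller frees */
    fclose(fp);
    return count;
}
```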
Implement step-by-step: Grow your program by adding one feature at a time Implement Error Handling: Once the core of your code has been implemented, add error handling code as specified in requirements above. Test Your Code: After implementing a feature, test it thoroughly to make sure it works as expected. This will help you catch any bugs or errors early on. Ask for Help When Stuck: Don’t hesitate to ask for help if you’re having trouble understanding a concept or figuring out why your code isn’t working. You can ask the instructors, classmates, or use online resources. Remember, the goal of the project isn’t just to produce a working program, but also to understand the concepts and techniques you’re using. Testing and Handing in your code Some tests are provided at ~cs537-1/tests/p1. Read more about the tests, including how to run them, by executing the command cat ~cs537-1/tests/p1/README.md on any lab machine. Note these test cases are not complete, and you are encouraged to create more on your own. Handing it in: Copy your files to ~cs537-1/handin/login/p1 where login is your CS login. Do NOT use this handin directory for your work space. You should keep a separate copy of your project files in your own home directory and then simply copy the relevant files to this handin directory when you are done. The permissions to this handin directory will be turned off promptly when the deadline passes and you will no longer be able to modify files in that directory. If you cannot find your handin directory, send email to the instructor asking for a handin directory; tell us your CS login.
Question 1 Consider a regression problem involving multiple target variables in which it is assumed that the distribution of the targets, conditioned on the input vector x, is a Gaussian of the form p(t|x, w) = N(t | y(x, w), Σ), where y(x, w) is the output of a neural network with input vector x and weight vector w, and Σ is the covariance of the assumed Gaussian noise on the targets. (a) Given a set of independent observations of x and t, write down the error function that must be minimized in order to find the maximum likelihood solution for w, if we assume that Σ is fixed and known. (b) Now assume that Σ is also to be determined from the data, and write down an expression for the maximum likelihood solution for Σ. (Note: The optimizations of w and Σ are now coupled.)

Question 2 The error function for binary classification problems was derived for a network having a logistic-sigmoid output activation function, so that 0 <= y(x, w) <= 1, and data having target values t in {0, 1}. Derive the corresponding error function if we consider a network having an output -1 <= y(x, w) <= 1 and target values t = 1 for class C1 and t = -1 for class C2. What would be the appropriate choice of output unit activation function? Hint. The error function is given by: E(w) = -Σ_n {t_n ln y_n + (1 - t_n) ln(1 - y_n)}.

Question 3 Verify the following results for the conditional mean and variance of the mixture density network model. (a) E[t|x] = Σ_k π_k(x) μ_k(x) (b) s^2(x) = Σ_k π_k(x) { σ_k^2(x) + ||μ_k(x) - Σ_l π_l(x) μ_l(x)||^2 }

Question 4 Can you represent the following boolean function with a single logistic threshold unit (i.e., a single unit from a neural network)? If yes, show the weights. If not, explain why not in 1-2 sentences.

A  B  f(A,B)
1  1  0
0  0  0
1  0  1
0  1  0

Question 5 Below is a diagram of a small convolutional neural network that converts a 13x13 image into 4 output values. The network has the following layers/operations from input to output: convolution with 3 filters, max pooling, ReLU, and finally a fully-connected layer. For this network we will not be using any bias/offset parameters (b). Please answer the following questions about this network. (a) How many weights in the convolutional layer do we need to learn?
(b) How many ReLU operations are performed on the forward pass? (c) How many weights do we need to learn for the entire network? (d) True or false: A fully-connected neural network with the same size layers as the above network can represent any classifier. (e) What is the disadvantage of a fully-connected neural network compared to a convolutional neural network with the same size layers?

Question 6 The neural networks shown in class used logistic units: that is, for a given unit u, if x is the vector of activations of units that send their output to u, and w is the weight vector corresponding to these outputs, then the activation of u will be σ(w·x), where σ(z) = 1/(1 + exp(-z)). However, activation functions could be anything. In this exercise we will explore some others. Consider the following neural network, consisting of two input units, a single hidden layer containing two units, and one output unit:

(a) Say that the network is using linear units: that is, defining x and w as above, the output of a unit is c(w·x) for some fixed constant c. Let the weight values be fixed. Re-design the neural network to compute the same function without using any hidden units. Express the new weights in terms of the old weights and the constant c. (b) Is it always possible to express a neural network made up of only linear units without a hidden layer? Give a one-sentence justification. (c) Another common activation function is a threshold, where the activation is t(w·x), where t(z) is 1 if z >= 0 and 0 otherwise. Let the hidden units use sigmoid activation functions and let the output unit use a threshold activation function. Find weights which cause this network to compute the XOR of x1 and x2 for binary-valued x1 and x2. Keep in mind that there is no bias term for these units.
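For Question 6(a), the claim that a network of linear units collapses to a single linear layer can be checked numerically. The sketch below is not part of the assignment; the specific weight values and the helper names are arbitrary illustrations, assuming "linear unit" means output = c * (w . x) as stated.

```python
# Sketch: a 2-2-1 network of linear units, out = c * (w . x), equals a
# single linear unit whose weights are u_i = c * (v1*w1_i + v2*w2_i).
def linear_unit(w, x, c):
    return c * sum(wi * xi for wi, xi in zip(w, x))

def network(x, w1, w2, v, c):
    h = [linear_unit(w1, x, c), linear_unit(w2, x, c)]  # hidden layer
    return linear_unit(v, h, c)                          # output unit

def collapsed(x, w1, w2, v, c):
    # Equivalent hidden-layer-free unit with combined weights.
    u = [c * (v[0] * w1[i] + v[1] * w2[i]) for i in range(len(w1))]
    return linear_unit(u, x, c)

# Arbitrary illustrative weights and constant.
w1, w2, v, c = [0.3, -1.2], [2.0, 0.5], [0.7, -0.4], 1.5
for x in ([1.0, 0.0], [0.0, 1.0], [2.0, -3.0]):
    assert abs(network(x, w1, w2, v, c) - collapsed(x, w1, w2, v, c)) < 1e-9
```

Writing out the algebra, the network computes c^2 * ((v1*w1 + v2*w2) . x), which is exactly the single-unit form asked for in the question.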
Question 1 Show that maximization of the class separation criterion given by m2 - m1 = w^T(m2 - m1) with respect to w, using a Lagrange multiplier to enforce the constraint w^T w = 1, leads to the result that w ∝ (m2 - m1).

Question 2 Show that the Fisher criterion J(w) = (m2 - m1)^2 / (s1^2 + s2^2) can be written in the form J(w) = (w^T S_B w) / (w^T S_W w). Hint. y = w^T x, m_k = w^T m_k, s_k^2 = Σ_{n∈C_k} (y_n - m_k)^2.

Question 3 Consider a generative classification model for K classes defined by prior class probabilities p(C_k) = π_k and general class-conditional densities p(ϕ|C_k), where ϕ is the input feature vector. Suppose we are given a training data set {ϕ_n, t_n} where n = 1, …, N, and t_n is a binary target vector of length K that uses the 1-of-K coding scheme, so that it has components t_nj = δ_jk if pattern n is from class C_k. Assuming that the data points are drawn independently from this model, show that the maximum-likelihood solution for the prior probabilities is given by π_k = N_k / N, where N_k is the number of data points assigned to class C_k.

Machine Learning (CS405) – Homework #4

Question 4 Verify the relation dσ/da = σ(1 - σ) for the derivative of the logistic sigmoid function defined by σ(a) = 1 / (1 + exp(-a)).

Question 5 By making use of the result dσ/da = σ(1 - σ) for the derivative of the logistic sigmoid, show that the derivative of the error function for the logistic regression model is given by ∇E(w) = Σ_{n=1}^N (y_n - t_n) ϕ_n. Hint. The error function for the logistic regression model is given by E(w) = -ln p(t|w) = -Σ_{n=1}^N {t_n ln y_n + (1 - t_n) ln(1 - y_n)}.

Question 6 There are several possible ways in which to generalize the concept of linear discriminant functions from two classes to c classes. One possibility would be to use (c - 1) linear discriminant functions, such that y_k(x) > 0 for inputs x in class C_k and y_k(x) < 0 for inputs not in class C_k.
By drawing a simple example in two dimensions for c = 3, show that this approach can lead to regions of x-space for which the classification is ambiguous. Another approach would be to use one discriminant function y_jk(x) for each possible pair of classes C_j and C_k, such that y_jk(x) > 0 for patterns in class C_j and y_jk(x) < 0 for patterns in class C_k. For c classes, we would need c(c - 1)/2 discriminant functions. Again, by drawing a specific example in two dimensions for c = 3, show that this approach can also lead to ambiguous regions.

Question 7 Given a set of data points {x_n} we can define the convex hull to be the set of points x given by x = Σ_n α_n x_n, where α_n >= 0 and Σ_n α_n = 1. Consider a second set of points {z_m} and its corresponding convex hull. The two sets of points will be linearly separable if there exists a vector ŵ and a scalar w0 such that ŵ^T x_n + w0 > 0 for all x_n, and ŵ^T z_m + w0 < 0 for all z_m. Show that, if their convex hulls intersect, the two sets of points cannot be linearly separable, and conversely that, if they are linearly separable, their convex hulls do not intersect.
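The identity dσ/da = σ(1 - σ) from Questions 4 and 5 can be sanity-checked numerically before attempting the algebraic derivation. This sketch is not part of the homework; it just compares the closed form against a central finite difference.

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def sigmoid_prime_closed_form(a):
    # The identity to be verified in Question 4: dσ/da = σ(1 - σ).
    return sigmoid(a) * (1.0 - sigmoid(a))

def sigmoid_prime_numeric(a, h=1e-6):
    # Central finite-difference approximation of the derivative.
    return (sigmoid(a + h) - sigmoid(a - h)) / (2 * h)

for a in (-2.0, 0.0, 1.5):
    assert abs(sigmoid_prime_closed_form(a) - sigmoid_prime_numeric(a)) < 1e-8
```

At a = 0 the sigmoid equals 0.5 and its derivative peaks at 0.25, which is a quick special case to check by hand as well.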
Question 1 Consider a data set in which each data point t_n is associated with a weighting factor r_n > 0, so that the sum-of-squares error function becomes E_D(w) = (1/2) Σ_{n=1}^N r_n {t_n - w^T ϕ(x_n)}^2. Find an expression for the solution w* that minimizes this error function. Give two alternative interpretations of the weighted sum-of-squares error function in terms of (i) a data-dependent noise variance and (ii) replicated data points.

Question 2 We saw in Section 2.3.6 that the conjugate prior for a Gaussian distribution with unknown mean and unknown precision (inverse variance) is a normal-gamma distribution. This property also holds for the case of the conditional Gaussian distribution of the linear regression model. If we consider the likelihood function, then the conjugate prior for w and β is given by p(w, β) = N(w | m0, β^{-1} S0) Gam(β | a0, b0). Show that the corresponding posterior distribution takes the same functional form, so that p(w, β | t) = N(w | m_N, β^{-1} S_N) Gam(β | a_N, b_N), and find expressions for the posterior parameters m_N, S_N, a_N, and b_N.

Question 3 Show that the integration over w in the Bayesian linear regression model gives the result ∫ exp{-E(w)} dw = exp{-E(m_N)} (2π)^{M/2} |A|^{-1/2}. Hence show that the log marginal likelihood is given by ln p(t|α, β) = (M/2) ln α + (N/2) ln β - E(m_N) - (1/2) ln|A| - (N/2) ln(2π).

Question 4 Consider real-valued variables X and Y. The variable Y is generated, conditional on X, from the process Y = wX + ε, where every ε is an independent variable, called a noise term, which is drawn from a Gaussian distribution with mean 0 and standard deviation σ. This is a one-feature linear regression model, where w is the only weight parameter. The conditional probability of Y has distribution p(Y|X, w) ~ N(wX, σ^2), so it can be written as p(Y|X, w) = (1/√(2πσ^2)) exp(-(Y - wX)^2 / (2σ^2)). Assume we have a training dataset of n pairs (X_i, Y_i) for i = 1, …, n, and σ is known. Derive the maximum likelihood estimate of the parameter w in terms of the training examples X_i and Y_i. We recommend you start with the simplest form of the problem.

Question 5 If a data point y follows the Poisson distribution with rate parameter θ, then the probability of a single observation y is p(y|θ) = θ^y e^{-θ} / y!. You are given data points y_1, …, y_n independently drawn from a Poisson distribution with parameter θ. Write down the log-likelihood of the data as a function of θ.
Question 6 Suppose you are given n observations, X_1, …, X_n, independent and identically distributed with a Gamma(α, β) distribution. The following information might be useful for the problem. If X ~ Gamma(α, β), then E[X] = α/β and Var[X] = α/β^2. The probability density function of X is f(X|α, β) = (β^α / Γ(α)) X^{α-1} e^{-βX}, where the function Γ(α) is only dependent on α and not β. Suppose we are given a known, fixed value for α. Compute the maximum likelihood estimator for β.
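For Question 5, the Poisson log-likelihood L(θ) = Σ_i [y_i ln θ - θ - ln(y_i!)] can be explored numerically before deriving its maximizer. This sketch is not part of the homework and the data points are made up; it only checks that the sample mean is a local maximum of the log-likelihood.

```python
import math

def poisson_log_likelihood(theta, ys):
    # Σ_i [ y_i ln θ - θ - ln(y_i!) ];  lgamma(y+1) = ln(y!)
    return sum(y * math.log(theta) - theta - math.lgamma(y + 1) for y in ys)

ys = [2, 4, 3, 0, 1]          # made-up observations
mle = sum(ys) / len(ys)       # candidate maximizer: the sample mean (2.0)

# The log-likelihood at the sample mean beats nearby values of theta.
assert poisson_log_likelihood(mle, ys) > poisson_log_likelihood(mle + 0.1, ys)
assert poisson_log_likelihood(mle, ys) > poisson_log_likelihood(mle - 0.1, ys)
```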
Question 1 (a) [True or False] If two sets of variables are jointly Gaussian, then the conditional distribution of one set conditioned on the other is again Gaussian. Similarly, the marginal distribution of either set is also Gaussian. (b) Consider a partitioning of the components of x into three groups x_a, x_b, and x_c, with a corresponding partitioning of the mean vector μ and of the covariance matrix Σ. Find an expression for the conditional distribution p(x_a|x_b) in which x_c has been marginalized out.

Question 2 Consider a joint distribution over the variable z = (x, y) whose mean and covariance are given by E[z] = (μ, Aμ + b) and cov[z] = [[Λ^{-1}, Λ^{-1}A^T], [AΛ^{-1}, L^{-1} + AΛ^{-1}A^T]]. (a) Show that the marginal distribution is given by p(x) = N(x | μ, Λ^{-1}). (b) Show that the conditional distribution is given by p(y|x) = N(y | Ax + b, L^{-1}).

Question 3 Show that the covariance matrix Σ that maximizes the log likelihood function is given by the sample covariance Σ_ML = (1/N) Σ_{n=1}^N (x_n - μ)(x_n - μ)^T. Is the final result symmetric and positive definite (provided the sample covariance is nonsingular)? Hints (a) To find the maximum likelihood solution for the covariance matrix Σ of a multivariate Gaussian, we need to maximize the log likelihood function with respect to Σ. The log likelihood function is given by ln p(X|μ, Σ) = -(ND/2) ln(2π) - (N/2) ln|Σ| - (1/2) Σ_{n=1}^N (x_n - μ)^T Σ^{-1} (x_n - μ). (b) The derivative of the inverse of a matrix can be expressed as ∂(A^{-1})/∂x = -A^{-1} (∂A/∂x) A^{-1}. We have the following properties: ∂/∂A Tr(AB) = B^T and ∂/∂A ln|A| = (A^{-1})^T.

Question 4 (a) Derive an expression for the sequential estimation of the variance of a univariate Gaussian distribution, by starting with the maximum likelihood expression σ^2_ML = (1/N) Σ_{n=1}^N (x_n - μ)^2. Verify that substituting the expression for a Gaussian distribution into the Robbins-Monro sequential estimation formula gives a result of the same form, and hence obtain an expression for the corresponding coefficients a_N.
(b) Derive an expression for the sequential estimation of the covariance of a multivariate Gaussian distribution, by starting with the maximum likelihood expression Σ_ML = (1/N) Σ_{n=1}^N (x_n - μ)(x_n - μ)^T. Verify that substituting the expression for a Gaussian distribution into the Robbins-Monro sequential estimation formula gives a result of the same form, and hence obtain an expression for the corresponding coefficients a_N. Hints (a) Consider the result for the maximum likelihood estimator of the mean μ_ML, which we will denote by μ_ML^(N) when it is based on N observations. If we dissect out the contribution from the final data point x_N, we obtain μ_ML^(N) = μ_ML^(N-1) + (1/N)(x_N - μ_ML^(N-1)). (b) Robbins-Monro for maximum likelihood.

Question 5 Consider a D-dimensional Gaussian random variable x with distribution N(x|μ, Σ) in which the covariance Σ is known and for which we wish to infer the mean μ from a set of observations X = {x_1, …, x_N}. Given a prior distribution p(μ) = N(μ|μ0, Σ0), find the corresponding posterior distribution p(μ|X).

Program Question Use the online judge to solve this problem. In this coding exercise, you will implement the K-Nearest Neighbors (KNN) algorithm. You are provided with a Jupyter Notebook just for reference. The requirements on the online judge will be very different from the notebook. This is a classification problem and we will use the Breast Cancer dataset:

Table 1: Accuracy for the KNN classification problem on the validation set

K  Norm   Accuracy (%)
3  L1
3  L2
3  L-inf
5  L1
5  L2
5  L-inf
7  L1
7  L2
7  L-inf

A training data set (X train) is provided which has several data points, and each data point is a p-dimensional vector (i.e., p features). Your task is to implement the K-nearest neighbors algorithm. Use the Euclidean distance.
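A minimal KNN classifier with Euclidean distance can be sketched as below. This is not the online-judge interface; the function names, the tie-breaking via majority vote, and the tiny two-feature dataset are all illustrative assumptions.

```python
import math
from collections import Counter

def knn_predict(x_train, y_train, x, k):
    # Sort training points by Euclidean distance to the query point x,
    # then take a majority vote among the k nearest labels.
    dists = sorted(
        (math.dist(p, x), label) for p, label in zip(x_train, y_train)
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]

# Tiny made-up 2-feature dataset with two classes.
x_train = [(0.0, 0.0), (0.1, 0.2), (1.0, 1.0), (0.9, 1.1)]
y_train = [0, 0, 1, 1]
assert knn_predict(x_train, y_train, (0.05, 0.1), 3) == 0
assert knn_predict(x_train, y_train, (0.95, 1.0), 3) == 1
```

For the L1 and L-inf rows of Table 1, the distance function would be swapped for the corresponding norm while the voting logic stays the same.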
Problem 1. For a direct-mapped cache design with a 32-bit address, the following bits of the address are used to access the cache. 1. What is the cache block size (in words)? 2. How many entries does the cache have? 3. What is the ratio between the total bits required for such a cache implementation and the data storage bits? (Assume the cache has a valid bit, a dirty bit, and a reference bit.) Starting from power on, the following byte-addressed cache references are recorded. 4. How many blocks are replaced? 5. What is the hit ratio? 6. List the final state of the cache, with each valid entry represented as a record of <Index, Tag, Data>.

Problem 2. Caches are important to providing a high-performance memory hierarchy to processors. Below is a list of 32-bit memory address references, given as word addresses: 3, 180, 43, 2, 191, 88, 190, 14, 181. This exercise examines the impact of different cache designs, specifically comparing associative caches to direct-mapped caches. The address stream is shown above. 1. For a three-way set associative cache with two-word blocks and a total size of 24 words, which bits represent the index and which bits represent the tag in a 32-bit memory address (e.g. Index: 9-5, Tag: 31-10)? 2. Using the given sequence, show the final cache contents for a three-way set associative cache with two-word blocks and a total size of 24 words. Use LRU replacement. For each reference identify the index bits, the tag bits, the block offset bits, and whether it is a hit or a miss. 3. For a fully associative cache with one-word blocks and a total size of 8 words, which bits represent the index and which bits represent the tag in a 32-bit memory address (e.g. Index: 9-5, Tag: 31-10)? 4. Using the given sequence, show the final cache contents for a fully associative cache with one-word blocks and a total size of 8 words. Use LRU replacement. For each reference identify the index bits, the tag bits, and whether it is a hit or a miss.
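The address decomposition needed in Problem 2 can be sketched numerically. This is not part of the exercise text; it assumes the stated configuration of a three-way set associative cache with two-word blocks and 24 words total, which gives 24 / 2 / 3 = 4 sets.

```python
# Decompose a word address into (set index, tag) for the Problem 2 cache:
# three-way set associative, two-word blocks, 24 words total -> 4 sets.
WORDS_PER_BLOCK = 2
NUM_SETS = 4

def decompose(word_addr):
    block_addr = word_addr // WORDS_PER_BLOCK
    set_index = block_addr % NUM_SETS     # low bits of the block address
    tag = block_addr // NUM_SETS          # remaining high bits
    return set_index, tag

# First references from the given stream: 3, 180, 43
assert decompose(3) == (1, 0)      # block 1  -> set 1, tag 0
assert decompose(180) == (2, 22)   # block 90 -> set 2, tag 22
assert decompose(43) == (1, 5)     # block 21 -> set 1, tag 5
```

The same decomposition, tracked per set with an LRU ordering, produces the hit/miss trace the problem asks for.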
Multilevel caching is an important technique to overcome the limited amount of space that a first level cache can provide while still maintaining its speed. Consider a processor with the following parameters: 5. Calculate the CPI for the processor in the table using: 1) only a first level cache, 2) a second level direct-mapped cache, and 3) a second level eight-way set associative cache.

Problem 3. This exercise examines the single error correcting, double error detecting (SEC/DED) Hamming code. 1. What is the minimum number of parity bits required to protect a 128-bit word using the SEC/DED code? 2. Consider a SEC code that protects 8-bit words with 4 parity bits. If we read the value 0x375, is there an error? If so, correct the error.

Problem 4. Virtual memory uses a page table to track the mapping of virtual addresses to physical addresses. This exercise shows how this table must be updated as addresses are accessed. The following data constitutes a stream of virtual addresses as seen on a system. Assume a page size of 4 KiB, a 4-entry fully associative TLB, and true LRU replacement. If pages must be brought in from disk, increment the next largest page number (i.e. if the current maximum physical page number is 12, then the next physical page brought in from disk has a page number of 13). 4669, 2227, 13916, 34587, 12608 TLB: Page table: 1. Which bits represent the virtual page number and which bits represent the page offset in a 32-bit virtual memory address? 2. Given the address stream shown, and the initial TLB and page table states provided above, show the final state of the system. Also list for each reference whether it is a hit in the TLB, a hit in the page table, or a page fault. 3. Show the final contents of the TLB if it is 2-way set associative. The page table states are provided above and the initial TLB is shown below. Discuss the importance of having a TLB to high performance. How would virtual memory accesses be handled if there were no TLB? TLB:
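The virtual-address split in Problem 4 can be sketched as follows. This is not part of the exercise text; it only assumes the stated 4 KiB page size, which means a 12-bit page offset, with the remaining high bits forming the virtual page number.

```python
# Split a virtual address into (virtual page number, page offset)
# for 4 KiB pages: offset = low 12 bits, VPN = the bits above them.
PAGE_SIZE = 4096

def split_vaddr(vaddr):
    return vaddr // PAGE_SIZE, vaddr % PAGE_SIZE  # (VPN, offset)

# The given address stream:
stream = [4669, 2227, 13916, 34587, 12608]
assert split_vaddr(4669) == (1, 573)
assert [split_vaddr(a)[0] for a in stream] == [1, 0, 3, 8, 3]
```

Note that 13916 and 12608 land on the same virtual page (VPN 3), which matters when tracing TLB hits in part 2.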
Problem 1. In this exercise, we examine how pipelining affects the clock cycle time of the processor. Problems in this exercise assume that individual stages of the datapath have the following latencies:

IF     ID     EX     MEM    WB
250ps  350ps  150ps  300ps  200ps

Also, assume that instructions executed by the processor are broken down as follows:

alu  beq  lw   sw
45%  20%  20%  15%

1. What is the clock cycle time in a pipelined and non-pipelined processor? 2. What is the total latency of an LW instruction in a pipelined and non-pipelined processor? 3. If we can split one stage of the pipelined datapath into two new stages, each with half the latency of the original stage, which stage would you split and what is the new clock cycle time of the processor? 4. Assuming there are no stalls or hazards, what is the utilization of the data memory? The utilization of a resource is defined as the duration the resource is used over the total time. 5. Assuming there are no stalls or hazards, what is the utilization of the write-register port of the "Registers" unit? 6. Instead of a single-cycle organization, we can use a multi-cycle organization where each instruction takes multiple cycles but one instruction finishes before another is fetched. In this organization, an instruction only goes through the stages it actually needs (e.g., SW only takes 4 cycles because it does not need the WB stage). Compare clock cycle times and execution times with single-cycle, multi-cycle, and pipelined organizations.

Problem 2. In this exercise, we examine how data dependences affect execution in the basic 5-stage pipeline described in Section 4.5. Problems in this exercise refer to the following sequence of instructions:

or r1,r2,r3
or r2,r1,r4
or r1,r1,r2

Also, assume the following cycle times for each of the options related to forwarding:

Without Forwarding  With Full Forwarding  With ALU-ALU Forwarding Only
250ps               300ps                 290ps

1. Assume there is no forwarding in this pipelined processor.
Indicate hazards and add nop instructions to eliminate them. 2. Assume there is full forwarding. Indicate hazards and add nop instructions to eliminate them. 3. What is the total execution time of this instruction sequence without forwarding and with full forwarding? What is the speedup achieved by adding full forwarding to a pipeline that had no forwarding? 4. Add nop instructions to this code to eliminate hazards if there is ALU-ALU forwarding only (no forwarding from the MEM to the EX stage). 5. What is the total execution time of this instruction sequence with only ALU-ALU forwarding? What is the speedup over a no-forwarding pipeline?

Problem 3. In this exercise, we examine how resource hazards, control hazards, and Instruction Set Architecture (ISA) design can affect pipelined execution. Problems in this exercise refer to the following fragment of MIPS code:

sw r16,12(r6)
lw r16,8(r6)
beq r5,r4,Label # Assume r5!=r4
add r5,r1,r4
slt r5,r15,r4

Assume that individual pipeline stages have the following latencies:

IF     ID     EX     MEM    WB
200ps  120ps  150ps  190ps  100ps

1. For this problem, assume that all branches are perfectly predicted (this eliminates all control hazards) and that no delay slots are used. If we only have one memory (for both instructions and data), there is a structural hazard every time we need to fetch an instruction in the same cycle in which another instruction accesses data. To guarantee forward progress, this hazard must always be resolved in favor of the instruction that accesses data. What is the total execution time of this instruction sequence in the 5-stage pipeline that only has one memory? We have seen that data hazards can be eliminated by adding nops to the code. Can you do the same with this structural hazard? Why? 2. For this problem, assume that all branches are perfectly predicted (this eliminates all control hazards) and that no delay slots are used.
If we change load/store instructions to use a register (without an offset) as the address, these instructions no longer need to use the ALU. As a result, MEM and EX stages can be overlapped and the pipeline has only 4 stages. Change this code to accommodate this changed ISA. Assuming this change does not affect clock cycle time, what speedup is achieved in this instruction sequence? 3. Assuming stall-on-branch and no delay slots, what speedup is achieved on this code if branch outcomes are determined in the ID stage, relative to the execution where branch outcomes are determined in the EX stage? 4. Given these pipeline stage latencies, repeat the speedup calculation from question 2, but take into account the (possible) change in clock cycle time. When EX and MEM are done in a single stage, most of their work can be done in parallel. As a result, the resulting EX/MEM stage has a latency that is the larger of the original two, plus 20 ps needed for the work that could not be done in parallel. 5. Given these pipeline stage latencies, repeat the speedup calculation from question 3, taking into account the (possible) change in clock cycle time. Assume that the latency of the ID stage increases by 50% and the latency of the EX stage decreases by 10 ps when branch outcome resolution is moved from EX to ID. 6. Assuming stall-on-branch and no delay slots, what is the new clock cycle time and execution time of this instruction sequence if beq address computation is moved to the MEM stage? What is the speedup from this change? Assume that the latency of the EX stage is reduced by 20 ps and the latency of the MEM stage is unchanged when branch outcome resolution is moved from EX to MEM.

Problem 4. In this exercise we compare the performance of 1-issue and 2-issue processors, taking into account program transformations that can be made to optimize for 2-issue execution.
Problems in this exercise refer to the following loop (written in C):

for(i=0;i!=j;i+=2)
    b[i]=a[i]-a[i+1];

When writing MIPS code, assume that variables are kept in registers as follows, and that all registers except those indicated as Free are used to keep various variables, so they cannot be used for anything else.

i   j   a   b   c   Free
R5  R6  R1  R2  R3  R10, R11, R12

1. Translate this C code into MIPS instructions. Your translation should be direct, without rearranging instructions to achieve better performance. 2. If the loop exits after executing only two iterations, draw a pipeline diagram for your MIPS code from question 1 executed on a 2-issue processor shown in Figure 4.69. Assume the processor has perfect branch prediction and can fetch any two instructions (not just consecutive instructions) in the same cycle. 3. Rearrange your code from question 1 to achieve better performance on a 2-issue statically scheduled processor from Figure 4.69. 4. Repeat question 2, but this time use your MIPS code from question 3. 5. What is the speedup of going from a 1-issue processor to a 2-issue processor from Figure 4.69?
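The cycle-time arithmetic behind Problem 1 of the pipelining exercise can be sketched directly from the given stage latencies. This is not part of the exercise text, only a numeric restatement: the pipelined clock is set by the slowest stage, while the non-pipelined (single-cycle) clock must cover all stages.

```python
# Stage latencies from Problem 1 of the pipelining exercise.
stages = {"IF": 250, "ID": 350, "EX": 150, "MEM": 300, "WB": 200}

pipelined_cycle = max(stages.values())    # slowest stage sets the clock
single_cycle = sum(stages.values())       # one long cycle covers every stage
lw_latency_pipelined = pipelined_cycle * len(stages)  # lw uses all 5 stages

assert pipelined_cycle == 350       # ps
assert single_cycle == 1250         # ps, also the non-pipelined lw latency
assert lw_latency_pipelined == 1750 # ps
```

This also shows why part 3 suggests splitting a stage: halving ID (350 ps) would leave MEM (300 ps) as the new critical stage.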
Problem 1. Consider two different implementations P1 and P2 of the same instruction set architecture. The instructions can be divided into four classes according to their CPI (class A, B, C, and D). P1 has a clock rate of 2.5 GHz and CPIs of 1, 2, 3, and 3, and P2 has a clock rate of 3 GHz and CPIs of 2, 2, 2, and 2 for the four classes respectively. Given a program with a dynamic instruction count of 1.0E6 instructions divided into classes as follows: 10% class A, 20% class B, 50% class C, and 20% class D, which implementation is faster? 1. What is the global CPI for each implementation? 2. Find the clock cycles required in both cases.

Problem 2. The results of the SPEC CPU2006 bzip2 benchmark running on an AMD Barcelona have an instruction count of 2.389E12, an execution time of 750 s, and a reference time of 9650 s. 1. Find the CPI if the clock cycle time is 0.333 ns. 2. Find the SPECratio. 3. Find the new CPU time if the number of instructions of the benchmark is increased by 10% without affecting the CPI. 4. Find the new CPU time if the number of instructions of the benchmark is increased by 10% and the CPI is increased by 5%. 5. Find the new SPECratio for this change. 6. Suppose that we are developing a new version of the AMD Barcelona processor with a 4 GHz clock rate. We have added some additional instructions to the instruction set in such a way that the number of instructions has been reduced by 15%. The execution time is reduced to 700 s and the new SPECratio is 13.7. Find the new CPI.

Problem 3. 1. Assume 23 and 112 are signed 8-bit decimal integers stored in two's complement format. Calculate 23 + 112 using saturating arithmetic. The result should be written in decimal. Show the steps for calculation. 2. Assume 23 and 112 are signed 8-bit decimal integers stored in two's complement format. Calculate 23 - 112 using saturating arithmetic. The result should be written in decimal. Show the steps for calculation. 3.
Assume 23 and 112 are unsigned 8-bit integers. Calculate 23 + 112 using saturating arithmetic. The result should be written in decimal. Show the steps for calculation. Problem 4. Calculate the product of the hexadecimal unsigned 8-bit integers 62 and 13 using the hardware described below. You should show the contents of each register on each step. Use a table to show the detailed process. Problem 5. Calculate 52 divided by 21 using the hardware described below. You should show the contents of each register on each step. Assume both inputs are unsigned 6-bit integers. Use a table to show the detailed process.
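The saturating operations in Problem 3 can be sanity-checked with a short sketch. This is not part of the exercise text; it just clamps results to the representable 8-bit ranges instead of letting them wrap around, which is the defining behavior of saturating arithmetic.

```python
# 8-bit saturating arithmetic: clamp to the representable range
# instead of wrapping (two's complement signed: [-128, 127];
# unsigned: [0, 255]).
def sat_signed8(x):
    return max(-128, min(127, x))

def sat_unsigned8(x):
    return max(0, min(255, x))

assert sat_signed8(23 + 112) == 127    # 135 overflows signed 8 bits -> 127
assert sat_signed8(23 - 112) == -89    # -89 is representable, no clamping
assert sat_unsigned8(23 + 112) == 135  # 135 fits in 8 unsigned bits
```

The written solutions still need the bit-level steps (the two's-complement addition and the overflow test), but these values are what those steps should produce.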
Problem 1. For the following C statement, what is the corresponding MIPS assembly code? Assume that the variables f, g, h, and i are given and could be considered 32-bit integers as declared in a C program. Use a minimal number of MIPS assembly instructions.

f = g + ( h - 5 );

Problem 2. For the following C statement, what is the corresponding MIPS assembly code? Assume that the variables f, g, h, i, and j are assigned to registers $s0, $s1, $s2, $s3, and $s4, respectively. Assume that the base address of the arrays A and B are in registers $s6 and $s7, respectively.

B[8] = A[ i - j ];

Problem 3. The table below shows 32-bit values of an array stored in memory.

Address  Data
24       2
28       4
32       3
36       6
40       1

1) For the memory locations in the table above, write C code to sort the data from lowest to highest, placing the lowest value in the smallest memory location shown in the figure. Assume that the data shown represents the C variable called Array, which is an array of type int, and that the first number in the array shown is the first element in the array. Assume that this particular machine is a byte-addressable machine and a word consists of four bytes. 2) For the memory locations in the table above, write MIPS code to sort the data from lowest to highest, placing the lowest value in the smallest memory location. Use a minimum number of MIPS instructions. Assume the base address of Array is stored in register $s6.

Problem 4. Provide the type, assembly language instruction, and binary representation of the instruction described by the following MIPS fields (all the numbers are in decimal): op=0, rs=3, rt=2, rd=3, shamt=0, funct=34
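For Problem 4, the field packing of an R-type MIPS instruction (op | rs | rt | rd | shamt | funct, with widths 6-5-5-5-5-6) can be checked mechanically. This sketch is not part of the exercise; it only encodes the given field values, leaving the type and mnemonic identification to the written answer.

```python
# Pack R-type fields into a 32-bit MIPS instruction word.
def encode_rtype(op, rs, rt, rd, shamt, funct):
    return (op << 26) | (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | funct

# Fields from Problem 4 (funct=34 is the R-type sub operation).
word = encode_rtype(op=0, rs=3, rt=2, rd=3, shamt=0, funct=34)
assert word == 0x00621822
assert f"{word:032b}" == "00000000011000100001100000100010"
```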
Introduction The aim of this assignment is to make sure that you understand and are familiar with the concepts covered in the lectures, including web application architectures and state management using sessions, cookies, and cache. This is an individual assignment. Each student must complete and submit independent work. No cooperation is allowed, even among the team members for assignment 3. I hope this will be an opportunity to reflect on some key concepts discussed in the class and demonstrate their application using a simple web application. Requirements: In homework #3 part I, you proposed an application that uses web services. In this homework you will be developing a prototype of it. That means you are not required to implement all the functionality proposed in homework #3 part I. The following are the minimum requirements you need to meet: a) Your ASP .NET application should use the HTML + Script run at server architecture (please see the example posted). b) A minimum of five (5) web services developed in homework #3 need to be used in your application, of which at least 2 need to be RESTful web services. c) Use data caching to store at least one data item in the cache. You can use a suitable data object based on your application. d) Use a session or cookie to keep state information for at least one data item. Submission Requirement All submissions must be electronically submitted to the assignment folder where you downloaded the assignment paper. All files must be zipped into a single zip file. Late submission: No late submissions will be accepted for this homework.
Introduction The aim of this assignment is to make sure that you understand and are familiar with the concepts covered in the lectures, including XML elements, attributes, statements, XML schema, XML validation, and their classes in the .NET FCL. By the end of the assignment, you should have applied these concepts and techniques in creating an XML file, its schema, and its style sheet, and have written Web services and an SOA application to process these files. This is an individual assignment. Each student must complete and submit independent work. No cooperation is allowed, even among the team members for assignment 3. You can use your ASU personal Web site space to host your XML file (see Part 0 question 4), and use localhost to host the services in Part 2. Part 0 Practice Exercises (No submission required) No submission is required for this part of the exercises. However, doing these exercises can help you better understand the concepts and thus help you in quizzes or exams. 1. Reading: Textbook Chapter 4. 2. Answer the multiple choice questions 1.1 through 1.16 of the text Section 4.8. Studying the material covered in these questions can help you to prepare for the class exercises, quizzes, and the exam. 3. Study the questions 2 through 8 in text Section 4.8. Make sure that you understand these questions and can briefly answer them. Studying the material covered in these questions can help you to prepare for the exam and understand the homework assignment. 4. If you have not activated your file service and personal Web hosting site at ASU, you can activate them at: https://selfsub.asu.edu/apps/WebObjects/ASURITEActivation. Then, you can search for Uploading Your Personal Web page within the ASU search page to find the steps for uploading your files. Notice that the ASU Personal Web site space hosts files only. You can use it to host your .xml, .xsd, and .xsl files. It does not host programs such as Web services and Web applications, because
IIS is not installed on that space.

Part 1 Creating XML and XSD Files (40 points)

This part has 40 points. The grade will be based on your Part 1 submission. You will use the three files created in Part 1 in your Part 2 questions, and you must include them in Part 2. You may modify these files when you resubmit them in Part 2; however, we will not grade these files in Part 2. We will grade only the given questions in Part 2.

In this assignment, you will create a directory of hotels, which can be represented as an XML tree. The diagram below shows the required structure of the directory of hotels in XML tree notation. All the "Hotel" elements have the same structure. Notice that the different shapes of boxes have different meanings: they represent elements, attributes, and text contents/values, respectively. The structure of elements and attributes given in the diagram is required, while the text contents/values given in the diagram are examples.

1. Create an XML file Hotels.xml as an instance of your schema file. Enter the information of at least five (5) real hotels into the Hotels.xml file. You can use any tool to edit the file. If an element has a Required attribute, you must provide the attribute for each of those elements. If an element has an Implied (Optional) attribute, you will provide this attribute for some elements, but not for all elements. [15 points]
2. Write the Hotels.xsd file that defines the XML schema allowing the structure shown in the diagram above. You can use any tool, such as Visual Studio, to create/edit the file. [10 points]
3. Create an ASP.NET Web site application (not a Web service) that takes the URL of an XML file as input (textbox) and displays the element (tag) names, text contents, attribute names, and attribute values in the GUI page. You can use any traversal algorithm discussed in the class to navigate the XML document. Make your Hotels.xml the default input.
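The exact element and attribute names come from the handout diagram, which is not reproduced here. Purely to illustrate the instance/schema relationship that questions 1 and 2 ask for, a minimal Hotels.xml entry and a matching Hotels.xsd fragment might look like the following, where the names `Hotel`, `Name`, and `rating` are assumptions standing in for whatever the diagram specifies:

```xml
<!-- Hotels.xml (illustrative fragment; your real file must follow the handout diagram) -->
<Hotels>
  <Hotel rating="4">   <!-- 'rating' stands in for an Implied (Optional) attribute -->
    <Name>Example Inn</Name>
  </Hotel>
</Hotels>
```

```xml
<!-- Hotels.xsd (illustrative fragment matching the instance above) -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="Hotels">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="Hotel" maxOccurs="unbounded">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="Name" type="xs:string"/>
            </xs:sequence>
            <xs:attribute name="rating" type="xs:string" use="optional"/>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

A Required attribute in the diagram would use `use="required"` instead of `use="optional"` in the schema.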
[Study the example given with this homework before you attempt this question] [15 points]

Submission list for questions 1, 2, 3: The complete solution folder with all project files for question 3 and the application that can be tested by the TAs. You also need to submit the files used in question 2, such as Hotels.xml and Hotels.xsd. Zip all these files into a single zip file for submission. You can name the project hw2partI.

Part 2 Creating Web Services to Process XML and XSD Files (40 points)

4. Develop a Web service (.svc) with the two Web operations listed below. The nodes mentioned in the sub-questions below include every component (element, content, and attribute) shown in the XML tree in Part 1.
4.1 Web operation "verification" takes the URL of an XML (.xml) file and the URL of the corresponding XML Schema (.xsd) file as input and validates the XML file against that XSD file. The Web method returns "No Error" or an error message showing the available information at the error point. You must use the files that you created in the previous questions, with and without fault injection, as test cases. However, your service operation should work for other test cases too. [15 points]
4.2 Web operation "XPathSearch" takes the URL of an XML (.xml) file and a path expression as input. It returns the value of the given path expression, which could be a list of nodes, a content value, etc., depending on the path given. [15 points]
5. Create a Web site application (ASP.NET), and add the project into the same solution that hosts your Web service. The Web site application must provide a GUI (TryIt page), which allows entering the required inputs, such as URLs and keyword, path, or contents, based on the questions that you select.
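The bodies of the two operations in question 4 can be built on the .NET FCL validation and XPath classes. The following is a minimal sketch under the assumption that the service exposes two methods with these (hypothetical) signatures; error-message formatting and the service plumbing (.svc contract, attributes) are up to you.

```csharp
using System.Text;
using System.Xml;
using System.Xml.XPath;

public static class XmlOps
{
    // Sketch of operation 4.1: validate an XML document against an XSD.
    public static string Verification(string xmlUrl, string xsdUrl)
    {
        var settings = new XmlReaderSettings();
        settings.Schemas.Add(null, xsdUrl);      // null = use the schema's targetNamespace
        settings.ValidationType = ValidationType.Schema;

        var errors = new StringBuilder();
        settings.ValidationEventHandler += (s, e) =>
            errors.AppendLine("Line " + e.Exception.LineNumber + ": " + e.Message);

        using (var reader = XmlReader.Create(xmlUrl, settings))
        {
            while (reader.Read()) { }            // reading the document triggers validation
        }
        return errors.Length == 0 ? "No Error" : errors.ToString();
    }

    // Sketch of operation 4.2: evaluate an XPath expression against the document.
    public static string XPathSearch(string xmlUrl, string xpath)
    {
        var nav = new XPathDocument(xmlUrl).CreateNavigator();
        var result = new StringBuilder();
        foreach (XPathNavigator node in nav.Select(xpath))
            result.AppendLine(node.OuterXml);    // one matched node per line
        return result.ToString();
    }
}
```

Collecting validation errors through the `ValidationEventHandler` (rather than letting the reader throw) lets the service report the line number and message at the error point, as question 4.1 asks.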
The GUI must have the buttons required to invoke the service operations implemented above:
• a button that invokes the validation of the XML file against the schema file;
• a button that searches by keyword or by path in the XML file.

The Web application must use the Web service created in Question 4 to perform the required processing and display the returned message in the GUI. You can use a textbox, a list box, or a label to display your output. This assignment will be implemented on localhost or IIS Express. [20 points]

Submission list for questions 4 and 5: The complete solution folder with all project files for the services and the application that can be tested by the TAs. You can create a single solution containing both the Web service and the ASP.NET application. You can name the solution hw2partII.

Submission Requirement

All submissions must be electronically submitted to the assignment folder where you downloaded the assignment paper. All files must be zipped into a single zip file.

Grading

The TA will grade your program following these steps:
(1) The TA will read your program and give points based on the points allocated to each component, the readability of your code (organization of the code and comments), logic, inclusion of the required functions, and correctness of the implementation of each function.
(2) Compile the code. If it does not compile, 40% of the points given in (1) will be deducted. For example, if you are given 20 points in step (1), your points will become 12 if the program fails to compile.
(3) If the code passes compilation, the TA will execute and test the code. If, for any reason, the program gives incorrect output or crashes for any input, 20% of the points given in (1) will be deducted. Please note that the TA will not debug your program to figure out how big or how small the error is. You may lose 40% or 20% of your points for a small error such as a missing comma or a space!
Late submission deduction policy (for each part of the submission):
• No penalty for late submissions received within 24 hours of the given deadline;
• 1% grade deduction for every hour after the first 24 hours of the grace period!