1. (35 pts)Consider the following cost matrix to solve a warehouse location problem to minimize the total setup and transportation costs. Warehouse sites Cust. Loc. A B C 1 100 1000 200 2 1000 100 200 3 500 500 500 Fixed Cost 300 300 X What is the largest integer value for X (fixed cost of cite C) for which the greedy algorithm we have seen in the class gives a solution that is not optimal, regardless of how one break the ties? 2. (35 pts) Assume that we have the following closeness ratings for the four departments we want to place in a 2×2 grid layout plan. Dep 1 2 3 4 1 – A E O 2 A – E O 3 E E – O 4 O O O – Prove or disprove the following claim: • The layout given below is an optimal solution that maximizes the closeness score for any grading scale in which an A relationship has a strictly more value than an E relationship and an E relationship has a strictly larger value than an O relation. 1 4 2 3 3. (30 pts) Write a small code snippet in your favorite coding environment to implement Johnson’s Algorithm to solve the two-machine flow-shop problem given in the provided file schedule.xlsx. Upload your code and screenshots for the answer.
1. Consider a single-machine production system that produces aluminum sheets for car manufacturing. The monthly production target is 100 sheets. The minimum and maximum thickness of the aluminum sheets should be between 2,97 and 3.03 millimeters to meet the car manufacturing standards. The historical data shows that the thickness of the sheets produced by the current pressing machine has a normal distribution with a mean of 3.00 mm and a standard deviation of 0.02mm. (a) (25 pts) Use the normal approximation to the binomial yield model to estimate the probability that the production requirements are met if the number of items that are manufactured is 100, 110, and 120. (b) (25 pts) Repeat the same exercise if the required quantity is 1000 and 1000, 1100, and 1200 items are produced. 2. (50 pts) Consider a production cell with a single worker and several semiautonomous machines. Due to the issues with the raw material used, with a two percent probability, a product may get stuck in a machine after the processing, and a repair team is called to remove the product and recalibrate the machine if that happens. The maintenance takes 45 minutes to complete (including the traveling time of the team) and during the repair, the is a 40% chance that the product gets damaged. The maintenance teams do not end their shift before fixing a broken machine (they work overtime if needed). We know the following parameters for the production system. • There is a single shift in the factory that lasts 8 hours. The workers take a 20-minute break at the middle of the shift. • Machining time for a product is a random variable that is uniformly distributed between 5 and 7 minutes. • The loading time is normally distributed with a mean of 45 seconds and a standard deviation of 10 seconds. Similarly, the unloading time is also normally distributed with a mean of 30 seconds and a standard deviation of 5 seconds. • The quality check takes 1 minute for the worker (if the part is not stuck) and if a part is not stuck the probability that it passes the quality check is 98.5%. • It takes 40 seconds for a worker to move from one machine to the other one. Use a simulation approach to answer the following questions. (a) What would be the expected production quantities if 2,3,4 or 5 machines are placed in the production cell? (b) Assume 3 machines are placed in the cell. How much the daily production quantity would increase if the stuck probability is reduced by 1 percent? (c) Assume 3 machines are placed in the cell. How much the daily production quantity would increase if the successful production probability for the unstuck parts is increased by 1 percent? Comment on the effectiveness of the system improvements in parts (b) and (c) by comparing the production increases.
1. Semiconductor wafer fabrication is very much error-prone. Assume that a wafer production facility received an order for a specially designed prototype wafer. The cost of producing each wafer is estimated to be $20,000. The customer agrees to pay $150,000 for 3 good wafers, $200,000 for 4 good wafers, and $250,000 for 5 good wafers. Other than 3, 4, or 5 good wafers, all other wafers must be destroyed (i.e. have no value). To obtain the contract, the wafer fab offers to pay the customer a penalty of $100,000 if at least three good wafers are not produced. Assume that each wafer is produced independently and the probability that a wafer is acceptable is 0.65. Build a function in your favorite coding environment (Python, R, Java, etc.), to answer the following questions. It is important to note that your calculations should be general enough to hold when different inputs (revenues, probabilities, etc.) are changed. For this purpose, always define all inputs as variables whose values can be changed. Writing your function keep in mind that the number of successfully produced items would have a Binomial distribution. Submit your code along with the answers for the following questions. a. (10 pts) Construct an expected profit table for Q=1,2,…,10 (report the expected profit obtained when Q wafers are manufactured). What is the optimal quantity to manufacture to maximize the expected profit? b. (10 pts) Plot how does the optimal quantity change when the probability of a wafer being acceptable changes in the range of 0.5 to 0.95? c. (10 pts) Plot how does the optimal order quantity change if the production cost per wafer changes in the range from $10,000 to $40,000? d. (10 pts) The wafer company is risk-averse and would also like to avoid the risk of losing money. Using the binary loss/gain probabilities. Report the probability of losing money for Q=1,2,…,10. e. (20 pts) Now assume that consecutive wafer productions are not necessarily independent. Consider the following model: P(wafer i+1 is good | wafer i is good)=0.9 and P(wafer i+1 is bad| wafer i is good)=0.1. In addition, P(wafer i+1 is good | wafer i is bad)=0.6, P(wafer i+1 is bad | wafer i is bad)=0.4. Assume that the first wafer is good with probability 0.75 and find the expected profit when 4 wafers are scheduled. 2. A firm has received an order for 25 die cast parts made from precious metals. The parts will sell for $5000 each and it will cost $2500 to produce an individual part. The probability of a part meeting final inspection equals 0.75. Parts that meet the inspection but are not sold can be recycled at a value of $900. Parts that do not meet the inspection have a unit salvage value of $100. A penalty clause in the contract results in the firm having to pay the customer $500 per unit short if the number of parts that can be delivered is less than 25. In addition, if the number that can be delivered is less than 23, there is an additional fixed penalty of $10,000 (i.e. if 20 parts can be delivered, the total penalty is (25-20)⨯500 + 10,000=$12,500). a. (10 pts) Determine the expected profit for a given production quantity. b. (10 pts) Find the production quantity that maximizes the expected profit. c. (20 pts) Assume that there is a process improvement that can improve production quality. If this improvement takes place, the probability of a given part meeting the final inspection will raise to 95%. What is the maximum amount that should be paid for such an improvement?
1. (15 Points) Assume that you are planning to produce and sell pasteurized watermelon juice. (a) Develop a gap map for the product. Choose appropriate attributes (dietary/health value, taste, image etc.) and some benchmark products (coke, orange juice, other fruit juices, dietary beverages, chocolate milk, ice tea etc.) (b) Assume that your primary market is Koc University campus. State how you can estimate the potential weekly sales (consider the number of potential consumers, profile of students etc.) 2. (15 Points) Consider the watermelon juice operation (production, packaging, sales). Sketch some ideas on the choice of packaging (glass, plastic, or others) from the following perspectives. • Design for manufacturing • Design for logistics • Design for life cycle management 3. (70 Points) Consider the MILP model we discussed in class for the make-or-buy decisions and answer the following questions. (a) (25 Points) Assume that in addition to the one-time fixed setup cost (c3) there is a also fixed operating cost, c4 (per time period), for the in-house production facility for maintenance and personal costs. Assume that once we open the production facility for in-house production at some time period t, then it needs to stay open until the end of the planning horizon. Note that in that case, we need to pay the cost c4 for all the consecutive time periods starting with t. Present a new mathematical model that would include this new cost structure. (b) (10 Points) Now assume that in every period t ∈ T the maximum amount you can buy from the third-party supplier is limited by Yt and the procurement cost depends on the time the product is purchased (given as c t 2 , t ∈ T). The production costs are also time dependent ( given as c t 1 , t ∈ T) explain how you would modify the model in part a to include this limitation. (c) (35 Points) Consider the extensions suggested in parts a and b and use the data provided in the HW2.xlsx file and find the optimal production plan that minimizes the total cost. Upload your code and the final screenshot that shows the result.
1. A multinational company has production plants in several countries and considers to open two new plants to reduce transportation costs. The set of open plants is denoted by F¯ and the set of candidate locations for the new facilities is donated by F. The production capacity of a facility f ∈ F ∪ F¯ is given by qf . Let K denote the set of customers. The demand amount of a customer k ∈ K is given as dk and the company has to serve all customers’ demands. Per unit transportation cost per unit distance is given as c and the distance between a facility f ∈ F ∪ F¯ and a customer k ∈ K is denoted by tfk. Due to the customs regulations not every facility can serve every customer. The set of customers a facility f ∈ F ∪F¯ can serve is given K(f). Formulate a mixed integer linear program (MILP) to find the locations of the two new facilities that would minimize the total transportation costs. 2. A new cargo company wants to build a delivery network in Turkey, using a hub-and-spoke structure. The data about the expected demand, transportation cost, transportation time, and setup costs for the candidate hub locations are presented in the attached data file (TurkishData.xlsx). (a) Assume that the company wants to open only a single hub. Use the single-hub location algorithm we discussed in class to find the optimal location that minimizes the total setup and transportation costs. Upload your code and the final screenshot that shows the result. (b) Now assume that the company has a total budget of 2400 to spend for the hub facilities. Write down a MILP model to find the optimal hub locations that minimize the total transportation cost and solve your model on your favorite mathematical modeling environment. Upload your code and the final screenshot that shows the result.
1 Projection free convex low-rank matrix optimization Consider the matrix optimization problem: X ? ∈ arg min X f(X) : rank(X) ≤ r, X ∈ R p×m , for some convex objective function f , where r < min{p, m} forces the solution to be low-rank. Such an optimization problem appears in many real-world applications—you will see two examples in the second and third parts of this homework. However, the matrix rank function is discrete-valued and non-convex, so solving this optimization problem is hard (NP-hard in general). Convex relaxation methods are extensively studied for solving low-rank matrix optimization problems. As pointed out in [5], the nuclear norm is an effective convex surrogate to obtain (approximately) low-rank solutions: X ? ∈ arg min X f(X) : kXk∗ ≤ κ, X ∈ R p×m . When projection onto the nuclear norm ball is easy to perform, we can solve this problem by projected gradient methods. However, as the problem dimensions increase, the projection becomes a computational bottleneck. PROBLEM 1.1: PROJECTION ONTO THE NUCLEAR NORM BALL (15 Points) (a) (10 Points) Let Z = UΣV > be the singular value decomposition of Z ∈ R p×m . Denote the diagonal of Σ ∈ R s×s by a vector σ ∈ R s , where s = min{p, m}. Let σ `1 be the projection of σ onto the `1-norm ball {x : x ∈ R s , kxk1 ≤ κ}. Show that the projection of this matrix onto the nuclear norm ball X = {X : X ∈ R p×m , kXk∗ ≤ κ} can be computed by projecting σ onto the `1 norm ball, i.e., projX (Z) = UΣ `1V > where Σ `1 ∈ R s×s denotes the diagonal matrix with diagonal σ `1 . (Hint: Use Mirsky’s inequality: kX − ZkF ≥ kΣX − ΣZkF, where ΣX, ΣZ ∈ R s×s are the diagonal matrices of the singular values of X, Z respectively.) (b) (5 Points) Implement the projection operator as a function called projNuc. You can use the provided file projL1. Set κ = 5000 and measure the computation time of the projection operator with the 100k and 1M MovieLens datasets. You can do this by running the script Pr11b, which loads the datasets, constructs the data matrix, and times the evaluation of the projection operator. Write the values you get in your report. Run your script at least for 5 times and report the average timings. These datasets consist of the ratings given by MovieLens users to movies in a given list. Of course, a user may not rate all of the movies. Modeling the ratings as entries of a low-rank matrix, where one dimension corresponds to different users, and the other corresponds to different movies, the goal of low-rank matrix completion is to predict the users’ ratings on unrated movies. This is essential to building a movie recommendation system. The 100k dataset consists of 100,000 ratings from 1000 users on 1700 movies. The 1M dataset consists of 1 million ratings from 6000 users on 4000 movies. Note that both of these datasets are relatively small; for instance, the famous Netflix competition data consists of 100480507 ratings that 480189 users gave to 17770 movies. Projecting a matrix of this size would require a huge amount of time. LIONS @ EPFL Prof. Volkan Cevher Problem 1.1 shows that projection onto the nuclear norm ball requires computing the singular value decomposition. The computational complexity of the singular value decomposition is O(min(m 2 p, mp2 )), and this can easily become a computational bottleneck if m and p are large. This increased the popularity of algorithms that leverage the linear minimization oracle (lmo) instead (e.g., [2, 7]): lmoX(Z) = arg min X∈X hX, Zi where hX, Zi = Tr(Z > X). Note that lmoX(Z) is not single valued in general. With abuse of terminology, when we say that we compute the lmo, we actually mean that we compute an instance X such that X ∈ lmoX(Z). PROBLEM 1.2: LMO OF NUCLEAR NORM (15 Points) (a) (10 Points) Show that the lmoX when X is the nuclear norm ball: X = {X : X ∈ R p×m , kXk∗ ≤ κ} gives the following output: −κuvT ∈ lmoX(Z), where u and v are the left and right singular vectors that correspond to the largest singular value of Z. (Hint: By definition κuvT ∈ X. You just need to show hX, Zi ≥ h−κuvT , Zi for all X ∈ X.) (b) (5 Points) Implement the lmo with X as a function called lmoNuc. Set κ = 5000 and measure the computation time for the 100k and 1M Movielens datasets. You can do so by running the script Pr12b, which loads the datasets, constructs the data matrix, and times the evaluation of the lmo. Write the values you get in your report. Run your script at least for 5 times and report the average timings. Compare these values with the computation time of the projection operator from Problem 1.1.b. 2 Hands-on experiment 1: Crime Scene Investigation with Blind Deconvolution You are working with the police. You are asked to identify the license plate of a car involved in a crime scene investigation. Unfortunately, the CCTV image of the car is blurry. In this part, we simulate this scene by trying to deblur a plate image found from the internet1 . This is an instance of the blind deconvolution problem. The set-up of the problem is as follows: Given two unknown vectors x,w ∈ R L , we observe their circular convolution y = w ∗ x, i.e., y` = XL ` 0=1 w` 0 x`−` 0+1 where the index ` − ` 0 + 1 in the sum above is understood to be modulo L. The goal of blind deconvolution is to separate w and x. The keyword blind comes from the fact that we do not know much priori information about the signals. We simply assume w and x belong to known subspaces of R L of dimension K and N, i.e., we can write w = Bh, h ∈ R K x = Cm, m ∈ R N for some L × K matrix B and L × N matrix C. The columns of these matrices provide bases for the subspaces in which w and x live. In the image deblurring example, x corresponds to the image we want to recover, w to a 2D blur kernel. We assume that we know (or have an estimate of) the support (i.e., the location of nonzeros) of the blur kernel. In real applications, support can be estimated by an expert using the physical information such as the distance of object to the focus and the camera, the speed of the camera and/or the object, camera shutter speed, etc. In this experiment, we use a very rough estimate for the support and simply use a box at the center of the domain. We roughly tuned the size of the box. Interestingly, it is possible to make the plate readable even with this very rough estimate. The 2D convolution y = w∗x produces a blurred image. As we have seen in the last homework, natural images have sparse wavelet expansions. Hence, the image x can be expressed as x = Cm with C is the matrix formed by a subset of the columns of the wavelet transform matrix. In addition, the blur kernel w can be written as w = Bh with B is the matrix formed by a subset of the columns of the identity matrix. 1https://plus.maths.org/issue37/features/budd/blurredplate.jpg 2 LIONS @ EPFL Prof. Volkan Cevher Blurred image of licence plate y Estimate of the support of blur kernel w PROBLEM 2.1: FRANK-WOLFE FOR BLIND DECONVOLUTION (25 POINTS) Let b be the L-point normalized discrete Fourier transform (DFT) of the observation y, i.e, b = Fy, where F is the DFT matrix. Then, b can be written as b = A(X) where X = hm> and A is a linear operator. Explicit expression of this linear operator A is out of the scope of this homework, c.f., [1] for further details. This reformulation allows us to express y, which is a nonlinear combination of the coefficients of h and m, as a linear combination of the entries of their outer product X = hm> . Note that given B and C, recovering m and h from b is the same as recovering x and w from y. Since X is a rank one matrix, we can use the nuclear norm to enforce approximately low-rank solutions. Then, we can formulate the blind deconvolution problem as follows: X ? ∈ arg min X 1 2 kA(X) − bk 2 2 : kXk∗ ≤ κ, X ∈ R p×m . (1) Here κ > 0 is a tuning parameter. We will apply the Frank-Wolfe algorithm to solve the optimization problem given in (1). The Frank-Wolfe algorithm is one of the earliest algorithms that avoids the proximal operator. Instead, it leverages the lmo [2]: lmo(∇f(Z)) = arg min X∈X h∇f(Z), Xi, where X = {X : kXk∗ ≤ κ, X ∈ R p×m } as in Part 1. It applies to the generic constrained minimization template with a smooth objective function, minX{f(X) : X ∈ X} as follows: Frank-Wolfe’s algorithm 1. Choose X 0 ∈ X. 2. For k = 0, 1, . . . perform: Xˆ k := lmo(∇f(X k )), X k+1 := (1 − γk)X k + γkXˆ k , where γk := 2/(k + 2). (a) (5 Points) Recall that the Frank-Wolfe algorithm applies only for smooth objectives. Show that the objective function is smooth, in the sense its gradient is Lipschitz continuous. (b) (20 Points) Complete the missing lines in the FrankWolfe function to implement the algorithm. We provide you the linear operators that you need to compute the lmo in the code. Note that we do not need to store and use the linear operator A in the ambient dimensions. In fact, for storage and arithmetic efficiency, we should avoid forming A. You can find more details about this aspect as comments in the code. Test your implementation using the script Test_deblur. Tune the parameter κ until the license plate number becomes readable. What is the license plate number? You can also tune the support of the blur kernel and try to get better approximations. 3 Hands-on experiment 2: k-means Clustering by Semidefinite Programming Clustering is an unsupervised machine learning problem in which we try to partition a given data set into k subsets based on distance between data points or similarity among them. The goal is to find k centers and to assign each data point to one of the centers such that the sum of the square distances between them are minimal [4]. This problem is known to be NP-hard. Problem Definition: Given a set of n points in a d−dimensional Euclidean space, denoted by S = {si = (si1, · · · , sid) > ∈ R d i = 1, · · · , n} 3 LIONS @ EPFL Prof. Volkan Cevher find an assignment of the n points into k disjoint clusters S = (S 1, · · · , S k) whose centers are cj(j = 1, · · · , k) based on the total sum of squared Euclidean distances from each point si to its assigned cluster centroid ci , i.e., f(S,S) = Xk j=1 |S j X | i=1 ks j i − cjk 2 , where |S j | is the number of points in S j , and s j i is the i th point in S j . [4] proposes an SDP-relaxation to approximately solve the abovementioned model-free k−means clustering problem . The resulting optimization problem2 takes the standard semidefinite programming form X ? ∈ arg min X hC, Xi : X1 = 1 | {z } A1(X)=b1 , X > 1 = 1 | {z } A2(X)=b2 , X ≥ 0 |{z} B(X)∈K , Tr(X) ≤ κ, X ∈ R p×p , X 0 | {z } X , (2) where C ∈ R p×p is the Euclidean distance matrix between the data points. Tr(X) ≤ κ enforces approximately low-rank solutions, the linear inclusion constraint X ≥ 0 is element-wise nonnegativity of X, the linear equality constraints X1 = 1 and X > 1 = 1 require row and column sums of X to be equal to 1’s, and X 0 means that X is positive semi-definite. Recall that Tr(X) = kXk∗ for any positive semi-definite matrix X. Algorithm: Program in (2) can be reformulated as min x∈X f(x) + g1(A1(x)) + g2(A2(x)), subject to B(x) ∈ K (3) where f(x) = hC, xi is a smooth convex function, g1 is the indicator function of singleton {b1}, g2 is the indicator function of singleton {b2} and K is the positive orthant for which computing the projection is easy. Note that the classical Frank-Wolfe method does not apply to this problem due to nonsmooth terms g1, g2. In the sequel, we will attempt to solve this problem with the HomotopyCGM method proposed in [7] to handle the non-smooth problems with a conditional gradient based method. The algorithm to solve the problem in (3) is given below. Conditional Gradient Method(CGM) for composite problems 1. Choose x 0 ∈ X and β0 > 0 2. For k = 1, 2, . . . perform: γk = 2/(k + 1), and βk = β0/ √ k + 1 vk = βk∇f(xk) + A > 1 (A1 xk − b1) + A > 2 (A2 xk − b2) + (xk − projK (xk)) xˆ k := argminx∈X hvk , xi x k+1 := (1 − γk)x k + γk xˆ k 3.Output: x k Dataset: We use a similar setup described by [3]. We use the fashion-MNIST data in [6] which is released as a possible replacement for the MNIST handwritten digits . Each data point is a 28×28 grayscale image, associated with a label from 10 classes. Figure 3 depicts 3 row samples from each class. Classes are labeled from 0 to 9. First, we extract the meaningful features from this dataset using a simple 2 layers neural network with a sigmoid activation. Then, we apply neural network to 1000 test samples from the same dataset, which gives us a vector µ ∈ R 10 where each entry represents the probability being in that class. Then, we form the pairwise distance matrix C by using this probability vectors.3 PROBLEM 3.1: CONDITIONAL GRADIENT METHOD FOR CLUSTERING FASHION-MNIST DATA (45 POINTS) (a) (10 Points) Show that the domain X = {X : Tr(X) ≤ κ, X ∈ C p×p , X 0} is a convex set. 2See section (2) of [4] for details of this relaxation. 3 In the code, you do not need to worry about any of the processing details mentioned here. You are directly given the matrix C. 4 LIONS @ EPFL Prof. Volkan Cevher T-shirt/top Trouser Pullover Dress Coat Sandal Shirt Sneaker Bag Ankle boot Figure 1: Fashion MNIST data (b) (10 Points) Given a linear inclusion constraint T x ∈ Y, the corresponding quadratic penalty function is given by QPY(x) = dist2 (T x, Y) = min y∈Y ky − T xk 2 . Write down the constraints in (3) in the quadratic penalty form and show that the penalized objective takes the form f(x) + 1 2β kA1(x) − b1k 2 + 1 2β kA2(x) − b2k 2 + 1 2β dist2 (x, K), and show that the gradient of the penalized objective is equal to vk/β in the algorithm. (Hint: You can write dist2 (T x, Y) = ky ∗ − T xk 2 , where y ∗ = arg miny∈Y ky − T xk 2 . and take the derivative with respect to X without worrying about y ∗ depending on X, thanks to Danskin’s theorem 4 .) (c) (5 Points) Write down vk explicitly by deriving the gradient and projection specific to Problem (3). (d) (20 points) Complete the missing lines in the function definition of HomotopyCGM, and run the Test_clustering to solve the k-means clustering problem. Using the function value_kmeans, compute and report the k-means value before and after 4https://en.wikipedia.org/wiki/Danskin’s_theorem 5 LIONS @ EPFL Prof. Volkan Cevher running the algorithm. Plot the results and SDP solution X after 5000 iterations. What is the final objective value? Is it below the optimal value provided to you? If yes, explain the reason. (Hint: Note that when X is as given in (2), κuu> ∈ lmoX(X), where u is the eigenvector corresponding to the smallest eigenvalue of X.) 4 Guidelines for the preparation and the submission of the homework Work on your own. Do not copy or distribute your codes to other students in the class. Do not reuse any other code related to this homework. Here are few warnings and suggestions for you to prepare and submit your homework. • This homework is due at 09:00AM, 6 January, 2020 • Submit your work before the due date. Late submissions are not allowed and you will get 0 point from this homework if you submit it after the deadline. • Questions of 0 point are for self study. You do not need to answer them in the report. • Your final report should include detailed answers and it needs to be submitted in PDF format. • The PDF file can be a scan or a photo. Make sure that it is eligible. • The results of your simulations should be presented in the final report with clear explanation and comparison evaluation. • We provide some Python scripts that you can use to implement the algorithms, but you can implement them from scratch using any other convenient programming tool (in this case, you should also write the codes to time your algorithm and to evaluate their efficiency by plotting necessary graphs). • Even if you use the Python scripts that we provide, you are responsible for the entire code you submit. Apart from completing the missing parts in the scripts, you might need to change some written parts and parameters as well, depending on your implementations. • The codes should be well-documented and should work properly. Make sure that your code runs without error. If the code you submit does not run, you will not be able to get any credits from the related exercises. • Compress your codes and your final report into a single ZIP file, name it as ee556_2019_hw4_NameSurname.zip, and submit it through the moodle page of the course. References [1] AHMED, A., RECHT, B., AND ROMBERG, J. Blind deconvolution using convex programming. IEEE Transactions on Information Theory 60, 3 (2014), 1711–1732. [2] JAGGI, M. Revisiting Frank-Wolfe: Projection-free sparse convex optimization. In Proc. 30th Int. Conf. Machine Learning (2013). [3] MIXON, D. G., VILLAR, S., AND WARD, R. Clustering subgaussian mixtures by semidefinite programming. arXiv preprint arXiv:1602.06612 (2016). [4] PENG, J., AND WEI, Y. Approximating k-means-type clustering via semidefinite programming. SIAM journal on optimization 18, 1 (2007), 186–205. [5] RECHT, B., FAZEL, M., AND PARRILO, P. A. Guaranteed minimum-rank solutions of linear matrix equations via nuclear norm minimization. SIAM Rev. 52, 3 (2010), 471–501. [6] XIAO, H., RASUL, K., AND VOLLGRAF, R. Fashion-mnist: a novel image dataset for benchmarking machine learning algorithms, 2017. [7] YURTSEVER, A., FERCOQ, O., LOCATELLO, F., AND CEVHER, V. A conditional gradient framework for composite convex minimization with applications to semidefinite programming. arXiv preprint arXiv:1804.08544 (2018).
1 Minimax problems and GANs Consider the function f : R 2 → R where f(x, y) = xy 1. (10 points) Find the first order stationary points, and classify them as local minimum, local maximum or saddle point according to the Hessian. 2. (10 points) Show that (x ? , y ? ) = (0, 0) is a solution to the minimax problem minx maxy f(x, y). This means that f(x ? , y ? ) ≥ f(x ? , y) and f(x ? , y ? ) ≤ f(x, y ? ), for all x, y. 3. (20 points) One possible attempt at finding this solution via iterative first-order methods is to perform gradient updates on the variables x and y. More precisely for γ > 0 consider the gradient descent/ascent updates xk+1 = xk − γ∇x f(xk , yk), yk+1 = yk + γ∇y f(xk , yk) Show that the sequence of iterates {xk , yk} ∞ k=0 starting from any point (x0, y0) , (0, 0) diverges, for any γ > 0. Find the rate at which its distance to the origin grows. In the context of GANs, suppose the true distribution is a multivariate normal on R 2 (with mean and covariance matrix as specified in the code), and the noise distribution is a standard normal on R 2 . Your generator and dual variable classes are defined as G := {g : g(z) = Wz + b}, F := {f : f(x) = v T x} (1) For a matrix W ∈ R 2×2 and vectors b, v ∈ R 2 . 1. (10 points) Suppose the space R 2 is equipped with the `2-norm k(x, y)k 2 2 = x 2 + y 2 . For any f ∈ F compute its Lipschitz constant with respect to this norm. Describe the set of functions in F whose Lipschitz constant is at most 1. 2. (10 points) Implement a function enforcing the 1-Lipschitz constraint of the dual variable and the generator and dual variable functions (in variables.py). Then implement an stochastic estimate of the objective function of the minimax game (in trainer.py): min g∈G max f ∈F E[f(X) − f(g(Z)))] (2) where X has the true distribution, and Z has the noise distribution. In order to implement the stochastic estimate you will use samples from such distributions. Use the methods available for the class torch.distributions.Distribution from the PyTorch module. Remark: The enforce_lipschitz method should not return any value. It should modify the parameters of f inplace. In order to do this we recommend directly modifying any tensor’s values by rewriting their data attribute e.g., x.data = x.data + 1. This avoids PyTorch’s automatic tracking of operations for automatic differentiation, which might cause issues. LIONS @ EPFL Prof. Volkan Cevher 3. (25 points) complete the missing functions in trainer.py, You should implement alternating and simultaneous stochastic gradient ascent/descent updates. More specifically fk+1 = fk + γSGf(fk , gk), gk+1 = gk − γSGg(fk , gk) (simultaneous) (3) fk+1 = fk + γSGf(fk , gk), gk+1 = gk − γSGg(fk+1, gk) (alternating) (4) Where SG is the stochastic gradient oracle. in the file optim.py we provide a modification of PyTorch’s SGD optimizer, which allows for negative learning rates (stepsize). This is only a trick to be able to perform stochastic gradient descent for the generator, and stochastic gradient ascent for the dual variable. PyTorch’s Optimizer.step() method always performs a step in the direction of the negative gradient; by allowing both positive and negative learning rates we can effectively switch between gradient descent or ascent. Run both methods using the script train.py passing the option –training_mode simultaneous or –training_mode alternating. Include the generated plots in your report. Comment on your findings. 4. (10 points) Show that given two distributions µ and ν, if F is the class of functions defined in (1), it holds that: max f ∈F EX∼µ[f(X)] − EY∼ν[f[Y]] ≥ 0 (5) 5. (15 points) Show that given two distributions µ and ν on R 2 , if their first moments coincide i.e., E(x1,x2)∼µ[x1] = E(x1,x2)∼ν[x1], E(x1,x2)∼µ[x2] = E(x1,x2)∼ν[x2] (6) then, if F is the class of functions defined in (1), it holds that: max f ∈F EX∼µ[f(X)] − EY∼ν[f[Y]] = 0 (7) Why is this a possible explanation to the observed behaviour of our GANs example? 2 Optimizers of Neural Network The goal of this exercise is to implement different optimizers for a handwritten digit classifier using the well-known MNIST dataset. This dataset has 60000 training images, and 10000 test images. Each image is of size 28 × 28 pixels, and shows a digit from 0 to 9. (a) General guideline: complete the codes marked with TODO in the optimizers.py. (b) The implementation of mini-batch SGD and SGD with HB momentum are provided as examples. Vanilla Minibatch SGD Input: learning rate γ 1. initialize θ0 2. For t = 0, 1,… ,N-1: obtain the minibatch gradient gˆt update θt+1 ← θt − γgˆt 2 LIONS @ EPFL Prof. Volkan Cevher Minibatch SGD with Momentum Input: learning rate γ, momentum ρ 1. initialize θ0, m0 ← 0 2. For t = 0, 1,… ,N-1: obtain the minibatch gradient gˆt update mt+1 ← ρmt + gˆt update θt+1 ← θt − γmt+1 (c) Implement of following optimizers: (a) (10 points) Implement AdaGrad method AdaGrad Input: global learning rate γ, damping coefficient δ 1. initialize θ0, r ← 0 2. For t = 0, 1,… ,N-1: obtain the minibatch gradient gˆt update r ← r + gˆt gˆt update θt+1 ← θt − γ δ+ √ r gˆt where is the element-wise multiplication between two matrices. (b) (10 points) Implement RMSProp RMSProp Input: global learning rate γ, damping coefficient δ, decaying parameter τ 1. initialize θ0, r ← 0 2. For t = 0, 1,… ,N-1: obtain the minibatch gradient gˆt update r ← τr + (1 − τ)gˆt gˆt update θt+1 ← θt − γ δ+ √ r gˆt (c) (10 points) Implement Adam method Adam Input: global learning rate γ, damping coefficient δ, first order decaying paramter β1, second order decaying parameter β2 1. initialize θ0, m1 ← 0, m2 ← 0 2. For t = 0, 1,… ,N-1: obtain the minibatch gradient gˆt update m1 ← β1m1 + (1 − β1)gˆt update m2 ← β2m2 + (1 − β2)gˆt gˆt correct bias mˆ1 ← m1 1−β t+1 1 mˆ2 ← m2 1−β t+1 2 update θt+1 ← θt − γ mˆ1 δ+ √ mˆ2 (d) (20 points) Set the learning rate to 0.5, 10−2 , and 10−5 . Run the neural network for 15 epochs, repeat it for 3 times and compare the obtained average accuracies. Plot the obtained training loss for different optimizers. State how the performance of the adaptive learning-rate methods versus SGD and SGD with momentum methods is compared. 3 LIONS @ EPFL Prof. Volkan Cevher 3 Guidelines for the preparation and the submission of the homework Work on your own. Do not copy or distribute your code to other students in the class. Do not reuse any other code related to this homework. Here are few warnings and suggestions for you to prepare and submit your homework. • This homework is due at 4:00PM, 15 November, 2019 • Submit your work before the due date. Late submissions are not allowed and you will get 0 point from this homework if you submit it after the deadline. • Questions of 0 points are for self study. You do not need to answer them in the report. • Your final report should include detailed answers and it needs to be submitted in PDF format. • The PDF file can be a scan or a photo. Make sure that it is eligible. • The results of your simulations should be presented in the final report with clear explanation and comparison evaluation. • We provide Pytorch scripts that you can use to implement the algorithms, but you can implement them from scratch using any other convenient programming tool (in this case, you should also write the codes to time your algorithm and to evaluate their efficiency by plotting necessary graphs). • Even if you use the Pytorch scripts that we provide, you are responsible for the entire code you submit. Apart from completing the missing parts in the scripts, you might need to change some written parts and parameters as well, depending on your implementation. • The code should be well-documented and should work properly. Make sure that your code runs without errors. If the code you submit does not run, you will not be able to get any credits from the related exercises. • Compress your code and your final report into a single ZIP file, name it as ee556_2019_hw2_NameSurname.zip, and submit it through the moodle page of the course.
1 MULTICLASS CLASSIFICATION – 70 POINTS In Lecture 2 (slide 19) you were introduced binary logistic regression [3] and learned how to derive the expression of its maximum likelihood estimator. We briefly remind you of the procedure and then guide you through extending these notions to multiple classes. We are working with data represented by pairs a T i , bi , where a T i is the i th row of matrix A ∈ R n×d and bi is the i th entry of b ∈ R n . Each bi denotes the class label associated to sample a T i and we have a total of C = 2 classes, which we encode as {−1, 1}. We denote by n the number of data samples and by d the problem dimension. Lastly, we aim to model our data with a feature vector x ∈ R d . For clarity, refer to the visual representations below. A = ← a T 1 → ← a T 2 → . . . ← a T n → ∈ R n×d b = b1 b2 . . . bn ∈ {−1, 1} n x = x1 x2 . . . xd ∈ R d (1) The logistic model assumes there is a linear relationship between x and the logarithm of the odds of bi = 1 1 , given ai . Formally, we can write this as log P (bi = 1 | ai , x) P (bi = −1 | ai , x) | {z } Odds of bi=1 to bi=−1. = a T i x C=2 ⇐⇒ log P (bi = 1 | ai , x) 1 − P (bi = 1 | ai , x) | {z } This quantity is called a logit, hence the name logistic regression. = a T i x ⇐⇒ P (bi = 1 | ai , x) = e a T i x 1 + e a T i x | {z } The logistic function. Assuming the data samples are statistically independent, we can write down the probability of observing vector b as: P(b | A, x) = P (b1 ∧ b2 ∧ . . . ∧ bn | A, x) = Yn i=1 P (bi | ai , x) (2) Expression (2) is called the likelihood of observing b, given data A and parameters x. The goal of logistic regression is to find a feature vector xˆML that maximizes the likelihood of observing b. We can find xˆML using the numerical optimization methods you have seen in the lectures. The objective function is chosen as the negative logarithm of the likelihood 2 , as minimizing it is equivalent to maximizing P(b | A, x). The expression of the maximum likelihood estimator is therefore given by: xˆML ∈ arg min x {− log P(b | A, x) = Xn i=1 log h 1 + exp −bia T i x i : x ∈ R d }. We now extend the binary setting to C > 2 classes, which naturally leads to having correspondingly many feature vectors. Note that one may use the same technique as is the binary case, by setting one of the classes as the pivot (for example C) and writing the log-odds against it. This effectively leaves us with C − 1 feature vectors because we fix xC by construction. We will take a different approach that is (arguably) more natural and easier to implement in PART II of this exercise. This approach uses C feature vectors represented as a matrix X ∈ R d×C . For clarity, refer to the visual representation below. X = ↑ ↑ ↑ x1 x2 . . . xC ↓ ↓ ↓ ∈ R d×C xi = x1i x2i . . . xdi ∈ R d (3) 1The choice of class label does not matter – we could have just as well chosen -1. 2Operating with sums is easier, hence the logarithm. The minus is added for making the function convex. LIONS @ EPFL Prof. Volkan Cevher To arrive at our probabilistic model, we start from the following equations: log (P(bi = 1 | ai , X) S ) = a T i x1 =⇒ log P(bi = 1 | ai , X) = a T i x1 − log S =⇒ P(bi = 1 | ai , X) = e a T i x1 S log (P(bi = 2 | ai , X) S ) = a T i x2 =⇒ log P(bi = 2 | ai , X) = a T i x2 − log S =⇒ P(bi = 2 | ai , X) = e a T i x2 S . . . . . . . . . log (P(bi = C | ai , X) S ) = a T i xC =⇒ log P(bi = C | ai , X) = a T i xC − log S =⇒ P(bi = C | ai , X) = e a T i xC S Different from the binary case, the probability in the denominator is replaced by a factor S that acts as a normalizing constant. We set S = PC k=1 e a T i xk in order for the probabilities to sum to 1. We therefore have: P(bi = j | ai , X) = e a T i xj PC k=1 e a T i xk PART I – THEORY (35 POINTS) EXERCISE 1.I.1: (7 POINTS) – Following the same reasoning as for the binary case, show that the maximum likelihood estimator for multinomial logistic regression problem has the following expression: Xˆ ML ∈ arg min X f(X) | f : R d×C → R, f(X) = Xn i=1 logXC k=1 e a T i xk − a T i xbi , bi ∈ {1, 2 . . .C} . (4) EXERCISE 1.I.2: (7 POINTS) – Show that the gradient with respect to X of the loss function f from Exercise I.1 has the following matrix form (this will come in useful in the second part of the exercise where you have to implement the algorithms): ∇f(X) = A TZexp(AX) − Y , where Z is an n × n matrix with Zi j = 1 PC k=1 e a T i xk , if i = j 0, othewrise , Y is an n × C matrix whose rows y T i are the one-hot-encodings 3 corresponding to data bi and exp is applied entry-wise. HINT: It is helpful to first compute the gradient of f with respect to a single entry xi j of X. EXERCISE 1.I.3: (SELF-STUDY, 0 POINTS) – In this exercise, we are interested in showing that f is convex. We provide you with the expression of the Hessian of f below 4 : ∇ 2 f(X) = Xn i=1 Σi ⊗ aia T i , (5) where ⊗ is the Kronecker product, and Σi is given by: Σi = σi1(1 − σi1) −σi1σ2 . . . −σi1σiC −σi2σi1 σi2(1 − σi2) . . . −σi2σiC . . . . . . . . . . . . −σiCσi1 −σiCσi2 . . . σiC(1 − σiC) , with σi j = e a T i xj PC k=1 e a T i xk . (6) Knowing that a function of multiple variables is convex iff its Hessian is PSD, use the expression of ∇ 2 f above to prove that f is convex. HINT: • Start by showing that both Σi < 0 and aia T i < 0, ∀i. You can use the following: D < 0 ⇐⇒ z TDz ≥ 0, ∀z. 3The one-hot-encoding of class k ≤ C is given by vector y ∈ R C , with yk = 1 and yj = 0, j , k. Precisely, if bi = 2, then yi has 1 on position 2 and 0 everywhere else. 4The interested student can refer to the Appendix for details of the derivation. 2 LIONS @ EPFL Prof. Volkan Cevher • Then apply the following result [9]: D, E < 0 =⇒ D ⊗ E < 0. • Finally, argue that given two sum-compatible matrices D, E < 0, you have D + E < 0. EXERCISE 1.I.4: (8 POINTS) – We now seek to estimate the Lipschitz constant L of ∇f , as we will need it for PART II of this exercise. We know that L is such that λmax(∇ 2 f) ≤ L < ∞, where λmax(∇ 2 f) is the largest eigenvalue of ∇ 2 f(X). a) Knowing that aia T i is a rank-1 matrix, show that λmax(aia T i ) = kaik 2 2 . b) Starting from (5), show that we can choose L = kAk 2 F 2 HINT: You can use the following statements: • For D, E < 0, we have λmax(D ⊗ E) = λmax(D)λmax(E). • Given D < 0, we have λmax(d) ≤ maxi P jDi j(the top eigenvalue is upper-bounded by the maximum `1 norm of rows). • 1 − σi j = PC k=1 k,j σik. This comes from having a probability distribution over the C classes. EXERCISE 1.I.5: (8 POINTS) – Sometimes we wish to enforce certain properties onto our solution (e.g. sparsity). This can be achieved by adding a regularizer to our objective: F(X) = f(X) + λg(X), As we saw in Lecture 9, such a function can be minimized using proximal gradient algorithms, provided that g is ‘proximable’ (i.e., its proximal operator is efficient to evaluate). We recall the proximal operator of g as the solution to the following convex problem: proxg (z) := arg min y∈Rd {g(y) + 1 2 ky − zk 2 2 }. Given g : R d → R, g(x) := kxk1, show that its proximal function can be written as proxλg (z) = max(|z| − λ, 0) ◦ sign(z), z ∈ R d (7) where the operators max, sign and |·| are applied coordinate-wise to the vector z and ◦ stands for (x ◦ y)i = xiyi . Such a regularizer imposes sparsity on the solutions. HINT: Show that relation (7) holds for z ∈ R. Then use the fact that kyk1 + 1 2 ky − zk 2 2 = Pd i=1 |yi |+ 1 2 (yi − zi) 2 to conclude that the operators can be applied coordinate-wise. EXERCISE 1.I.6: (5 POINTS) – Given g(x) := 1 2 kxk 2 2 , show that its proximal function can be written as proxλg (z) = z 1 + λ . (8) Such a regularizer bounds the growth of X’s components, thus helping with numerical stability. PART II – HANDWRITTEN DIGIT CLASSIFICATION ( 35 POINTS) In this section you will implement algorithms based on your theoretical derivations from PART I and test them on the well-known MNIST dataset [4] consisting of 28 × 28 grayscale images of handwritten digits. Figure 1 below depicts a subset of samples pertaining to each of the 10 classes. 3 LIONS @ EPFL Prof. Volkan Cevher Figure 1: MNIST selected digit images. Source: https://wikipedia.org/wiki/MNIST_database The problem constants become C = 10, d = 28 × 28 = 784 and we choose n = 5000. The rows a T i of our data matrix A are the vectorized5 representations of the digit images. Finally, the labels bi are integers of the set {0, 1, . . . 9}. The data is preprocessed and given to you in compressed format in the file mnist_train_test_split.npz. The code for loading it is already provided in the main.py file. For this exercise you will implement algorithms which solve the following regularized versions of the multiclass Logistic Regression: • `1 regularized (or sparse) Logistic Regression: F(X) = f(X) + λ XC k=1 kxkk1 . (9) • `2 regularized Logistic Regression: F(X) = f(X) + λ 2 XC k=1 kxkk 2 2 . (10) Using the information in PART I of this exercise, as well as the contents of Lecture 9 you are asked to fill in the code gaps, run the experiments and interpret the results. EXERCISE 1.II.1: (20 POINTS) – In the file named algorithms.py you can find the incomplete methods whose code you need to fill in. a) In the file operators.py fill in the methods l1_prox and l2_prox with the proximal operator expressions from PART I adapted to problems (9) and (10), respectively. Also fill in the gradfx with the expression of ∇f from PART I. b) Using the information in Lecture 9 fill in the codes of methods ista, fista (adapted to also handle restarting) and gradient_scheme_restart_condition from the file algorithms.py. In the latter method, you need to implement the so-called ‘gradient restart condition’, described in depth in papers such as [2, 7]. Formally, the criterion is: hY t − X t+1 , X t+1 − X t i > 0, where t is the iteration index. The term Y t − Y t+1 can be seen as a gradient, while X t+1 − X t is the descent direction of the momentum term. Overall this criterion resets the momentum of FISTA to 0 when it goes in a bad descent direction. Note that we are working with matrices, so we need to adapt the inner product correspondingly: hD, Ei = T r(D T E). NOTE: The Y in the gradient restart condition refers to the FISTA iterate. Do not confuse it with the one-hot-encoding matrix from (1). c) Using Lecture 9 as guideline, fill in the method prox_sg form the file algorithm.py and stocgradfx from operators.py. The latter method needs to return the minibatch stochastic gradient of f . It is not difficult to show that given ms stochastic samples of A i.e., rows of A chosen uniformly at random, then ∇˜ ms f(X) = n ms Xms i=1 ai e a T i X PC k=1 e a T i xk − y T i 5A vectorized representation of a matrix Y ∈ R n×d is given by a vector y ∈ R nd obtained by arranging the columns of Y on top of one another. 4 LIONS @ EPFL Prof. Volkan Cevher is an unbiased stochastic gradient of ∇f(X). The minibatch size and learning rate decrease function are given to you in the params variable of the main method – please make sure to use those. Note: For the prox_sg method you need to record convergence with respect to the ergodic iterate Xˆ t = Pt j=1 γjX j Pt j=1 γj , where γj is the learning rate at step j. EXERCISE 1.II.2: (10 POINTS) – Running the script provided in main.py will produce plots of the convergence for your implementation of the 4 methods above. a) Insert the convergence plots into your report. b) Run the deterministic methods for 2000 iterations and comment on how the convergence rates observed in your plots compare to the theoretical ones (refer to the lecture notes). Are the practical rates better, worse or as expected? c) For the deterministic methods, compare the convergence speed of FISTA relative to ISTA, and of FISTA-RESTART with respect to FISTA. Explain your observations. d) Run Prox-SG for 50000 iterations. Comment on the accuracy and convergence speed of deterministic methods versus Prox-SG. Note that Prox-SG is plotted with respect to the number of epochs (full passes through the data), where an epoch is comparable to a full gradient. Explain your observations. e) In the main.py file, you have calls to the method plot_digit_features. This method takes as input the feature matrices X obtained from running FISTA with restart for 2000 iterations for both problems (9) and (10). Insert the plots produced by calling these methods into the report. Do you observe any similarity between the feature vectors and their corresponding classes? EXERCISE 1.II.3: (5 POINTS) – In the file called train_and_test_mnist_nn.py you are provided with the code for training and evaluating a fully-connected three-layer neural net with ReLU activation functions. Identical to Logistic Regression, the network is trained on a dataset with 5000 MNIST images. We now compare the test-set accuracies corresponding to the solutions obtained by the neural net, `1 and `2 regularized Logistic Regression. These accuracies are computed for a test set of 10000 MNIST images. • Run train_and_test_mnist_nn.py until completion and insert the final test set accuracy printed on the screen. • In the main.py template you are provided with code 6 that reports the test-set accuracy of the solution obtained by FISTA with gradient scheme restart for the `1 and `2 regularized problems. Run the method for 2000 iterations and insert the two accuracies in your report. • Is there a difference between the accuracy of the neural net solution and those obtained by FISTA with gradient scheme restart? Does the Logistic regression model have the same expressive power as the neural net? Why / why not? HINT: Think of linear versus nonlinear models. 2 IMAGE RECONSTRUCTION – 30 POINTS In this exercise an image should be understood as the matrix equivalent of a digital grayscale image, where an entry represents the intensity of the corresponding pixel. 2.1 Wavelets A widely used multi-scale localized representation in signal and image processing is the wavelet transform.7 This representation is constructed so that piecewise polynomial signals have sparse wavelet expansions [5]. Since many real-world signals can be composed by piecewise smooth parts, it follows that they have sparse or compressible wavelet expansions. Scale. In wavelet analysis, we frequently refer to a particular scale of analysis for a signal. In particular, we consider dyadic scaling, such that the supports from one scale to the next are halved along each dimension. For images, which can be represented with matrices, we can imagine the highest scale as consisting of each pixel. At other scales, each region correspond to the union of four neighboring regions at the next higher scale, as illustrated in Figure 2. The Wavelet transform. The wavelet transform offers a multi-scale decomposition of an image into coefficients related to different dyadic regions and orientations. In essence, each wavelet coefficient is given by the scalar product of the image with a wavelet function which is concentrated approximately on some dyadic square and has a given orientation (vertical, horizontal or diagonal). Wavelets 6The section which calls method compute_accuracy. 7This section is an adaption from Signal Dictionaries and Representations by M. Watkin (see http://bit.ly/1wXDDbG). 5 LIONS @ EPFL Prof. Volkan Cevher Figure 2: Dyadic partitioning of the unit square at scales j = 0, 1, 2. The partitioning induces a coarse-to-fine parent/child relationship that can be modeled using a tree structure. are essentially bandpass functions that detect abrupt changes in a signal. The scale of a wavelet, which controls its support both in time and in frequency, also controls its sensitivity to changes in the signal. An important property of the wavelet functions is that they form an orthonormal basis W. Formally, this means WWT = WTW = I, where I is the identity operator. In the discrete case, W is just an orthonormal matrix. Implementation. We provide the codes that efficiently compute the 2D Wavelet transform W and its inverse W−1 = WT . Operators W and WT receive a vectorized image as input and outputs a vectorized transform. We base our implementation on the PyWavelets library8 . You can find a usage example in the file example_wavelet.py. The interaction with the pywt library happens through the class RepresentationOperator available in the utilities file. • The method W(im) computes the transformation from an image im to its wavelet counterpart im_wav. • The method WT(im_wav) computes the transformation from wavelet coefficients im_wav back to the image domain im. Remark 1. Should the provided codes not work after you installed the required packages, directly contact one of the teaching assistants. 2.2 Total Variation The Total Variation (TV) was proposed by Rudin, Osher and Fatemi [8] as a regularizer for solving inverse problems. There is compelling empirical evidence showing that this regularization does not blur the sharp edges of images and as a consequence, significant research efforts went into developing efficient algorithms for computing the TV proximal operator. In this exercise we use the approach developed by Chambolle [1]. Computing the TV of an image relies on the discrete gradient operator ∇x ∈ R (m×m)×2 , formally given by (∇x)i, j = (∇x) 1 i, j , (∇x) 2 i, j with (∇x) 1 i, j = xi+1, j − xi, j if i < m, 0 if i = m, (∇x) 2 i, j = xi, j+1 − xi, j if j < m, 0 if i = m. Then, the discrete Total Variation (TV) is defined as kxkTV := P 1≤i, j≤m k(∇x)i, jk2 called isotropic P 1≤i, j≤m |(∇x) 1 i, j | + |(∇x) 2 i, j | called anisotropic Even though the isotropic case presents a slightly more difficult computational challenge, it has the advantage of being unaffected by the direction of the edges in images. The TV-norm can be seen as the `1-norm of the absolute values of the gradients evaluated at each pixel. Analogous to the case of `1-norm, minimizing the TV-norm subject to data fidelity leads to sparse gradients, which implies an image with many flat regions and few sharp transitions; see Figure ??. 8https://pywavelets.readthedocs.io/en/latest/index.html 6 LIONS @ EPFL Prof. Volkan Cevher 2.3 Image in-painting Image in-painting consists in reconstructing the missing parts of an image. By exploiting the structures such as sparsity over the wavelet coefficients, it is possible to in-paint images with a large portion of their pixels missing. In this part of the homework, we are going to study the convergence of different methods related to FISTA, as well as different restart strategies. We consider a subsampled image b = PΩx, where PΩ ∈ R n×p is an operator that selects only few, n p := m 2 , pixels from the vectorized image x ∈ R p . We can try to reconstruct the original image x by solving the following problems. min α∈Rp 1 2 kb − PΩWTαk 2 2 | {z } f`1 (α) + λ`1 kαk1 | {z } g`1 (α) , (11) min x∈Rp (1/2)kb − PΩxk 2 2 | {z } fTV(x) + λTVkxkTV | {z } gTV(x) . (12) Here, the operator WT is vectorized to be dimension compatible with PΩ. An example of image in-painting with this model is given in Figure 3. EXERCISE 2.1: (5 POINTS) – a) Write the gradient of f(α) in (11) and f(x) in (12). b) Find the Lipschitz constant of ∇α f(α) for (11) and ∇x f(x) in (12) analytically. Figure 3: An example of image in-painting with the three models described in this homework. The regularization parameter has been set to 0.01 for all methods. EXERCISE 2.2: (10 POINTS) – Adapt the FISTA algorithm from Exercise II.1 to solve the problems (11) and (12). a) Take a 256 × 256 natural image or better, a picture of you, and subsample 40% of its pixels uniformly at random (i.e. remove 60% of the pixels). b) Perform a parameter sweep over λ1 and λTV. You should choose a meaningful range for the parameters, one that allows you to observe the behavior of the PSNR as a function of the regularization parameters. Use 200 iterations for your algorithm. c) Plot the obtained PSNR against the regularization parameter values. d) Provide your codes with comments, and include the reconstructed images. EXERCISE 2.3: (10 POINTS) – You will now study the convergence to x ∗ , the optimal solution of problem (11). We use use λ = 0.01 and again uniformly subsample 40% of the pixels. a) Adapt (or reuse) the ISTA variations you implemented in Exercise II.1: • ISTA • FISTA • FISTA with gradient scheme restart (see Exercise 1.II.1) 7 LIONS @ EPFL Prof. Volkan Cevher Figure 4: Illustration of the unrolled proximal method. The function g(·) represents the gradient step x k − αk∇f(x k ), y represents the measurements b, while the function Pψ(·) (corresponding to gθ in text) is the learned proximal step. The values xi are then the results of the different iterations. Taken from [6]. b) The optimal solution x ∗ of problem (11) and the corresponding optimal value F ∗ := f(x ∗ )+g(x ∗ ) are not known a priori. You need to estimate them by running FISTA with gradient scheme restart for 5000 iterations and use the relative error rerrk < 10−15 () as the stopping criterion. The error at iteration k is defined as errk = ky k+1 − y k k2, rerrk := errk err0 . Write the value that you obtained for F ∗ in the report. c) Study the convergence of ISTA, FISTA, FISTA with gradient scheme restart over 2000 iterations. Adapt the tolerance criterion to stop iterating when |F(x k )−F ∗ | F∗ < 10−15. Plot a graph log |F(x k )−F ∗ | F∗ against k (cf. the examples in Lecture 10) and comment on the different rates and behaviors observed. HINT: You can use the function semilogy for the plot. d) Replace then the F ∗ with F := F(x ), where x is your ground-truth image. Plot a graph log |F(x k ) − F |/F against k using the same algorithms as in 3. for 1000 iterations. Comment on the differences. e) Provide your codes with comments, as well as both convergence plots with discussion on the results. Comparison with algorithm unrolling. In recent years, the interest has grown for deep learning method that learn to imitate and improve onto algorithms like ISTA of FISTA for composite convex minimization. Given un update rule for a composite minimization problem of form of (12), i.e. x k+1 = proxg(x k − αk∇f(x k )) , the approach of [6] proposes to learn the proximal operator with a neural network. They fix the number of iterations that they will use to T and alternate between the inner updates (x k − αk∇f(x k )) and the proximal operator parametrized by a neural network (we will call it gθ), as shown on figure 4. On Figure 4, one starts from the undersampled image x0 and iterate T times according to the rule x k+1 = proxgθ (x k −αk∇f(x k )). The network gθ is then trained to minimize Ex [kxT −x k], i.e. to minimize the average `2 distance between natural images x and reconstructed images xT , starting from the measured data x0 = b = PΩx . Their result show a drastic improvement upon some classically used methods such as Wavelet minimization (Problem (11)). We have implemented and pretrained the method for inpainting images of size 256 × 256 and we would like you to explore compare your implementations to the method. An example on how to use it is provided in the script example_unrolled_nn.py, which implements T = 5 unrolled step with an operator parametrized by a simple residual network. EXERCISE 2.4: (5 POINTS) – Compare the reconstruction performance and speed of reconstruction for FISTA with gradient scheme restart for `1 and TV minimization. a) Compare the performance after 500 reconstruction steps using `1 and TV minimization against the unrolled method and report the PSNR and reconstruction time, as well as the images reconstructed and error maps (|x − xˆ|). b) Compare the performance after T = 5 iterations of all three methods, and report the PSNR and reconstruction time as well as the images reconstructed and error maps (|x − xˆ|). Comment on your results. 8 LIONS @ EPFL Prof. Volkan Cevher c) Comment on the results. Which image appears the best to you, and why? Identify a tradeoff between the unrolled method and the classical iterative approaches. N.B. If your image yields poor reconstruction results with the unrolled CNN, try running it with of the provided images, such as gandalf.jpg. 3 Guidelines for the preparation and the submission of the homework Work on your own. Do not copy or distribute your codes to other students in the class. Do not reuse any other code related to this homework. Here are few warnings and suggestions for you to prepare and submit your homework. • This homework is due at 4:00PM, 6 December, 2019. • Submit your work before the due date. Late submissions are not allowed and you will get 0 point from this homework if you submit it after the deadline. • Questions of 0 point are for self study. You do not need to answer them in the report. • Your final report should include detailed answers and it needs to be submitted in PDF format. • The PDF file can be a scan or a photo. Make sure that it is eligible. • The results of your simulations should be presented in the final report with clear explanation and comparison evaluation. • We provide some Python scripts that you can use to implement the algorithms, but you can implement them from scratch using any other convenient programming tool (in this case, you should also write the codes to time your algorithm and to evaluate their efficiency by plotting necessary graphs). • Even if you use the Python scripts that we provide, you are responsible for the entire code you submit. Apart from completing the missing parts in the scripts, you might need to change some written parts and parameters as well, depending on your implementations. • The codes should be well-documented and should work properly. Make sure that your code runs without error. If the code you submit does not run, you will not be able to get any credits from the related exercises. • Compress your codes and your final report into a single ZIP file, name it as ee556hw3_NameSurname.zip, and submit it through the moodle page of the course. References [1] CHAMBOLLE, A. An algorithm for total variation minimization and applications. J. Math. Imaging Vis. 20 (2004), 89–97. [2] GISELSSON, P., AND BOYD, S. Monotonicity and restart in fast gradient methods. In Decision and Control (CDC), 2014 IEEE 53rd Annual Conference on (2014), IEEE, pp. 5058–5063. [3] JORDAN, M. I., ET AL. Why the logistic function? a tutorial discussion on probabilities and neural networks, 1995. [4] LECUN, Y., AND CORTES, C. MNIST handwritten digit database. [5] MALLAT, S. A wavelet tour of signal processing. Academic press, 1999. [6] MARDANI, M., SUN, Q., DONOHO, D., PAPYAN, V., MONAJEMI, H., VASANAWALA, S., AND PAULY, J. Neural proximal gradient descent for compressive imaging. In Advances in Neural Information Processing Systems (2018), pp. 9573–9583. [7] O’DONOGHUE, B., AND CANDES, E. Adaptive restart for accelerated gradient schemes. Foundations of computational mathematics 15, 3 (2015), 715–732. [8] RUDIN, L., OSHER, S., AND FATEMI, E. Nonlinear total variation based noise removal algorithms. Physica D: Nonlinear Phenomena 60, 1-4 (1992), 259–268. [9] SCHACKE, K. On the kronecker product. APPENDIX 9 LIONS @ EPFL Prof. Volkan Cevher A Deriving the Hessian of multinomial logistic function Differentiating expression (1) a second time with respect to X gives us a 3D tensor as the Hessian, which is cumbersome to work with. Luckily, we can avoid this by differentiating f with respect to a vectorized version of X, thus imposing a regular matrix form on ∇ 2 f . Our Hessian will now belong to R dC×dC. The vectorized X and ∇Xvec f corresponding to it are given below: Xvec = x11 . . . xd1 . . . x1C . . . xdC ∈ R dc ∇Xvec f(X) = Xn i=1 1 PC k=1 e a T i xk ai1e a T i x1 . . . aide a T i x1 . . . ai1e a T i xC . . . aide a T i xC − ai1 11(bi) . . . aid 11(bi) . . . ai1 1C(bi) . . . aid 1C(bi) = Xn i=1 e a T i x1 PC k=1 e a T i xk . . . e a T i xC PC k=1 e a T i xk ⊗ ai1 . . . aid − yi ⊗ ai1 . . . aid = Xn i=1 e a T i x1 PC k=1 e a T i xk . . . e a T i xC PC k=1 e a T i xk ⊗ ai − yi ⊗ ai , (13) where 1j(bi) = 1, if bi = j 0, otherwise is the indicator function. We can now differentiate ∇Xvec f(X) with respect to Xvec to obtain the Hessian. Note that the second term within the sum does not depend on Xvec and will therefore not appear in ∇ 2 Xvec f(X). We make a further simplification and only consider one data point, as the final expression will be the summation over the data. To ease notation, we drop the i-index from ai and denote as σj := e a T x j PC k=1 e a T xk . Finally, we observe that ∂σj ∂xl j = alσj(1 − σj) and ∂σj ∂xlp = −alσjσp. Using the new conventions and differentiating a second time with respect to Xvec, we have: ∂ e a T x1 PC k=1 e a T xk . . . e a T xC PC k=1 e a T xk ⊗ a ∂Xvec = ∂ σ1 . . . σC ⊗ a ∂Xvec = ∂ σ1 . . . σC ∂Xvec ⊗ a (14) We address the first term of the Kronecker product separately: ∂ σ1 . . . σC ∂Xvec = a1σ1(1 − σ1) a2σ1(1 − σ1) . . . adσ1(1 − σ1) −a1σ1σ2 −a2σ1σ2 . . . − adσ1σ2 . . . − adσ1σC −a1σ2σ1 −a2σ2σ1 . . . − adσ2σ1 a1σ2(1 − σ2) a2σ2(1 − σ2) . . . adσ2(1 − σ2) . . . − adσ2σC . . . . . . . . . . . . . . . . . . . . . −a1σCσ1 −a2σCσ1 . . . − adσCσ1 −a1σCσ2 −a2σCσ2 . . . − adσCσ2 . . . adσC(1 − σC) (15) = σ1(1 − σ1) −σ1σ2 −σ1σ3 . . . −σ1σC −σ2σ1 σ2(1 − σ2) −σ2σ3 . . . −σ2σC . . . . . . . . . . . . . . . −σCσ1 −σCσ2 −σCσ3 . . . σC(1 − σC) | {z } Σ ⊗ a T (16) Finally we have: ∂ σ1 . . . σC ⊗ a ∂Xvec = Σ ⊗ a T ⊗ a (17) = Σ ⊗ a T ⊗ a (18) = Σ ⊗ aaT (19) (20) A simple summation over the data retrieves the expression of the Hessian given in (5). 10
This homework covers the lectures 3-6 in which we consider binary classification by the linear support vector machine (SVM), and you will compute the SVM classifier by the gradient descent and accelerated gradient methods. You will also explore how algorithmic enhancements, such as backtracking linesearch and adaptive restart strategies, improve the numerical efficiency. In the second part, you will implement the Newton and quasi-Newton methods to compute the SVM classifier. Finally, you will implement three versions of stochastic gradient descent methods. 1 Problem statement The support vector machine is one of the most popular approaches to linear binary classification. Given a training data set of n points of the form (a1, b1), . . . ,(an, bn) where ai ∈ R p and bi ∈ {−1, +1} indicates the class to which ai belongs. The goal of linear binary classification is to learn a hyperplane in R p , based on the training data set, which separates the points with label +1 and those with label −1. Notice that this goal cannot be achieved perfectly if the labels in the training data set are noisy, e.g., each of the label is incorrect with probability ε ∈ (0, 1/2) independently, and the ratio of incorrectly classified points in another test data set becomes an important performance measure. The empirical risk minimization (ERM) principle in Lecture 2 suggests that we can consider the hyperplane that minimizes the empirical 0 − 1 loss on the training set. That is, we can compute (x ? , µ) ∈ R p × R defined by (x ? , µ? ) ∈ arg min (x,µ)∈Rp×R `0−1(x, µ) = 1 n Xn i=1 1{bi (a T i x+µ)≤0} (x, µ) , (1) where 1{bi (a T i x+µ)≤0} (x, µ) = ( 1 if bi(a T i x + µ) ≤ 0, 0 otherwise. Then {y : y T x ? + µ ? = 0, y ∈ R p } is a separating hyperplance. We note that `0−1 is not convex. To overcome the computational issue, `0−1 is usually replaced by an approximation called the Hinge loss: `H(x, µ) = 1 n Xn i=1 max 1 − bi(a T i x + µ), 0 , and (1) could be approximated by the Hinge loss minimization problem, plus a regularization term, λ 2 kxk 2 . This formulation is known as linear support vector machine. Note that `H is not continuously differentiable, thus we want to consider a slightly different loss function. One possible replacement is the smoothed Hinge loss: `SH(x, µ) = 1 n Xn i=1 gi(x, µ) such that gi(x, µ) = 1 2 − bi(a T i x + µ), bi(a T i x + µ) ≤ 0 1 2 (1 − bi(a T i x + µ))2 , 0 < bi(a T i x + µ) ≤ 1 0, 1 ≤ bi(a T i x + µ). Finally, define the objective function f as f(x, µ) := `SH(x, µ) + λ 2 kxk 2 . (2) LIONS @ EPFL Prof. Volkan Cevher In this homework, we will learn a separating homogeneous half-space by smoothed Hinge loss minimization. Specifically, we will consider the case when µ = 0, solve the following SVM problem with smoothed Hinge loss x ? ∈ arg min x f(x, 0), (3) and use {y : y T x ? = 0, y ∈ R p } as the separating hyperplane. We will write f(x) for f(x, 0) for simplicity. Note: without any more explications, matrix norm in the following is understood as Frobenius norm, 0 and 1 represent zeros and ones vectors or matrices. PROBLEM 1: (10 POINTS) – GEOMETRIC PROPERTIES OF THE OBJECTIVE FUNCTION f Given x ∈ R p , we denote by IL, IQ and I0 n×n matrices such that IL(i, i) = 1 if bi(a T i x+µ) ≤ 0, IQ(i, i) = 1 if 0 < bi(a T i x+µ) ≤ 1, I0(i, i) = 1 if bi(a T i x + µ) ≥ 1 and 0 otherwise. Set A := [a1, · · · , an] T and b := [b1, · · · , bn] T . For convenience, we also set A˜ := [b1a1, · · · , bnan] T . (a) (5 points) Do necessary calculations to show that ∇f(x) = λx + 1 n A˜T IQ[A˜ x − 1] − 1 n A˜T IL1 and then explain that ∇f is L-Lipschitz continuous with L = λ + 1 n kA T k · kAk. Hint: Observe that L = 0 for the linear part, i.e. when bi(a T i x+µ) ≤ 0. Hence, it is sufficient to compute L for the quadratic region, i.e. 0 < bi(a T i x + µ) ≤ 1, by assuming that all the samples ai lie in that region. Use the fact that kAk = kA˜ k, kA T k = kA˜ T k, and λx + 1 n A˜T IQ[A˜ x − 1] = λx + 1 n A˜ T [A˜ x − 1] (b) (3 points) Suppose that IQ = I where I is the identity matrix. Explain why f is twice-differentiable at x and do necessary calculations to show that ∇ 2 f(x) = λI + 1 n A TA. (4) Note that f is not twice differentiable at x if IQ = I does not hold. However, we can still use the following ∇ 2 f(x) = λI + 1 nA TA, if f is twice-differentiable at x, λI, otherwise. as Hessian in Newton method. (c) (2 points) Show that f is λ-strongly convex. 2 Numerical methods for linear support vector machine Our aim in the remainder of this homework is to implement different optimization algorithms for solving the regularized support vector machine problem (3). We will use the breast-cancer dataset from the UCI Machine Learning Repository [?] consisting of n = 546 samples, each with p = 10 features. In all experiments, λ = 0.0001. You can write the codes either in Matlab or Python. Here are the descriptions of the codes: Matlab Python Description compute_error.m commons.compute_error compute errors Oracles.m commons.Oracles return function values, gradients, stochastic gradients GD.m algorithms.GD Gradient Descent GDstr.m algorithms.GDstr Gradient Descent (strongly convex) AGD.m algorithms.AGD Accelerated Gradient Descent AGDstr.m algorithms.AGDstr Accelerated Gradient Descent (strongly convex) LSGD.m algorithms.LSGD Gradient Descent with line search LSAGD.m algorithms.LSAGD Accelerated Gradient Descent with line search AGDR.m algorithms.AGDR Accelerated Gradient Descent with restart LSAGDR.m algorithms.LSAGDR Accelerated Gradient Descent (line search + restart) AdaGrad.m algorithms.AdaGrad Adaptive Gradient Method ADAM.m algorithms.ADAM Adaptive Moment estimation algorithm SGD.m algorithms.SGD Stochastic Gradient Descent SAG.m algorithms.SAG Stochastic Averaging Gradient SVR.m algorithms.SVR Stochastic Gradient Descent (Variance Reduction) SVM.m SVM.py Main code 2 LIONS @ EPFL Prof. Volkan Cevher General guides: • Your are required to complete the missing parts in the codes, indicated by markers YOUR CODES HERE. • The scripts SVM.m (Matlab) and SVM.py (Python) are to test your results. For instance, change GD_option = 1 in Python (or GD = 1 in Matlab) if you want to test GD and similarly to the other algorithms. By default these values are 0. • Put the resulting figures and record the resulting ratios of incorrectly classified points in the test data set in your report. • For Python coders: in provided codes, an 1D- vector of length m in Matlab is understood to be an array of size m × 1 while it is understood to be a Numpy array of shape (m, ). If you choose to code in Python, you might have to have some reshape operations when you work with Quasi-Newton method. 2.1 First order methods for linear support vector machine The optimality condition of (3) is ∇f(x ? ) = λx ? + 1 n A˜T IQ[A˜ x ? − 1] − 1 n A˜T IL1 = 0. (5) Condition (5) is in fact the necessary and sufficient condition for x ? to be optimal for (3). We can equivalently write this condition as x ? = x ? − B∇f(x ? ) (6) for any symmetric, positive definite matrix B. This is a fixed-point formulation of (3), which will be used to develop gradient and Newton-type methods. PROBLEM 2: (65 POINTS) – FIRST ORDER METHODS FOR SVM Notice that choosing B = αI with α > 0 in the formulation (6) suggests using the gradient descent method to solve (3). In this problem, you will implement various variants of gradient descent algorithm. We have defined 3 input arguments (see the documentations in Oracles.m and commons.Oracles to know how to use them in your codes): 1. fx : The function that characterizes the objective to be minimized; 2. gradf : The function that characterizes the gradient of the objective function fx; 3. parameter: The structure (Matlab) or the dictionary (Python) that includes the fields maxit (number of iterations), x0 (initial estimate), strcnvx (strongly convex constant) and Lips (Lipschitz constant). (a) (5 points) The main ingredients of the gradient descent algorithm are the descent direction (or search direction) d k and step-size αk . In this part, we consider the gradient descent algorithm with constant step size 1/L, i.e., αk = 1/L for any k = 0, 1, … . Recall the L computed from Problem 1.(a). Implement this algorithm by completing GD.m or algorithms.GD. Note that the objective function is strongly convex. Therefore, we can select the constant step-size α as 2/(L + λ) to get a faster convergence rate. Implement this variant of the gradient descent method, by completing the code in GDstr.m or algorithms. GDstr. (b) (10 points) We can accelerate the above gradient descent algorithm as follows (t0 = 1): x k+1 := y k − αk∇f(y k ), tk+1 := 1 2 (1 + q 1 + 4t 2 k ) y k+1 := x k+1 + tk−1 tk+1x k+1 − x k . where y0 = x0. Details of this algorithm can be found in Lecture 4. Implement this algorithm with constant step size α = 1/L, by completing the missing parts in AGD.m or function AGD in algorithms.AGD. Note that the objective function is strongly convex. Therefore, we can use the accelerated gradient algorithm for strongly convex objectives to get faster convergence. This variant can be summarized as follows: x k+1 := y k − αk∇f(y k ), y k+1 := x k+1 + √ L− √ λ √ L+ √ λx k+1 − x k . Implement this variant of the accelerated gradient method by completing the code in AGDstr.m or algorithms.AGDstr. 3 LIONS @ EPFL Prof. Volkan Cevher (c) (15 points) We can obtain better performance by considering a line search procedure, which adapts the step-size αk to the local geometry. The line-search strategy to determine the stepsize αk for the standard GD x k+1 = x k − αk∇f(x k ). is the follow: At the step k, let x k be the current iteration and d k = −∇f(x k ) be a given descent direction, and perform: • Set L0 = L. • At each iteration, set Lk,0 = 1 2 Lk−1, where k is the iteration counter. • Using a for loop, find the minimum integer i ≥ 0 that satisfies f x k + 1 2 iLk,0 d k ≤ f(x k ) − 1 2 i+1Lk,0 kd k k 2 . • Set Lk = 2 iLk,0 and use the step-size αk := 1 Lk (i.e., use the new estimate that you have used in the line-search: x k + 1 2 iLk,0 d k ). Complete the missing parts in the files LSGD.m or algorithms.LSGD in order to implement gradient descent with line-search. We now incorporate a line-search with accelerated gradient method in (b) as follow: at step k, one has the current iterations x k together with intermediate variable y k and its corresponding direction d k = −∇f(y k ). Note that the intermediate variable is then used in the gradient step and hence the line-search will be performed on it to determine the stepsize. • Perform a line-search strategy as above with respect to y k and direction d k to determine the step-size αk as follows: – Set L0 = L. – At each iteration, set Lk,0 = 1 2 Lk−1, where k is the iteration counter. – Using a for loop, find the minimum integer i ≥ 0 that satisfies f y k + 1 2 iLk,0 d k ≤ f(y k ) − 1 2 i+1Lk,0 kd k k 2 . – Set Lk = 2 iLk,0 and use the step-size αk := 1 Lk . • Update the next iterations: x k+1 := y k − αk∇f(y k ), tk+1 := 1 2 (1 + q 1 + 4 Lk Lk−1 t 2 k ) y k+1 := x k+1 + tk−1 tk+1x k+1 − x k . Complete the missing parts in the files LSAGD.m or algorithms.LSAGD in order to implement accelerated gradient descent with line-search. (d) (15 points) The accelerated gradient method is non-monotonic, so it can be oscillatory, i.e. f(x k+1 ) f(x k ) for all k ≥ 0. To prevent such behavior, we can use the so-called adaptive restart strategy. Briefly, one such strategy can be explained as follows: At each iteration, whenever x k+1 is computed, we evaluate f(x k+1 ) and compare it with f(x k ): • If f(x k ) < f(x k+1 ), restart the iteration, i.e., recompute x k+1 by setting y k := x k and tk := 1; • Otherwise, let the algorithm continue. This strategy requires the evaluation of the function value at each iteration, which increases the computational complexity of the overall algorithm. Implement the adaptive restart strategy which uses the function values for the accelerated gradient algorithm with constant step-size αk = 1/L by completing the files AGDR.m or function AGDR in algorithms.AGDR. Incorporate the line-search, acceleration and function values restart by completing LSAGDR.m or function LSAGDR in algorithms. LSAGDR. Hint: Note that while the restart is executed on x k , the line-search strategy is executed on y k and the direction d k = −∇f(y k ) to determine the stepsize and hence, use line-search each time you encounter an intermediate variable y k . (e) (10 points) We can also apply an optimization techniques that does not exploit the knowledge of the Lipschitz constant, and instead adapts to the local geometry by making use of past gradient information. AdaGrad adapts the step size using the inverse square-norm of past gradients. Starting with Q0 = 0 it iterates as follows: Qk = Qk−1 + k∇f(x k )k 2 Hk = ( √ Qk + δ)I xk+1 = xt − αH −1 k ∇f(x k ) 4 LIONS @ EPFL Prof. Volkan Cevher Complete the missing parts in the files AdaGrad.m or algorithms.AdaGrad in order to implement the above adaptive gradient method using α = 1, δ = 10−5 . (f) (10 points) Another famous adaptive optimization method is called the ADAptive Moment estimation algorithm, a.k.a ADAM. gk = ∇f(xk) mk = β1mk−1 + (1 − β1)gk ← Momentum vk = β2vk−1 + (1 − β2)g 2 k ← Adaptive term mˆk = mk/(1 − β k 1 ) vˆk = vk/(1 − β k 2 ) ← Scaling for removing bias Hk = √ vˆk + xk+1 = xk − αmˆk/Hk Note that all operations shown above, when applied to vectors, are applied component-wise. In particular, g 2 k is a vector of the same size as gk where all elements are squared. Complete the missing parts in the files ADAM.m or algorithms.ADAM in order to implement the above adaptive gradient method using α = 0.1, β1 = 0.9, β2 = 0.999 and = 10−8 . It has been shown that ADAM can fail to converge to the global minimum of a convex problem [1]. The authors provided a variant of ADAM, called AMSgrad in order to fix this convergence issue. However, in practice it is not clear which method performs best. (You are not required to implement this method, but advised to have a look at it for personal interest). 2.2 Stochastic gradient method for SVM In this problem, you will implement three versions of stochastic gradient descent method to solve SVM. We have defined 4 input arguments (see the documentations in Oracles.m and commons.Oracles to know how to use them in your codes): 1. fx : The function that characterizes the objective to be minimized; 2. gradf : The function that characterizes the gradient of the objective function fx; 2. gradfsto : The function that characterizes the stochastic gradient of the objective function fx; 4. parameter: The structure (Matlab) or the dictionary (Python) that includes the fields maxit (number of iterations), x0 (initial estimate), strcnvx (strongly convex constant), Lmax (see definition below) and no0functions (number of functions). PROBLEM 4: (25 POINTS) – STOCHASTIC GRADIENT METHODS FOR SVM To use the stochastic gradient descent, we recast (3) as follow f(x) = 1 n Xn i=1 gi(x, 0) + λ 2 kxk 2 | {z } fi (x) . (SVM) Let 1{predicate} = 1 if the predicate is true, 0 otherwise. Similarly as in Problem 1, we can deduce that ∇fi(x) = λx + 1{0
In this VHDL assignment, you will describe a sequence detector in VHDL and test it on the Altera DE1-SoC board using a sequence from a memory input (ROM). If you need any help regarding the lab materials, you can • Ask the TA for help during lab sessions and office hours. • Refer to the text book. In case you are not aware, Appendix A “VHDL Reference” provides detailed information on VHDL. • You can also refer to the tutorial on Quartus and ModelSim provided by Intel (click here for Quartus and here for ModelSim). • Refer to the DE1-SoC User Manual (in the Content tab on myCourses). It is highly recommended that you first try to resolve any issue by yourself (refer to the textbook and/or the multitude of VHDL resources on the Internet). Syntax errors, especially, can be quickly resolved by reading the error message to see exactly where the error occurred and checking the VHDL Reference or examples in the textbook for the correct syntax.4 VHDL Description of Storage Elements In the previous VHDL assignments, you have learned how to use sequential statements to describe the behavior of combinational circuits inside the process block. Sequential statements within the process block can also be used to describe sequential circuits such as storage elements. In digital systems, we have two types of memory elements: latches and flip-flops (FFs). Latches are memory elements that immediately reflect any changes in the input signal to the output signal when the level of the control signal (clk) is high. As such, latches are usually referred to as level-triggered memory elements. Alternatively, FFs change their state when the control signal goes from either high to low or from low to high. Note that FFs working with a control signal that goes from high to low or from low to high are called, respectively, negative-edge-triggered and positive-edge-triggered FFs. In digital systems, edge-triggered flip-flops are superior to level-triggered latches since FFs are more robust. For example, noise can easily disrupt the output of a latch when the control signal is high. On the other hand, the output of FFs can be disrupted only in presence of noise at the edge transition of the control signal. It is highly recommended, therefore, to use FFs when designing sequential circuits. In VHDL, process blocks are used to describe FFs. Since FFs set their state at the edge of the control signal, we need a statement to detect the edge transition of the control signal. This can be simply done by using a IF-THEN-ELSE statement. Assuming that clk is the control signal of a FF, a positive edge transition (i.e., a transition from ’0’ to ’1’) of the clk signal can be detected by the following statement: RISING_EDGE (clk). Similarly, a negative edge transition (i.e., a transition from ’1’ to ’0’) can be detected by the following statement FALLING_EDGE (clk). For example, the following process block describes a positive-edge-triggered DFF. PROCESS ( clk ) BEGIN IF RISING_EDGE( clock ) THEN Q
In this VHDL assignment, you will describe a 4-bit comparator circuit in VHDL and test it on the Altera DE1-SoC board using LEDs as outputs and sliding switches as inputs. You will also learn how to describe storage elements and sequential logic circuits in VHDL. Then, you will design a 3-bit up-counter and simulate the counter using ModelSim. If you need any help regarding the lab materials, you can • Ask the TA for help during lab sessions and office hours. • Refer to the text book. In case you are not aware, Appendix A “VHDL Reference” provides detailed information on VHDL. • You can also refer to the tutorial on Quartus and ModelSim provided by Intel (click here for Quartus and here for ModelSim). • Refer to the DE1-SoC User Manual (in the Content tab on myCourses). It is highly recommended that you first try to resolve any issue by yourself (refer to the textbook and/or the multitude of VHDL resources on the Internet). Syntax errors, especially, can be quickly resolved by reading the error message to see exactly where the error occurred and checking the VHDL Reference or examples in the textbook for the correct syntax.4 Sequential Assignment Statements In the previous VHDL assignments, you have learned several types of assignment statements such as simple assignment statements, selected assignment statements, and conditional assignment statements. All of these statements have the property that the order in which they appear in the VHDL code does not affect the meaning of the code, that is, it does not affect the synthesized circuit. For this reason, these statements are usually referred to as concurrent assignment statements. VHDL also has a second category of statements, called sequential assignment statements, for which the order of the statements may affect the meaning of the code. In the following, we briefly discuss this new kind of statements. They are described in more detail in Sections 6.6.6 and 6.6.7 of the textbook. Please read these sections before you attempt this assignment. Even more details can be found in Appendix A.9 of the textbook. The two main types of sequential assignment statements are: if-then-else statements and case statements. These sequential assignment statements must be placed inside a block, called a process block. The PROCESS block starts with the keyword “PROCESS”. It has an optional label and a sensitivity list. Following the PROCESS keyword is the statement “BEGIN”. Any statement between “BEGIN” and the “END PROCESS label;” are sequential statements. l a b e l : PROCESS ( sensitivity list ) BEGIN — sequential statements END PROCESS l a b e l ; The sensitivity list contains signals. Unlike concurrent statements which are executed all the time, the process block is activated only when one of the signals in its sensitivity list changes its value. Once activated, the statements inside the process block are executed sequentially in the order they appear in the block. There are two things to note: (i) Any assignments made to signals inside the process are not visible outside the process until all of the statements in the process have been evaluated; (ii) in case of multiple assignments to the same signal, only the last one determines the final value of the signal. Also note that VHDL allows multiple processes to be described within the same architecture. 4.1 IF-THEN-ELSE Statements IF-THEN-ELSE statements are used to modify the behavior of your function depending on whether one or more conditions hold. The syntax of IF-THEN-ELSE statements is shown below. Note that the “END IF” must be separated by a space. IF condition THEN — sequential statements ELSE — sequential statements END IF; You may have nested IF and ELSE as follows: IF condition THEN — sequential statements ELSIF condition THEN — sequential statements ELSIF condition THEN — sequential statements ELSE — sequential statements END IF; In this structure only one of the branches is executed depending on the condition. Even if there are several conditions which are true in this structure only the first TRUE condition will be followed. After the execution of the sequential statements within the first true condition the statements after the END IF will be executed next. Therefore it is very important to write the order of IF blocks and ELSIF blocks according to your desired behavior.4.2 CASE Statements CASE statements consider all of the possible values that an object can take and execute a different branch depending on the current value of the object as shown next. CASE object IS WHEN value1 => — statements WHEN value2 => — statements WHEN value3 => — statements — etc . WHEN OTHERS => — statements END CASE; Note that the CASE statement must include a WHEN clause for each of the possible values of object. This necessitates a WHEN OTHERS clause if some of possible values of object are not covered by WHEN clauses. 5 4-Bit Comparator Comparators are a useful type of arithmetic circuits that compare the relative sizes of two binary numbers. In this assignment you will implement a 4-bit comparator circuit that takes two 4-bit unsigned inputs A and B, and determines which one of the cases A = (B + 1), A < (B + 1), A ≤ (B + 1), A > (B + 1), A ≥ (B + 1) holds. It should detect the occurrence of overflow when performing B + 1, i.e., detect if B + 1 requires more than 4 bits. Use the following entity declaration for your implementation of the comparator circuit. l i b r a r y IEEE; u s e IEEE.STD_LOGIC_1164.ALL; u s e IEEE.NUMERIC_STD.ALL; e n t i t y firstname_lastname_comparator i s P o r t ( A, B : i n s t d _ l o g i c _ v e c t o r (3 downto 0) ; AgtBplusOne : o u t s t d _ l o g i c ; AgteBplusOne : o u t s t d _ l o g i c ; AltBplusOne : o u t s t d _ l o g i c ; AlteBplusOne : o u t s t d _ l o g i c ; AeqBplusOne : o u t s t d _ l o g i c ; overflow : o u t s t d _ l o g i c ); end firstname_lastname_comparator ; Note that in case of overflow when performing B + 1, the comparator circuit outputs 1 for the overflow signal while the remaining signals (i.e., AgtBplusOne, AgteBplusOne, AltBplusOne, AlteBplusOne and AlteBplusOne) are set to 0. Otherwise, the circuits outputs proper values for AgtBplusOne, AgteBplusOne, AltBplusOne, AlteBplusOne and AeqBplusOne signals according to A > (B+1), A ≥ (B+1), A < (B+1), A ≤ (B+1), A = (B+1), respectively, while the overflow signal is set to 0. For example, if A = 910 and B = 510 then both AgtBplusOne, AgteBplusOne should be 1, while the rest (including overflow) should be 0. The firstname_lastname in the name of the entity is the name of one of the students in your group. To describe your comparator in VHDL, use sequential statements in a single process block. Once you have described your circuit in VHDL, you should test your design on the FPGA board. Similar to the VHDL Assignment #4, the inputs of the comparator circuit are provided using the slider switches; you will need to use eight slider switches. To visualize the output signals of the comparator circuit, use the LEDs located right above the slider switches on the FPGA board. Note that you will need to use six LEDs (one for each output signal). When an output signal is set to ’1’, the corresponding LED turns on; otherwise, it will be off. Afterwards, perform the pin assignments and program the FPGA by following the instructions provided in VHDL Assignement #4. Note that the pin locations of LEDs are different from those of 7-segment LEDs. The pin assignments for the LEDs are given in Figure 3.17 on p. 25 of the Altera board manual. Once completed, test the functionality of your comparator circuit for different input values using the LEDs and slider switches.6 Questions 1. Briefly explain your VHDL code implementation of all circuits. 2. Given that A = 510, provide a separate simulation plot that demonstrates all possible cases for the 4-bit comparator, including a separate plot for the case where overflow occurs. A total of four plots should be included. Explain each plot and mark all inputs and outputs clearly. 3. Perform timing analysis (slow 1,100 mV model) of the 4-bit comparator and find the critical path(s) of the circuit. What is the delay of the critical path(s)? 4. Report the number of pins and logic modules used to fit your 4-bit comparator design on the FPGA board. 7 Deliverables You are required to submit the following deliverables on MyCourses. Please note that a single submission is required per group (by one of the group members). • Lab report. The report should include the following parts: (1) Names and McGill IDs of group members, (2) an executive summary (short description of what you have done in this VHDL assignment), (3) answers to all questions in previous section (if applicable), (4) legible figures (screenshots) of schematics and simulation results, where all inputs, outputs, signals, and axes are marked and visible, (5) an explanation of the results obtained in the assignments (mark important points on the simulation plots), and (6) conclusions. Note – students are encouraged to take the reports seriously, points will be deducted for sloppy submissions. Please also note that even if some of the waveforms may look the same, you still need to include them separately in the report. • Project files. Create a single .zip file named VHDL#_firstname_lastname (replace # with the number of the current VHDL assignment and firstname_lastname with the name of the submitting group member). The .zip file should include the working directory of the project.
In this assignment, you will use a previously designed BCD adder circuit, you will implement and test the circuit on the Altera DE1-SoC board. You will learn how to work with the Altera DE1-SoC board, use switches, and the 7-segment LED display. If you need any help regarding the lab materials, you can • Ask the TA for help during lab sessions and office hours. • Refer to the text book. In case you are not aware, Appendix A “VHDL Reference” provides detailed information on VHDL. • You can also refer to the tutorial on Quartus and ModelSim provided by Intel (click here for Quartus and here for ModelSim). It is highly recommended that you first try to resolve any issue by yourself (refer to the textbook and/or the multitude of VHDL resources on the Internet). Syntax errors, especially, can be quickly resolved by reading the error message to see exactly where the error occurred and checking the VHDL Reference or examples in the textbook for the correct syntax. 4 Critical Path of Digital Circuits In this part, you will learn how to use the Quartus CAD tool to determine the delay of a given path in digital circuits. To this end, in this section, we use the ripple-carry adder circuit (that you designed in VHDL assignment #3) as the “circuit under examination”.2 Follow the instructions described in VHDL Assignment #1 to create a project. Make sure to select the Cyclone V family of FPGAs, with the following part number: 5CSEMA5F31C6 when creating a project. Once created, import the VHDL description of your digital circuit into the project and compile it to make sure there are no syntax errors in your design. The critical path is the longest path in the circuit and limits the speed of the circuit speed. The speed of a digital circuit is measured in terms of latency and throughput. Latency is the time needed for the circuit to produce an output for a given input (i.e., the total propagation delay (time) from the input to the output), and it is expressed units of time. Alternatively, throughput refers to the rate at which data can be processed. In this assignment, we only consider the latency as a metric to measure the speed of the circuit. In general, digital circuits are subject to timing constraints dictated by the target application. Whether a circuit meets these timing constraints can only be known after the circuit is synthesized. After synthesis is performed, the designer can analyze the circuit to determine whether timing constraints were satisfied using the term slack. Slack is the margin by which a timing requirement is met or not met; it is the difference between the required arrival time and the actual arrival time. A positive slack value indicates the margin by which a requirement was met. A negative slack value indicates the margin by which a requirement was not met. To insert timing constraints in Quartus, select “Synopsys Design Constraints File” from the “File–>New” menu. The maximum delay can be specified in the Synopsys Design Constraints File using the following command: set_max_delay -from [get_ports ] -to [get_ports ] For example, we can specify a maximum delay of 12 ns for all the possible paths from the inputs of the ripplecarry adder to its outputs as shown below. Once the timing constraints are inserted, save the file with the name “firstname_lastname_sdc.sdc”. Recompile your design by double clicking on “Timing Analysis” in the Tasks window of Quartus. Before recompilation, make sure that the .sdc file is added to the project. 3 The Timing Analyzer will read the .sdc file and use the constraint information when performing timing analysis. Once a green check mark appears next to “Timing Analysis”, double click on “Timing Analyzer” under “Timing Analysis” to open the Timing Analyzer tool. In the Tasks window of the Timing Analyzer tool, double click on “Create Timing Netlist”, “Read SDC File” and “Update Timing Netlist” icons, respectively. Once completed, the colors of the icons will become green. Before measuring the delay of your design, you should specify the operating conditions of the FPGA device. This can be simply set by selecting one of the possible operating conditions listed in the “Set Operating Conditions” in the Timing Analyzer tool. To report the delay of different paths from inputs to outputs, select “Report Timing . . . ” from the “Reports –> Custom Reports” menu. 4 Since no clock signal is associated with your design, we only specify the beginning of the target path(s) by clicking on “From” and the end of the target path/paths by clicking on “To” under the section labeled as “Targets”. In the “Name Finder” window that pops up, click on “List” to list all the I/O signals of your design. Select (double click on) a signal (or signals) to determine the beginning of the path that you want to examine and click “OK”. Repeat the same procedure to determine the end of the path. As an example, we can examine the path from the LSB of the input A (i.e., A(0)) to the LSB of the output S (i.e., S(0)) as shown below. 5 Now, click on “Report Timing” to obtain timing information for the specified path. The delay of the specified path is denoted “Data Delay” in the section entitled “Summary of Paths”. The positive value of the slack denotes the difference between the delay of the path (i.e., Data Delay) and the timing constraint inserted in the .sdc file (i.e., 12 ns). This information is also visualized under the “waveform” tab. To find the critical path of your design, you should examine all possible paths from all inputs to all outputs and find the one with the longest delay. However, using this “exhaustive search” method is very time consuming as the number of I/O ports increases. To limit the number of paths under examination, we reduce the target delay value in the .sdc file so that a timing violation occurs. For instance, we reduce the target delay constraint of the “circuit under examination” from 12 ns to 5 ns and recompile the design by double clicking on “Timing Analysis” in Quartus. In case of a timing violation, Quartus determines the violating path(s) in the compilation summary of the “Timing Analyzer” tool. 6 In this approach, our search for the critical path will now be limited to the violating paths. Examining the violating paths in the “Timing Analyzer” tool will, therefore, determine the critical path. 5 BCD to 7-Segment LED Decoder A 7-segment LED display includes 7 individual LED segments, as shown below. By turning on different segments together, we can display characters or numbers. There are six of these displays on the DE1-SoC board, which you will use later to display the result of your full implementation of the adder. We will need a circuit to drive the 7-segment LEDs on the DE1 board, called 7-segment decoder. It takes as input a 4-bit BCD-formatted code representing the 10 digits between 0 and 9, and generates the appropriate 7-segment display associated with the input code. For many LED displays, including the ones on the DE1 board, segments turn on when their segment inputs are driven low, that is “0” means on and “1” means off. This scheme for the inputs is called “active low.” The VHDL code for the 7 segment decoder is provided below. 7 l i b r a r y i e e e ; u s e i e e e . s t d _l o gi c _ 1 1 6 4 . a l l ; u s e i e e e . n ume ri c_ s t d . a l l ; e n t i t y seven_segment_decoder i s p o r t ( code : i n s t d _ l o g i c _ v e c t o r (3 downto 0) ; segments_out : o u t s t d _ l o g i c _ v e c t o r (6 downto 0) ); end seven_segment_decoder ; a r c h i t e c t u r e decoder o f seven_segment_decoder i s b e g i n WITH code SELECT segments_out
In this assignment, you will build upon previously implemented circuits to design a more complex circuit. You will learn how to design and simulate useful adder circuit blocks. If you need any help regarding the lab materials, you can • Ask the TA for help during lab sessions and office hours. • Refer to the text book. In case you are not aware, Appendix A “VHDL Reference” provides detailed information on VHDL. • You can also refer to the tutorial on Quartus and ModelSim provided by Intel (click here for Quartus and here for ModelSim). It is highly recommended that you first try to resolve any issue by yourself (refer to the textbook and/or the multitude of VHDL resources on the Internet). Syntax errors, especially, can be quickly resolved by reading the error message to see exactly where the error occurred and checking the VHDL Reference or examples in the textbook for the correct syntax. 4 VHDL Implementation of a 4-bit circular barrel shifter In this part of the lab, you will design a circuit that implements an 4-bit circular barrel shifter. The circular barrel shifter is a circuit that can shift its input by a given number of bits with combinational logic. The 4-bit circular barrel shifter is usually implemented by stacking two layers of 2×1 multiplexers as shown below. All 4 multiplexer in the leftmost layer use sel(1) as the selector signal. Similarly, all 4 multiplexers in the rightmost layer use sel(0) as the selector signal. Make sure that you familiarize yourself with how the barrel shifter works by calculating its output for several input combinations by hand.You are required to implement the 4-bit barrel shifter in VHDL using both structural and behavioral styles. To obtain a structural description of the 4-bit barrel shifter, you are required to use the structural description of the 2-to-1 multiplexer. Write a structural VHDL description for the 4-bit circular barrel shifter by instantiating the structural description of the 2-to-1 multiplexer 8 times. Use the following entity declaration for your structural VHDL description of the 4-bit circular barrel shifter: l i b r a r y IEEE; u s e IEEE.STD_LOGIC_1164.ALL; u s e IEEE.NUMERIC_STD.ALL; e n t i t y firstname_lastname_barrel_shifter_structural i s P o r t (X : i n s t d _ l o g i c _ v e c t o r (3 downto 0) ; sel : i n s t d _ l o g i c _ v e c t o r (1 downto 0) ; Y : o u t s t d _ l o g i c _ v e c t o r (3 downto 0) ); end firstname_lastname_barrel_shifter_structural ; Once completed, you are required to implement the 4-bit circular barrel shifter using the behavioral style. One way to obtain a behavioral description of the 4-bit circular barrel shifter is the use of VHDL select statements. Write a behavioral VHDL code for the 4-bit circular barrel shifter using a single VHDL select statement only. Use the following entity declaration for your behavioral VHDL description of the 4-bit circular barrel shifter: l i b r a r y IEEE; u s e IEEE.STD_LOGIC_1164.ALL; u s e IEEE.NUMERIC_STD.ALL; e n t i t y firstname_lastname_barrel_shifter_behavioral i s P o r t (X : i n s t d _ l o g i c _ v e c t o r (3 downto 0) ; sel : i n s t d _ l o g i c _ v e c t o r (1 downto 0) ; Y : o u t s t d _ l o g i c _ v e c t o r (3 downto 0) ); end firstname_lastname_barrel_shifter_behavioral ; Hint: You may want to use the VHDL concatenate operator ampersand (&). It can be used to combine two or more signals together. Note that all input signals to the concatenation must be of the same type and the result of the concatenation needs to exactly fit the width of the concatenated input signals. Once you have your circuit described in VHDL using both structural and behavioral styles, write a testbench code and perform an exhaustive test for your VHDL descriptions of the 4-bit circular barrel shifter. 5 VHDL Description of Adder Circuits In this section, you will be asked to perform the design and simulation of the following two adder circuits: (a) a 4-bit ripple-carry adder; and (b) a one-digit binary coded decimal (BCD) adder. Details of the assignments are described below. 5.1 Ripple-Carry Adder (RCA) In this section, you will implement a structural description of a 4-bit ripple-carry adder using basic addition components: half-adders and full-adders. 5.1.1 Structural Description of a Half-Adder in VHDL A half-adder is a circuit that takes two binary digits as inputs, and produces the result of the addition of the two bits in the form of a sum and carry signals. The carry signal represents an overflow into the next digit of a multi-digit addition. Using the following entity definition for your VHDL code, implement a structural description of the half-adder. l i b r a r y IEEE; u s e IEEE.STD_LOGIC_1164.ALL; u s e IEEE.NUMERIC_STD.ALL; e n t i t y half_adder i s p o r t (a: i n s t d _ l o g i c ; b: i n s t d _ l o g i c ; s: o u t s t d _ l o g i c ; c: o u t s t d _ l o g i c ); end half_adder ; After you have described your structural style of the half-adder in VHDL, you are required to test your circuit. Write a testbench code and perform an exhaustive test of your VHDL description of the half-adder. 5.1.2 Structural Description of a Full-Adder in VHDL Unlike the half-adder, a full-adder adds binary digits while accounting for values carried in (from a previous stage addition). Write a structural VHDL description for the full-adder circuit using the half-adder circuit that you designed in the previous section. Use the following entity declaration for your structural VHDL description of the full-adder. l i b r a r y IEEE; u s e IEEE.STD_LOGIC_1164.ALL; u s e IEEE.NUMERIC_STD.ALL; e n t i t y full_adder i s p o r t (a: i n s t d _ l o g i c ; b: i n s t d _ l o g i c ; c_in : i n s t d _ l o g i c ; s: o u t s t d _ l o g i c ; c_out : o u t s t d _ l o g i c ); end full_adder ; After you have described your circuit in VHDL, write a testbench code and perform an exhaustive test of your VHDL description of the full-adder.5.1.3 Structural Description of a 4-bit Ripple-Carry Adder (RCA) in VHDL Using the half-adder and full-adder circuits implemented in the two previous sections, implement a 4-bit carry-ripple adder. Write a structural VHDL code for the 4-bit RCA using the following entity declaration. l i b r a r y IEEE; u s e IEEE.STD_LOGIC_1164.ALL; u s e IEEE.NUMERIC_STD.ALL; e n t i t y rca_structural i s p o r t ( A: i n s t d _ l o g i c _ v e c t o r (3 downto 0) ; B: i n s t d _ l o g i c _ v e c t o r (3 downto 0) ; S: o u t s t d _ l o g i c _ v e c t o r (4 downto 0) ); end rca_structural ; Note that S(4) contains the carry-out of the 4-bit adder. After you have described your circuit in VHDL, write a testbench code and perform an exhaustive test of your VHDL structural description of the 4-bit RCA. 5.1.4 Behavioral Description of a 4-bit RCA in VHDL In this part, you are required to implement the 4-bit RCA using behavioral description. One way to obtain a behavioral description is to use arithmetic operators in VHDL (i.e., “+”). Write a behavioral VHDL code for the 4-bit RCA using the following entity declaration for your behavioral VHDL description. l i b r a r y IEEE; u s e IEEE.STD_LOGIC_1164.ALL; u s e IEEE.NUMERIC_STD.ALL; e n t i t y rca_behavioral i s p o r t ( A: i n s t d _ l o g i c _ v e c t o r (3 downto 0) ; B: i n s t d _ l o g i c _ v e c t o r (3 downto 0) ; S: o u t s t d _ l o g i c _ v e c t o r (4 downto 0) ); end rca_behavioral ; After you have described your circuit in VHDL, write a testbench code and perform an exhaustive test of your VHDL behavioral description of the 4-bit RCA. 5.2 VHDL Description of a One-Digit BCD Adder In this section, you will implement a one-digit BCD adder in VHDL. A one-digit BCD adder adds two four-bit numbers represented in a BCD format. The result of the addition is a BCD-format 4-bit output, representing the decimal sum, and a carry that is generated if this sum exceeds a decimal value of 9 (see slides of Lecture #11). 5.2.1 Structural Description of a BCD Adder in VHDL In this part, you are required to implement the BCD adder using structural description. You are allowed to use either behavioral or structural style of coding for your implementation. Using the following entity definition. Use the following entity declaration. l i b r a r y IEEE; u s e IEEE. STD_LOGIC_11164 .ALL; u s e IEEE.NUMERIC_STD.ALL; e n t i t y bcd_adder_structural i s p o r t ( A: i n s t d _ l o g i c _ v e c t o r (3 downto 0) ; B: i n s t d _ l o g i c _ v e c t o r (3 downto 0) ; S: o u t s t d _ l o g i c _ v e c t o r (3 downto 0) ; C: o u t s t d _ l o g i c ); end bcd_adder_structural ; After you have implemented the one-digit BCD adder in VHDL, you are required to test your circuit. Write a testbench code and perform an exhaustive test of your VHDL structural description of the one-digit BCD adder.5.2.2 Behavioral Description of a BCD Adder in VHDL In this part, you are required to implement the BCD adder using behavioral description. You are encouraged to base your code on the VHDL code in Section 5.7.3 of the textbook, so that you learn about conditional signal assignments (these are explained in detail in the same section as well as in the Appendix in Section A.7.4). Use the following entity declaration. l i b r a r y IEEE; u s e IEEE. STD_LOGIC_11164 .ALL; u s e IEEE.NUMERIC_STD.ALL; e n t i t y bcd_adder_behavioral i s p o r t ( A: i n s t d _ l o g i c _ v e c t o r (3 downto 0) ; B: i n s t d _ l o g i c _ v e c t o r (3 downto 0) ; S: o u t s t d _ l o g i c _ v e c t o r (3 downto 0) ; C: o u t s t d _ l o g i c ); end bcd_adder_behavioral ; After you have implemented the one-digit BCD adder in VHDL, you are required to test your circuit. Write a testbench code and perform an exhaustive test for your VHDL behavioral description of the one-digit BCD adder. 6 Questions 1. Briefly explain your VHDL code implementation of all circuits. 2. Show representative simulation plots of the 4-bit circular shift register circuits for a given input sequence, e.g., “1011” (X = “1011”), for all the possible shift amounts. 3. Show representative simulation plots of the half-adder circuit for all possible input values. 4. Show representative simulation plots of the full-adder circuit for all possible input values. 5. Show representative simulation plots of both behavioral and structural descriptions of the 4-bit RCA for all possible input values. 6. Show representative simulation plots of both behavioral and structural descriptions of the one-digit BCD adder circuit for all possible input values. 7. Report the number of pins and logic modules used to fit your designs on the FPGA board. 4-bit circular barrel shifter RCA One-digit BCD adder Structural Behavioral Structural Behavioral Structural Behavioral Logic Utilization (in LUTs) Total pins 7 Deliverables You are required to submit the following deliverables on MyCourses. Please note that a single submission is required per group (by one of the group members). • Lab report. The report should include the following parts: (1) Names and McGill IDs of group members, (2) an executive summary (short description of what you have done in this VHDL assignment), (3) answers to all questions in previous section (if applicable), (4) legible figures (screenshots) of schematics and simulation results, where all inputs, outputs, signals, and axes are marked and visible, (5) an explanation of the results obtained in the assignments (mark important points on the simulation plots), and (6) conclusions. Note – students are encouraged to take the reports seriously, points will be deducted for sloppy submissions. Please also note that even if some of the waveforms may look the same, you still need to include them separately in the report. • Project files. Create a single .zip file named VHDL#_firstname_lastname (replace # with the number of the current VHDL assignment and firstname_lastname with the name of the submitting group member). The .zip file should include the working directory of the project.
In this assignment, you will learn how to create a circuit using a schematic diagram in Quartus. You will then simulate the circuit in ModelSim. You will also learn how to use the test bench writer tool in Quartus. Finally, you will learn and practice the use of concurrent statements. 3 Learning Outcomes After completing this assignment you should know how to: • Create a schematic gate diagram of a logic circuit • Use CAD tools and VHDL to implement logic functions • Synthesize logic functions • Perform functional simulation In this assignment you will learn how to use the Altera Quartus II FPGA design and Modelsim software. 4 Creating a Schematic Diagram Design File To get some practice with Quartus, you will design a simple 4-bit comparator circuit, which you should name firstname_lastname_comp. This circuit has two 4-bit inputs, A(0), A(1), A(2), A(3) and B(0), B(1), B(2), B(3), and a single 1-bit output, AeqB. The output is to be 1 when the two inputs have exactly the same values, and be 0 otherwise. The boolean equation for the output in terms of the inputs is derived as: AeqB = (A(3) XNOR B(3)) AND (A(2) XNOR B(2)) AND (A(1) XNOR B(1)) AND (A(0) XNOR B(0)). In the above, XNOR means the complement of the XOR operation. In this course, you will learn two methods for describing circuits in Quartus: • Gate-level schematic diagrams • VHDL descriptions Schematic diagrams are graphical descriptions of the circuit, while VHDL uses textual descriptions. For most of the designs in the course you will use VHDL, but schematic entry is a good practice to get started with Quartus. To describe a circuit via a schematic diagram, follow the steps below. The first step is to open the project that was created in Assignment #1. Once you have opened the project, click on the “File” menu of the main Quartus window and select the “New” menu item. The dialog box shown below will appear.In the dialog box, select “Block Diagram/Schematic File” and click on “OK”. A new window will appear in the main area of the Quartus window. You will draw your circuit schematic in this new window. A toolbar will also appear, containing shortcuts for commands relevant to drawing schematics. Click on the toolbar item that is shaped like an AND gate. This will bring up the “symbol” window that will allow you to select which symbol to add to your schematic drawing. Note that the symbol window can also be opened by double-clicking anywhere within the schematic window. You will want to add an XNOR gate symbol to your schematic. This can be done in two ways. The first way is to expand the library directory to the primitives/logic directory and then scroll down to the XNOR item and select it.The second way is to type the symbol name directly into the Name box. You do not have to select the directory; the program will search the directory tree for the symbol. After clicking on “OK” in the Symbol window, a floating image of an XNOR gate will appear in the schematic window. As you move the mouse, the symbol will follow. Position the symbol where you would like it to be and left-click the mouse. The symbol will now be placed onto the schematic. You need to AND together the outputs of the 4 XNOR gates. This can be done with one 4-input AND gate. Before connecting the gates, you should enter the input and output ports by instantiating symbols named “input” and “output”. This is done using the pin tool (to the right of the toolbar item shaped like an AND gate). We need 2 input ports named and 1 output port.You now need to connect all of the gates together and hook them up to the input and output ports. There are two ways that you can connect nodes in the schematic: • Drawing wires between two nodes using the orthogonal node tool • Giving two different nodes the same name or label The first approach is best when making short connections between neighboring symbols, or when you want to make it immediately obvious from looking at the schematic that two nodes are connected. The second approach is the most convenient way of connecting busses (which are collections of individual nodes or wires). The software assumes that two nodes with the same name are electrically connected. To draw wires between two nodes using the orthogonal node tool first select the tool from the toolbar to the top of the window. This tool allows to draw wires which run horizontally or vertically. When this tool is selected, the cursor changes into a crosshair. To draw a wire, position the cursor over the place where you want the wire to begin, and left-click the mouse. Then, while holding the mouse button down, drag the cursor to the place where you want the wire to end, then release. Using this approach, wire up the outputs of the XNOR gates to the inputs of the AND gate and similarly the output of the AND gate to the output pin AeqB. Note that if two wires cross each other, a connection between them will be made. In this case, a dot will appear indicating a connection between the two wires. If a mistake is made, you can simply select the wire by clicking on it and remove it by pressing the Delete key of your keyboard. You can connect nodes without explicitly drawing in wires to connect them, by giving each node the same name. In order to use node naming as a connection technique, you must give names, or labels, to the nodes to be connected. Begin by naming the input busses. To do this, double-left-click on the left-most part of one of the input symbol. This should highlight the input name. Name the inputs as A[3..0] and B[3..0]. Also name the output as AeqB. The name A[3..0] means that A is a bus with 4 wires, having indices of 3, 2, . . . , 0. The node named A[3] corresponds to the Most-Significant-Bit (MSB) of A, and A[0] to the Least-Significant-Bit (LSB) of A. Add wires to the inputs of the XNOR gates. Use node naming (right click on the wire, and then on the “General” tab) to connect the input symbols to the XNOR gates.You should now save the design, using the “Save As” item in the “File” menu. Call your file firstname_lastname_comp.bdf. Make sure that the “Add file to current project” box is selected. Once you have saved your file, you should compile it. Before compiling your file, you should first set your file as the top-level entity. In the Project Navigator window, change the Hierarchies tab to the Files tab. You should be able to see your file listed under the Files tab. Right click on your file and set it as the top-level entity by clicking on Set as Top Level-Entity. Now you can compile your file by double-clicking on the “Analysis & Synthesis” located on the Tasks window. Note that both the Tasks window and the Navigator window are on left side of the Quartus window. The analysis and synthesis process will check for syntax errors in your code. If the compilation is successful, a check mark will appear next to the “Analysis & Synthesis”. In case of getting errors, you should go back to your design and fix the errors. To get statistics on the resources required to implement your design, you can double click on “Fitter (Place & Route)”. This process fits your design into the FPGA using its available resources and reports a summary of the resource utilization of your design on a window named “Compilation Report”. 4.1 Simulation of the Circuit Using ModelSim Once you have your circuit described using the schematic gate diagram, you should simulate it. The purpose of simulation is generally to determine: 1. If the circuit performs the desired function, and 2. if timing constraints are met. In the first case, we are only interested in the functionality of our implementation. We do not care about propagation delays and other timing issues. Because of this, we do not have to map our design to a target hardware. This type of simulation is called functional simulation. The other form of simulation is called timing simulation. It requires that the design be mapped onto a target device, such as an FPGA. Based on the model of the device, the simulator can predict propagation delays, and provide a simulation that takes these into account. Thus, the timing simulation may produce results that are quite different from the purely functional simulation. In this course, you will be using the ModelSim simulation software, created by the company Mentor Graphics(actually you will use a version of it specific to Quartus, called Modelsim-Altera). The Modelsim software operates on an Hardware Description Language (HDL) description of the circuit to be simulated, written either in VHDL, Verilog, or System-Verilog. You will use VHDL. Modelsim does not understand schematic diagrams, so you need to convert your schematic diagram to an HDL description. You could just write the VHDL description directly, but you already have a schematic so it would be nice if you could use that. Fortunately, Quartus has the built-in capability of converting schematic diagrams to a VHDL description, so you can use that to convert your comparator schematic to VHDL. To convert this schematic to a VHDL description that the simulator can use, select “File”>“Create/Update”>“Create HDL Design File from Current File”. Upon clicking “OK,” a VHDL description of your schematic file, named as firstname_lastname_comp.vhd, will be generated in your working directory. You can see the contents of this file using a text editor. Now, we can start the simulation process. Double-click on the ModelSim desktop icon to run the ModelSim program. A window similar to the one shown below will appear.Select “FILE”>“New”>“Project” and, in the window that appears, give the project the name firstname_lastname_lab2. Once you click OK, another dialog box will appear allowing you to add files to the project. Click on “Add Existing File” and select the VHDL file that was generated earlier (firstname_lastname_comp.vhd). You can also add files later. The ModelSim window will now show your VHDL file in the Project pane.To simulate the design, ModelSim must analyze the VHDL files, a process known as compilation. The compiled files are stored in a library. By default, this is named “work”. You can see this library in the “library” pane of the ModelSim window. The question marks in the Status column in the Project tab indicate that either the files have not been compiled into the project or the source file has changed since the last compilation. To compile the files, select Compile > Compile All or right click in the Project window and select Compile > Compile All. If the compilation is successful, the question marks in the Status column will turn to check marks, and a success message will appear in the Transcript pane (see figure below). The compiled VHDL files will now appear in the library “work”. Since all of the inputs are undefined, if you ran the simulation now, the outputs would be undefined. So you need to have a means of setting the inputs to certain patterns, and of observing the outputs’ responses to these inputs. In ModelSim, this is done by using a special VHDL entity called a Testbench. A testbench is special VHDL code that generates different inputs that will be applied to your circuit so that you can automate the simulation of your circuit and see how its outputs respond to different inputs. Note that the testbench is only used in Modelsim for the purposes of simulating your circuit. You will eventually synthesize your circuits into a real hardware chip called an FPGA. However, you will NOT synthesize the testbench into real hardware. Because of its special purpose (and that it will not be synthesized), the testbench entity is unique in that it has NO inputs or outputs, and uses some special statements that are only used in test benches. These special statements are not used when describing circuits that you will later synthesize to a FPGA. The testbench contains a single component instantiation statement that inserts the module to be tested (in this case the firstname_lastname_comp module), as well as some statements that describe how the test inputs are generated. After you gain more experience you will be able to write VHDL testbenches from scratch. However, Quartus has a convenient built-in process, called the Test Bench Writer, which produces a VHDL template from your design that will get you started. To get the template, go back to the Quartus program, making sure that you have the firstname_lastname_comp project loaded. Then, in the Processing toolbar item, select “Start>Start Test Bench Template Writer”. This will generate a VHDL file named firstname_lastname_comp.vht and place it in the simulation/modelsim directory. Open the template in Quartus. Note that the template already includes the instantiation of the under test circuit (i.e., firstname_lastname_comp component). It also includes the skeletons of two “process” blocks, one labeled “init” and the other labeled “always”. The init process block can be deleted. You should edit the “always” process block to suit your needs, so in this case it will be used to generate the code signal waveform. You will notice that inside the process block, signal code are assigned multiple times! This should not make sense right now. If a signal was assigned multiple times using concurrent signal statements, this would be an error! However, the rules for statements inside a process block are different. Thankfully, it is not important to understand process blocks at this point. We will learn about them later on. The “wait for x ns” statement is a special VHDL statement that is only used in VHDL testbenches, and not in VHDL descriptions of synthesizable circuits that are intended to be implemented in real hardware. We never indicate time this way in synthesizable VHDL. For a simple introductory example, replace the “always” process block with the following text: always : PROCESS BEGIN A
In this assignment, you will learn how to use Intel Quartus II FPGA design software, how to set up a project, and the basics of writing VHDL code by following a step-by-step tutorial. 3 Learning Outcomes After completing this lab you should know how to: • Run the Intel Quartus software • Create the framework for a new project • Create a new VHDL file 4 Run Intel Quartus In this course you will be using commercial FPGA design software: the Intel Quartus Prime program and the Mentor Graphics ModelSim simulation program. Quartus Prime and ModelSim are installed on the computers in the lab. You can also obtain a slightly restricted version, the Quartus Lite edition, from the Intel web site1 . The program restrictions will not affect any designs you will be doing in this course. You can (and you should) install the applications on your personal computer to work on your project outside of the lab. You should use version 18.0 of the program, as this is the latest version that supports the prototyping board (the Altera DE1-SoC board) that you will be using. For Mac users, Intel Quartus is not available for MacOS. As such, you can connect to the lab computers via remote access. Follow the these instructions: 1. Connect to the McGill VPN as shown here: https://mcgill.service-now.com/itportal?id=kb_article& sysparm_article=KB0010687 2. Use Microsoft Remote Desktop (https://mcgill.service-now.com/itportal?id=kb_article&sysparm_ article=KB0010725) to connect to machines 156TR4060-01.campus.mcgill.ca – 156TR4060-17.campus.mcgill.ca. Note that the number at the end of the name of the machine (before “.campus.mcgill.ca”) is the machine number in the lab (a total of 17 machines). To begin, start Quartus Prime by selecting it in the Windows Start menu: 1Go to https://www.intel.com/content/www/us/en/programmable/downloads/download-center.html Then select Version 18.0 and finally choose the Lite edition. You may have to register for an Intel account. Also, before you press the Download button make sure that the correct version and edition are selected through the dropdown menus.The following window will appear on startup (this shows version 18.0 downloaded from Intel’s web site; the versions on the lab computers may look slightly different). Intel Quartus Prime employs a project-based approach. The goal of a Quartus project is to develop a hardware implementation of a specific function, targeted to an FPGA (Field Programmable Gate Array) device. Typically, the project will involve a (large) number of different circuits, each designed individually, or taken from circuit libraries. Project management is therefore important. The Quartus Prime program aids in the project management by providing a project framework that keeps track of the various components of the project, including design files (such as schematic block diagrams or VHDL descriptions), simulation files, compilation reports, FPGA configuration or programming files, project specific program settings and assignments, and many others. The first step in designing a system using the Quartus Prime approach is therefore to create the project framework. The program simplifies this by providing a “Wizard” which guides you through a step-by-step setting of the most important options. To run the Project Wizard, click on the File menu and select the New Project Wizard entry. 5 Creating a New Project The New Project Wizard involves going through a series of windows. The first window is an introduction, listing the settings that can be applied. After reading the text on this window, click on “Next” to proceed.In the second window, you should give the project the following name: firstname_lastname_vhdl1. Make sure to replace firstname_lastname with your full name. The working directory for your project will be different than that shown in the screenshot below. Make sure that you change the working directory to the directory of your choice. If you use a lab computer, use your network drive for your project files. We don’t have a project template at this point, so select Empty project and proceed.You will add files later, so for now, just click on “Next”. Later in the course, you will be downloading a design to an FPGA device on the DE1-SoC board. These devices belong to the Cyclone V family of FPGAs, with the following part number: 5CSEMA5F31C6. To ensure proper configuration of the FPGAs, select this device as shown below.The dialog box in the next window permits the designer to specify 3rd-party tools to use for various parts of the design process. We will be using a 3rd-party Simulation tool called ModelSim-Altera, so select this item from the Simulation drop-down menu. The final page in the New Project Wizard is a summary. Check it over to make sure everything is okay (e.g., the project name, directory, and device assignment) then click Finish.Your project framework is now ready. 6 Writing Hello Gate code in VHDL The following VHDL code describes an OR gate (available online at https://www.edaplayground.com/x/A4). Create a new VHDL file within your project and write/paste the code in the new file. Make sure to select the OR gate as the top-level design entity (see figure below). −− S imp l e OR g a t e d e s i g n l i b r a r y IEEE ; u s e IEEE . s t d _ l o g i c _ 1 1 6 4 . a l l ; e n t i t y or_gat e i s p o r t ( a : i n s t d _ l o g i c ; b : i n s t d _ l o g i c ; q : o u t s t d _ l o g i c ) ; e n d or_gat e ; a r c h i t e c t u r e r t l o f or_gat e i s b e g i n p r o c e s s ( a , b ) i s b e g i n q Options > General > EDA Tools to make sure that the path to Altera Modelsim is configured correctly. If you installed “Quartus with Altera ModelSim” the path should be similar to the one shown in the figure below (i.e., on Windows: C:altera13.0sp1mode-sim_asewin32aloem, otherwise you will need to browse to where you installed Altera Modelsim and point it to the win32aloem directory (see figure below).Now we need to create a testbench file. Test benches are used to test our VHDL code. In the testbench file, we will typically define the VDHL code of our hardware as the device under test (DUT) and then define the input vectors to test the DUT. The VHDL code that describes the testbench for the OR gate is provided below. Write/paste the code in a new VHDL file within your project. This file should be defined as the testbench in the settings. To specify the testbench go to this path Assignments > Settings > EDA Tool Settings > Simulation and specify your testbench file as shown in the figure below. −− T estben ch f o r OR g a t e l i b r a r y IEEE ; u s e IEEE . s t d _ l o g i c _ 1 1 6 4 . a l l ; e n t i t y t e s t b e n c h i s −− empty e n d t e s t b e n c h ; a r c h i t e c t u r e tb o f t e s t b e n c h i s −− DUT component c om p o n e n t or_gat e i s p o r t ( a : i n s t d _ l o g i c ; b : i n s t d _ l o g i c ; q : o u t s t d _ l o g i c ) ; e n d c om p o n e n t ; s i g n a l a_ in , b_ in , q_out : s t d _ l o g i c ; b e g i n −− Connect DUT DUT: or_gat e p o r t map ( a_ in , b_ in , q_out ) ; p r o c e s s b e g i n a_in
In this VHDL assignment, you will describe a sequence detector in VHDL and test it on the Altera DE1-SoC board using a sequence from a memory input (ROM). If you need any help regarding the lab materials, you can • Ask the TA for help during lab sessions and office hours. • Refer to the text book. In case you are not aware, Appendix A “VHDL Reference” provides detailed information on VHDL. • You can also refer to the tutorial on Quartus and ModelSim provided by Intel (click here for Quartus and here for ModelSim). • Refer to the DE1-SoC User Manual (in the Content tab on myCourses). It is highly recommended that you first try to resolve any issue by yourself (refer to the textbook and/or the multitude of VHDL resources on the Internet). Syntax errors, especially, can be quickly resolved by reading the error message to see exactly where the error occurred and checking the VHDL Reference or examples in the textbook for the correct syntax. 4 Sequence Detector In this assignment, you will first implement a sequence detector circuit that takes a sequence of bits as its input and detects two different bit patterns in the input sequence. Specifically, you are asked to design a circuit based on Moore-type FSM(s) with an asynchronous reset signal and an enable signal, that takes a sequence of bits as its input “seq” and has two outputs “out_1 ” and “out_2 ”. The outputs should be “out_1 = 1” and “out_2 = 1” at the clock cycle following the input bit patterns 1011 and 0010, respectively; they should be 0, otherwise. Note that state transitions only occur when the enable signal is high. Otherwise, the FSM stays at its current state. An example of the desired behavior isclock: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 seq: 0 0 1 1 0 1 1 0 1 1 0 0 1 0 0 1 0 1 1 0 1 out_1: 0 0 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 1 0 out_2: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 0 0 Note: It is best to use two FSMs, each detecting one of the two bit patterns (why?). Your VHDL code will be based on the state diagram(s) of your FSMs. It is, therefore, important that you first come up with a simple state diagram. In fact, you should not need more than five states for each of the two FSMs. Use the following entity declaration for your VHDL description of the sequence detector: l i b r a r y IEEE; u s e IEEE.STD_LOGIC_1164.ALL; u s e IEEE.NUMERIC_STD.ALL; e n t i t y firstname_lastname_FSM i s P o r t ( seq : i n s t d _ l o g i c ; enable : i n s t d _ l o g i c ; reset : i n s t d _ l o g i c ; clk : i n s t d _ l o g i c ; out_1 : o u t s t d _ l o g i c ; — generates 1 when the pattern “1011” is detected ; otherwise 0. out_2 : o u t s t d _ l o g i c ); — generates 1 when the pattern “0010” is detected ; otherwise 0. end firstname_lastname_FSM ; firstname_lastname in the name of the entity is the name of one of the students in your group. 5 Sequence Counter In this part, you are required to implement a sequence counter circuit that counts how many times each of the two patterns occurs in the input bit stream. The sequence counter circuit can be realized using the sequence detector circuit, which detects the patterns, followed by two 3-bit up-counters (one for each output of the FSMbased circuit) that keep track of the number of detected patterns. More specifically, each counter is incremented whenever its corresponding pattern has been detected. Use the sequence detector and the 3-bit up-counter from VHDL Assignment #5 to implement the sequence counter circuit with an asynchronous reset and an enable signals. l i b r a r y IEEE; u s e IEEE.STD_LOGIC_1164.ALL; u s e IEEE.NUMERIC_STD.ALL; e n t i t y firstname_lastname_sequence_detector i s P o r t ( seq : i n s t d _ l o g i c ; enable : i n s t d _ l o g i c ; reset : i n s t d _ l o g i c ; clk : i n s t d _ l o g i c ; cnt_1 : o u t s t d _ l o g i c _ v e c t o r (2 downto 0) ; — counts the occurrence of the pattern “1011”. cnt_2 : o u t s t d _ l o g i c _ v e c t o r (2 downto 0) ); — counts the occurrence of the pattern “0010”. end firstname_lastname_sequence_detector ; firstname_lastname in the name of the entity is the name of one of the students in your group. An example of the desired behavior is clock: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 seq: 0 0 1 1 0 1 1 0 1 1 0 0 1 0 0 1 0 1 1 0 1 out_1: 0 0 0 0 0 0 0 1 1 1 2 2 2 2 2 2 2 2 2 3 3 out_2: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 2 2 2 2Note that when the enable signal of the sequence detector circuit is low, its counter and FSM-based instances hold their previous value and state, respectively. Also, the reset signal of the sequence counter should be active low. 6 Demonstration on the FPGA Board So far, you have designed a circuit counting the number of occurrences of two different patterns within the input sequence of bits. To demonstrate the functionality of your circuit, you must supply it with a sequence of bits and display the number of occurrences of each pattern on the HEX displays. In contrast to previous VHDL assignments where you inserted the inputs using slider switches and push-buttons, the input sequence cannot be inserted in the same fashion. Instead, we store the input bit sequence into a read-only-memory (ROM) and we supply the circuit the bits of the sequence one after the other every 1 second by reading them from the ROM. The ROM circuit takes clock and asynchronous reset signals as inputs, and outputs a bit of the sequence under test at the positive edge of the clock signal when the enable signal is high. The VHDL code for the ROM is given in ROM.vhd. To read each bit of the sequence under test every 1 second, you will need to create an instance of the clock divider circuit from VHDL Assignment #5 to enable both the ROM and sequence counter circuits at the appropriate rate (once per second). The 7-segment decoder that you created in VHDL Assignment #4 will be then used to display the occurrence number of each pattern. Note that the clk port of all the components (i.e., the clock divider, ROM, and sequence counter circuits) is supplied with a clock frequency of 50 MHz. The following figure shows the high-level architecture of the circuit. Clock Divider ROM (given in ROM.vhd) clk Sequence Counter clk 7-Segment Decoder 7-Segment Decoder HEX0 HEX5 Note that Pushbutton PB0 is used to reset the circuit. When the button is pressed, the circuit has to display 0 on the 7-segment displays until it is released. Recall that the output of a pushbutton is high when the button is not being pushed, and is low when the button is being pushed. Use the following entity declaration: l i b r a r y IEEE; u s e IEEE.STD_LOGIC_1164.ALL; u s e IEEE.NUMERIC_STD.ALL; e n t i t y firstname_lastname_wrapper i s P o r t ( reset : i n s t d _ l o g i c ; clk : i n s t d _ l o g i c ; HEX0 : o u t s t d _ l o g i c _ v e c t o r (6 downto 0) , HEX5 : o u t s t d _ l o g i c _ v e c t o r (6 downto 0) ); end firstname_lastname_wrapper ; firstname_lastname in the name of the entity is the name of one of the students in your group. Compile the circuit in Quartus. Once you have compiled the circuit, it is time to map it on the Altera DE1-SoC board. Perform the pin assignment for the HEX display, the pushbutton, and the slider switch. Make sure that you connect the clock signal of your design to 50 MHz clock frequency (see the DE1 user’s manual for the pin location of 50 MHz clock frequency). Program the board and test the functionality of your circuit.7 Questions Please note that even if some of the waveforms may look the same, you still need to include them separately in the report. 1. Why is it better to use two FSMs, rather than one, in the implementation of the sequence detector from Section 4? 2. Briefly explain your VHDL code implementation of all circuits. 3. Provide waveforms for each of the individual circuits (each section) and for the wrapper. 4. Perform timing analysis of the wrapper and find the critical path(s) of the circuit. What is the delay of the critical path(s)? 5. Report the number of pins and logic modules used to fit your design on the FPGA board. 8 Deliverables You are required to submit the following deliverables on MyCourses. Please note that a single submission is required per group (by one of the group members). • Lab report. The report should include the following parts: (1) Names and McGill IDs of group members, (2) an executive summary (short description of what you have done in this VHDL assignment), (3) answers to all questions in previous section (if applicable), (4) legible figures (screenshots) of schematics and simulation results, where all inputs, outputs, signals, and axes are marked and visible, (5) an explanation of the results obtained in the assignments (mark important points on the simulation plots), and (6) conclusions. Note – students are encouraged to take the reports seriously, points will be deducted for sloppy submissions. Please also note that even if some of the waveforms may look the same, you still need to include them separately in the report. • Project files. Create a single .zip file named VHDL#_firstname_lastname (replace # with the number of the current VHDL assignment and firstname_lastname with the name of the submitting group member). The .zip file should include the working directory of the project.
In this VHDL assignment, you will learn how to describe storage elements and sequential logic circuits in VHDL. Then, you will design a 3-bit up-counter and simulate the counter using ModelSim. If you need any help regarding the lab materials, you can • Ask the TA for help during lab sessions and office hours. • Refer to the text book. In case you are not aware, Appendix A “VHDL Reference” provides detailed information on VHDL. • You can also refer to the tutorial on Quartus and ModelSim provided by Intel (click here for Quartus and here for ModelSim). • Refer to the DE1-SoC User Manual (in the Content tab on myCourses). It is highly recommended that you first try to resolve any issue by yourself (refer to the textbook and/or the multitude of VHDL resources on the Internet). Syntax errors, especially, can be quickly resolved by reading the error message to see exactly where the error occurred and checking the VHDL Reference or examples in the textbook for the correct syntax. 4 VHDL Description of Storage Elements In the previous VHDL assignments, you have learned how to use sequential statements to describe the behavior of combinational circuits inside the process block. Sequential statements within the process block can also be used to describe sequential circuits such as storage elements. In digital systems, we have two types of memory elements: latches and flip-flops (FFs). Latches are memory elements that immediately reflect any changes in the input signal to the output signal when the level of the control signal (clk) is high. As such, latches are usually referred to as level-triggered memory elements. Alternatively, FFs change their state when the control signal goes from either high to low or from low to high. Note that FFs working with a control signal that goes from high to low or from low to high are called, respectively, negative-edge-triggered and positive-edge-triggered FFs. In digital systems,edge-triggered flip-flops are superior to level-triggered latches since FFs are more robust. For example, noise can easily disrupt the output of a latch when the control signal is high. On the other hand, the output of FFs can be disrupted only in presence of noise at the edge transition of the control signal. It is highly recommended, therefore, to use FFs when designing sequential circuits. In VHDL, process blocks are used to describe FFs. Since FFs set their state at the edge of the control signal, we need a statement to detect the edge transition of the control signal. This can be simply done by using a IF-THEN-ELSE statement. Assuming that clk is the control signal of a FF, a positive edge transition (i.e., a transition from ‘0’ to ‘1’) of the clk signal can be detected by the following statement: RISING_EDGE (clk). Similarly, a negative edge transition (i.e., a transition from ‘1’ to ‘0’) can be detected by the following statement FALLING_EDGE (clk). For example, the following process block describes a positive-edge-triggered DFF. PROCESS ( clk ) BEGIN IF RISING_EDGE( clock ) THEN Q
In this assignment, you will use a previously designed BCD adder circuit, you will implement and test the circuit on the Altera DE1-SoC board. You will learn how to work with the Altera DE1-SoC board, use switches, and the 7-segment LED display. If you need any help regarding the lab materials, you can • Ask the TA for help during lab sessions and office hours. • Refer to the text book. In case you are not aware, Appendix A “VHDL Reference” provides detailed information on VHDL. • You can also refer to the tutorial on Quartus and ModelSim provided by Intel (click here for Quartus and here for ModelSim). It is highly recommended that you first try to resolve any issue by yourself (refer to the textbook and/or the multitude of VHDL resources on the Internet). Syntax errors, especially, can be quickly resolved by reading the error message to see exactly where the error occurred and checking the VHDL Reference or examples in the textbook for the correct syntax. 4 BCD to 7-Segment LED Decoder A 7-segment LED display includes 7 individual LED segments, as shown below. By turning on different segments together, we can display characters or numbers. There are six of these displays on the DE1-SoC board, which you will use later to display the result of your full implementation of the adder.We will need a circuit to drive the 7-segment LEDs on the DE1 board, called 7-segment decoder. It takes as input a 4-bit BCD-formatted code representing the 10 digits between 0 and 9, and generates the appropriate 7-segment display associated with the input code. For many LED displays, including the ones on the DE1 board, segments turn on when their segment inputs are driven low, that is “0” means on and “1” means off. This scheme for the inputs is called “active low.” The VHDL code for the 7 segment decoder is provided below. l i b r a r y i e e e ; u s e i e e e . s t d _l o gi c _ 1 1 6 4 . a l l ; u s e i e e e . n ume ri c_ s t d . a l l ; e n t i t y seven_segment_decoder i s p o r t ( code : i n s t d _ l o g i c _ v e c t o r (3 downto 0) ; segments_out : o u t s t d _ l o g i c _ v e c t o r (6 downto 0) ); end seven_segment_decoder ; a r c h i t e c t u r e decoder o f seven_segment_decoder i s b e g i n WITH code SELECT segments_out — statements WHEN value2 => — statements WHEN value3 => — statements — etc . WHEN OTHERS => — statements END CASE; Note that the CASE statement must include a WHEN clause for each of the possible values of object. This necessitates a WHEN OTHERS clause if some of possible values of object are not covered by WHEN clauses. 8 4-Bit Comparator Comparators are a useful type of arithmetic circuits that compare the relative sizes of two binary numbers. In this assignment you will implement a 4-bit comparator circuit that takes two 4-bit unsigned inputs A and B, and determines which one of the cases A = (B + 1), A < (B + 1), A ≤ (B + 1), A > (B + 1), A ≥ (B + 1) holds. It should detect the occurrence of overflow when performing B + 1, i.e., detect if B + 1 requires more than 4 bits. Use the following entity declaration for your implementation of the comparator circuit. l i b r a r y IEEE; u s e IEEE.STD_LOGIC_1164.ALL; u s e IEEE.NUMERIC_STD.ALL; e n t i t y firstname_lastname_comparator i s P o r t ( A, B : i n s t d _ l o g i c _ v e c t o r (3 downto 0) ; AgtBplusOne : o u t s t d _ l o g i c ; AgteBplusOne : o u t s t d _ l o g i c ; AltBplusOne : o u t s t d _ l o g i c ; AlteBplusOne : o u t s t d _ l o g i c ; AeqBplusOne : o u t s t d _ l o g i c ; overflow : o u t s t d _ l o g i c ); end firstname_lastname_comparator ; Note that in case of overflow when performing B + 1, the comparator circuit outputs 1 for the overflow signal while the remaining signals (i.e., AgtBplusOne, AgteBplusOne, AltBplusOne, AlteBplusOne and AlteBplusOne) are set to 0. Otherwise, the circuits outputs proper values for AgtBplusOne, AgteBplusOne, AltBplusOne, AlteBplusOne and AeqBplusOne signals according to A > (B+1), A ≥ (B+1), A < (B+1), A ≤ (B+1), A = (B+1), respectively, while the overflow signal is set to 0. For example, if A = 910 and B = 510 then both AgtBplusOne, AgteBplusOne should be 1, while the rest (including overflow) should be 0. The firstname_lastname in the name of the entity is the name of one of the students in your group. To describe your comparator in VHDL, use sequential statements in a single process block. Once you have described your circuit in VHDL, you should test your design on the FPGA board. Similar to the VHDL Assignment #4, the inputs of the comparator circuit are provided using the slider switches; you will need to use eight slider switches. To visualize the output signals of the comparator circuit, use the LEDs located right above the slider switches on the FPGA board. Note that you will need to use six LEDs (one for each output signal). When an output signal is set to ’1’, the corresponding LED turns on; otherwise, it will be off. Afterwards, perform the pin assignments and program the FPGA by following the instructions provided in VHDL Assignement #4. Note that the pin locations of LEDs are different from those of 7-segment LEDs. The pin assignments for the LEDs are given in Figure 3.17 on p. 25 of the Altera board manual. Once completed, test the functionality of your comparator circuit for different input values using the LEDs and slider switches. 9 Questions 1. Briefly explain your VHDL code implementation of all circuits. 2. Provide the VHDL code that you wrote for the wrapper circuit. 3. Show a representative photo of the board displaying the result of the addition of A and B, where A is the last (rightmost) digit of the McGill ID number of one of the students in your group and B is the last (rightmost) digit of the McGill ID number of the other student in the group (if there are three members in your group, choose two IDs and state which were chosen). Clearly indicate which 7-segment LEDs/sliding switches are assigned to which inputs/outputs of the circuit on the photo. 4. Report the number of pins and logic modules used to fit your designs on the FPGA board. 5. Given that A = 510, provide a separate simulation plot that demonstrates all possible cases for the 4-bit comparator, including a separate plot for the case where overflow occurs. A total of four plots should be included. Explain each plot and mark all inputs and outputs clearly. 6. Perform timing analysis (slow 1,100 mV model) of the 4-bit comparator and find the critical path(s) of the circuit. What is the delay of the critical path(s)? 10 Deliverables You are required to submit the following deliverables on MyCourses. Please note that a single submission is required per group (by one of the group members). • Lab report. The report should include the following parts: (1) Names and McGill IDs of group members, (2) an executive summary (short description of what you have done in this VHDL assignment), (3) answers to all questions in previous section (if applicable), (4) legible figures (screenshots) of schematics and simulation results, where all inputs, outputs, signals, and axes are marked and visible, (5) an explanation of the results obtained in the assignments (mark important points on the simulation plots), and (6) conclusions. Note – students are encouraged to take the reports seriously, points will be deducted for sloppy submissions. Please also note that even if some of the waveforms may look the same, you still need to include them separately in the report. • Project files. Create a single .zip file named VHDL#_firstname_lastname (replace # with the number of the current VHDL assignment and firstname_lastname with the name of the submitting group member). The .zip file should include the working directory of the project.