Assignment Chef


Assignment catalog

33,401 assignments available

[SOLVED] CSC 446 Assign #3: Create a recursive descent parser for the CFG given in the previous assignment

Create a recursive descent parser for the CFG given in the previous assignment. You may use the test programs given in the previous assignment, and those included at the end, to test your parser. Be aware that this is not an exhaustive test of your parser; you should develop as many test cases as you can think of.

To turn in your assignment, submit both your lexical analyzer and parser programs in a zip file placed into the Assign 3 dropbox on D2L. Submit all parts of your assignment. Use proper data abstraction techniques when you write your program: the parser and lexical analyzer must be in separate source files. Include any other needed programs so that they will compile without modification.

Programs written in C or C++ are required to compile and run on the machine cscssh.sdstate.edu. This is a dual-processor, 12-core, 64-bit Xeon machine, not an Intel Pentium or AMD Ryzen processor. To test your software, you may connect to cscssh.sdstate.edu using the ssh protocol and run your program there. When working on the Linux system in the lab, your files are hosted on this machine, so you do not have to transfer any files: simply open a shell prompt window, type the command "ssh cscssh.sdstate.edu", and hit enter. You will be prompted for your password, and then you are in. Notice that the prompt changes to "username@csc-linux1"; the real name for cscssh is csc-linux1!

Programs written in C# will be run under Visual Studio 2022 with the test files located in the same folder as your executable program. To ensure that the program is indeed legal, your parser must terminate with the end-of-file token!
Test Programs

The simplest Ada program is then:

    PROCEDURE one IS
    BEGIN
    END one;

A more typical program would be:

    PROCEDURE MAIN IS
       PROCEDURE PROC1 IS
       BEGIN
       END PROC1;
    BEGIN
    END MAIN;

A more complicated program would look like:

    PROCEDURE seven IS
       count : CONSTANT := 5;
       a, b : INTEGER;
       PROCEDURE eight (x : INTEGER; y : INTEGER) IS
       BEGIN
       END eight;
    BEGIN
    END seven;

Finally, you could use this program too:

    procedure five is
       a, b, c, d : integer;
       procedure fun (a : integer; out b : integer) is
          c : integer;
       begin
       end fun;
    begin
    end five;
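Illustratively, the skeleton of such a recursive descent parser, with one procedure per nonterminal of the Assign #2 grammar, might look like the Python sketch below. The token names (proct, idt, ist, ...) and the token-list interface are hypothetical assumptions for the sketch, not the assignment's required API.

```python
# Skeleton of a recursive descent parser for the Prog production of the CFG in
# Assign #2, one method per nonterminal. Token names are illustrative assumptions.

class ParseError(Exception):
    pass

class Parser:
    def __init__(self, tokens):
        self.tokens = tokens  # list of (token_type, lexeme) pairs, ending in ("eoft", "")
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos][0] if self.pos < len(self.tokens) else "eoft"

    def match(self, expected):
        # Consume the current token if it matches; otherwise report a syntax error.
        if self.peek() != expected:
            raise ParseError(f"expected {expected}, got {self.peek()}")
        self.pos += 1

    def prog(self):
        # Prog -> procedure idt Args is DeclarativePart Procedures
        #         begin SeqOfStatements end idt ;
        self.match("proct")
        self.match("idt")
        self.args()
        self.match("ist")
        # DeclarativePart, Procedures, and SeqOfStatements are omitted in this sketch
        self.match("begint")
        self.match("endt")
        self.match("idt")
        self.match("semit")

    def args(self):
        # Args -> ( ArgList ) | e   -- the empty production consumes nothing
        if self.peek() == "lparent":
            self.match("lparent")
            # ArgList omitted in this sketch
            self.match("rparent")

    def parse(self):
        self.prog()
        self.match("eoft")  # the program is legal only if we end on the EOF token

toks = [("proct", "procedure"), ("idt", "one"), ("ist", "is"),
        ("begint", "begin"), ("endt", "end"), ("idt", "one"), ("semit", ";"),
        ("eoft", "")]
Parser(toks).parse()  # accepts "PROCEDURE one IS BEGIN END one;" silently
```

Note how the final match of the EOF token enforces the "must terminate with the end of file token" requirement.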

$25.00

[SOLVED] CSC 446 Assign #2: Given the following grammar for a subset of Ada

Given the following grammar for a subset of Ada:

    Prog            -> procedure idt Args is
                       DeclarativePart Procedures
                       begin SeqOfStatements end idt ;
    DeclarativePart -> IdentifierList : TypeMark ; DeclarativePart | e
    IdentifierList  -> idt | IdentifierList , idt
    TypeMark        -> integert | realt | chart | const assignop Value
    Value           -> NumericalLiteral
    Procedures      -> Prog Procedures | e
    Args            -> ( ArgList ) | e
    ArgList         -> Mode IdentifierList : TypeMark MoreArgs
    MoreArgs        -> ; ArgList | e
    Mode            -> in | out | inout | e
    SeqOfStatements -> e

Draw the parse trees for the following programs (PLEASE UNDERLINE ALL TOKENS): ... begin end two; ... end six; ... begin end three;

Hint: You will probably want to use your paper sideways. Save this grammar, as it will be used in the next assignment.

$25.00

[SOLVED] CSC 446 Assignment #1: This project consists of writing a lexical analyzer for a subset of the Ada programming language

This project consists of writing a Lexical Analyzer for a subset of the Ada programming language. The Lexical Analyzer is to be a module, written in the language of your choice, that exports the following:

    procedure GetNextToken
    global variables
        Token
        Lexeme
        Value      {for integer tokens}
        ValueR     {for real tokens}
        Literal    {for strings}

The following are the reserved words in the language (may be upper or lower case):

    BEGIN, MODULE, CONSTANT, PROCEDURE, IS, IF, THEN, ELSE, ELSIF,
    WHILE, LOOP, FLOAT, INTEGER, CHAR, GET, PUT, END

The notation for specifying tokens is as follows:

Comments begin with the symbol -- and continue to the end of the line. Comments may appear after any token.

Blanks between tokens are optional, with the exception of reserved words. Reserved words must be separated by blanks, newlines, the beginning of the program, or the final semicolon.

Token id for identifiers matches a letter followed by letters, underscores and/or digits, with a maximum length of 17 characters. Ada identifiers are not case sensitive.

    letter     -> [a-z,A-Z]
    digit      -> [0-9]
    underscore -> _
    id         -> letter (letter | digit | underscore)*

Token num matches unsigned integers or real numbers and has attribute Value for integers and ValueR for real numbers.

    digits            -> digit digit*
    optional_fraction -> . digits | e
    num               -> digits optional_fraction

String literals begin with a " and end with a " and should be stored in the Literal variable. Strings must begin and end on the same line.

The relational operators (Token relop) are: =, /=, <, >, <=, >=

The addop's are: +, -, and or. The mulop's are: *, /, rem, mod, and and. The assignop is: :=

The following symbols are also allowed in the language: ( ) , : ; .
The Ada subset has the following rules:

Parameterless procedure declarations start the program. Procedures are begun with the reserved word PROCEDURE, followed by an id, the word IS, then a semicolon. The body of a procedure starts with the reserved word BEGIN and terminates with the reserved word END followed by the name of the procedure and a semicolon.

The tokens for each possible symbol (or type of symbol) should be declared as an enumerated data type.

To test your project, write a short program that imports (uses) module LexicalAnalyzer to read a source program and output the tokens encountered and their associated attributes (the lexeme for identifiers and reserved words, the numeric value for token num, and the symbol itself for all others). Source code for this and all other assignments must be submitted in a single zip file to the appropriate D2L dropbox on or before the due date.
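A minimal regex-based sketch of the GetNextToken-style interface described above might look as follows. The table-driven structure, token class names, and the assumption that the relational operators are =, /=, <, >, <=, >= are illustrative choices, not the assignment's required design (and word operators such as rem and mod would need extra handling, as would enforcing the 17-character identifier limit).

```python
# Illustrative sketch of a table-driven lexer, not the official solution.
# Reserved words follow the assignment text; the regexes are assumptions.
import re

RESERVED = {"begin", "module", "constant", "procedure", "is", "if", "then", "else",
            "elsif", "while", "loop", "float", "integer", "char", "get", "put", "end"}

TOKEN_SPEC = [
    ("comment",  r"--[^\n]*"),                   # comments run to end of line
    ("real",     r"\d+\.\d+"),                   # digits . digits  -> ValueR
    ("num",      r"\d+"),                        # unsigned integer -> Value
    ("id",       r"[A-Za-z][A-Za-z0-9_]{0,16}"), # letter then letters/digits/_ (max 17)
    ("literal",  r'"[^"\n]*"'),                  # string literal on a single line
    ("assignop", r":="),
    ("relop",    r"/=|<=|>=|[=<>]"),
    ("addop",    r"[+\-]"),
    ("mulop",    r"[*/]"),
    ("symbol",   r"[(),:;.]"),
    ("ws",       r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def tokens(src):
    # Yields (Token, Lexeme) pairs; comments and whitespace are skipped.
    for m in MASTER.finditer(src):
        kind, lexeme = m.lastgroup, m.group()
        if kind in ("ws", "comment"):
            continue
        if kind == "id" and lexeme.lower() in RESERVED:
            kind = lexeme.lower() + "t"   # e.g. begint, endt (ids are case-insensitive)
        yield kind, lexeme

print(list(tokens('PROCEDURE one IS BEGIN END one; -- done')))
```

Listing the comment and real-number alternatives before `addop` and `num` ensures that `--` is not split into two addops and `12.5` is not split into an integer, a period, and another integer.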

$25.00

[SOLVED] CSCI-SHU 360 Machine Learning Homework 4

Random forests (RF) build an ensemble of trees independently. RF uses bootstrap sampling (sampling with replacement, as discussed in class) to randomly generate B datasets from the original training dataset, each with the same size as the original one but possibly containing some duplicated (or missing) samples; each sample has a multiplicity greater than or equal to zero. In your Python implementation, you can use numpy.random.choice for the sampling procedure.

The RF procedure independently trains B decision tree models $\{f_b(\cdot;\theta_b)\}_{b=1}^{B}$ on these B datasets. To train each tree, we start from a root node holding all the assigned data and recursively split each node and its data into two child nodes, using a decision rule on a feature dimension (thresholding, in particular). When done, RF produces predictions by averaging the outputs of all B models:

$$\hat{y}_i = \frac{1}{B}\sum_{b=1}^{B} f_b(x_i;\theta_b). \qquad (1)$$

The optimization problem for training each tree in RF is

$$\min_{\theta_b} \sum_{i=1}^{n} \ell(y_i, \hat{y}_i) + \Omega(\theta_b), \qquad (2)$$

where $\hat{y}_i$ is the prediction produced by tree-$b$, $f_b(\cdot;\theta_b)$, for data point $x_i$, $\ell(\cdot,\cdot)$ is a loss function (detailed in Problem 2.5.3), and $\Omega(\theta_b)$ is a regularizer applied to the parameters $\theta_b$ of model-$b$ (that is, $\Omega(\theta_b)$ measures the complexity of model-$b$).

Most descriptions of ensemble learning in Problem 2.1 of the homework (below) can also be applied to RF, such as the definitions of $f_k(\cdot;\theta_k)$ and $\theta_k$, except Eq. (3) and Eq. (4). Different methods can be used to find the decision rule on each node during the optimization of a single tree. A core difference between random forests and GBDTs (which we will describe in Problem 2) is the tree-growing method: in the case of GBDT, we use the standard greedy tree-splitting algorithm; in the case of random forests, we greedily learn each tree using a bootstrapped data sample and random feature selection, as described in class.
That is, the key difference is the data being used (always the original data in the case of GBDT, a fresh bootstrap sample for each tree in the case of RF), and in the case of RF we choose a random subset of features each time we grow a node. The underlying algorithm, however, is very similar. Therefore, to facilitate code reuse between this and the next problem, and to make the comparison between RF and GBDT fairer, we ask you to use the same code base for both problems (detailed in Problem 2.4 below).

Each tree in RF is like the first tree in GBDT: RF does not consider any previously produced trees when it grows a new tree (the trees are independent in RF). With RF, we simply start with $\hat{y}^0_i$. Keep this in mind when reusing the code from GBDT, because $G_j$ and $H_j$ for tree-$k$ in RF depend only on $\hat{y}^0_i$, not on $\hat{y}^{k-1}_i$. Instructions 2-5 in Problem 2.5, however, can still be applied to RF tree building here.

In this problem, you will implement RF for both regression and binary classification. Please read Problems 2.1, 2.4, and 2.5 below before you start.

1. [20 points] Implement RF for the regression task, and test its performance on the Boston house price dataset (footnote 1) used in Homework 2. Report the training and test RMSE. How does the performance of RF compare to least squares regression and ridge regression?

2. [20 points] Implement RF for the binary classification task, and test its performance on the Credit-g dataset. It is a dataset classifying people described by 20 attributes as good or bad credit risks; the full description of the attributes can be found at https://www.openml.org/d/31. Report the training and test accuracy. Also try your implementation on the breast cancer diagnostic dataset (footnote 2), and report the training and test accuracy.
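The bootstrap sampling step described above can be sketched with numpy.random.choice, as the text suggests. Array names and shapes here are illustrative:

```python
# Sketch of bootstrap sampling for RF: B datasets, each drawn with replacement
# and the same size as the original training set.
import numpy as np

def bootstrap_datasets(X, y, B):
    """Draw B bootstrap samples; rows may repeat or be missing in each sample."""
    n = X.shape[0]
    datasets = []
    for _ in range(B):
        idx = np.random.choice(n, size=n, replace=True)  # multiplicity >= 0 per sample
        datasets.append((X[idx], y[idx]))
    return datasets

np.random.seed(0)
X = np.arange(20, dtype=float).reshape(10, 2)
y = np.arange(10)
boots = bootstrap_datasets(X, y, B=5)
print(len(boots), boots[0][0].shape)  # 5 datasets, each with the original shape (10, 2)
```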
Footnotes:
1. https://www.kaggle.com/c/boston-housing
2. https://www.kaggle.com/uciml/breast-cancer-wisconsin-data

2.1 Problem of Ensemble Learning in GBDTs

Gradient Boosting Decision Trees (GBDT) is a class of methods that use an ensemble of K models (decision trees) $\{f_k(\cdot;\theta_k)\}_{k=1}^{K}$. It produces predictions by adding together the outputs of the K models:

$$\hat{y}_i = \sum_{k=1}^{K} f_k(x_i;\theta_k). \qquad (3)$$

The resulting $\hat{y}$ can be used as the predicted response for regression problems, or can correspond to the class logits (i.e., the inputs to a logistic or softmax function that generates class probabilities) for classification problems. GBDT has been widely used in a variety of real applications and industrial fields with great success, due to its training efficiency (especially compared to deep neural networks) and, more importantly, its interpretability: decision trees and their decision rules are much more transparent for humans to understand and explain than the high-dimensional model weights and artificial non-linearly aggregated features produced by complicated deep neural networks.

In addition, GBDT methods have won a large number of machine learning competitions, for example many Kaggle competitions (footnote 3), where people can earn prizes and money simply by using GBDT (via standard packages such as XGBoost or LightGBM below). Furthermore, GBDT has empirically shown itself to be more robust to noise and data-imbalance issues than other methods across different types of data and tasks. Many GBDT packages optimize the efficiency of almost every step of the algorithm and make it extremely fast and easy to use, for example XGBoost (footnote 4) and LightGBM (footnote 5). In the problem below, you are not allowed to use any of these packages in your code; you need to implement GBDT from scratch. GBDT is an extremely useful, state-of-the-art ML tool.
Given the success of deep neural networks, it is heartening that a different class of methods often performs so well. Below, you will learn to implement it step by step by following the tutorial.

The optimization problem for training an ensemble of models is

$$\min_{\{\theta_k\}_{k=1}^{K}} \sum_{i=1}^{n} \ell(y_i,\hat{y}_i) + \sum_{k=1}^{K} \Omega(\theta_k), \qquad (4)$$

where $\theta_k$ is the parameters of the $k$-th model, $\hat{y}_i$ is the prediction produced by GBDT for data point $x_i$, $\ell(\cdot,\cdot)$ is a loss function (detailed definitions of the losses are given in Problem 2.5.3), and $\Omega(\theta_k)$ is a regularizer applied to the parameters $\theta_k$ of model-$k$ (that is, $\Omega(\theta_k)$ measures the complexity of model-$k$).

In GBDT, each model $f_k(\cdot;\theta_k)$ is a decision tree that assigns each data point $x$ to one of its leaf nodes $q_k(x) \in L_k$, where $L_k$ is the set of all the leaf nodes of the $k$-th tree. Each leaf node $j \in L_k$ has a weight $w^k_j$, and $(w^k_1, w^k_2, \ldots, w^k_{|L_k|}) = w^k \in \mathbb{R}^{|L_k|}$ is a $|L_k|$-dimensional vector storing the weights of all the leaf nodes. We may write the decision tree function as

$$f_k(x) = w^k_{q_k(x)}, \qquad (5)$$

where $q_k(\cdot): \mathbb{R}^d \mapsto L_k$ represents the decision process of the $k$-th decision tree. That is, $q_k(x)$ assigns each data point $x$ to a leaf node $j$ of the $k$-th tree; it is comprised of the decision rules on all the non-leaf nodes as its learnable parameters.
In particular, on each non-leaf node $j \in N_k$ ($N_k$ is the set of all the non-leaf nodes of tree-$k$), the decision rule is defined by a choice of feature dimension $p_j \in [m]$ and a choice of threshold $\tau_j$, leading to the following rule: when $x_{p_j} < \tau_j$, $x$ is assigned to the left child node of $j$; otherwise $x$ is assigned to the right child node.

Therefore, to define a tree $f_k(\cdot)$, we need to determine the structure of the tree $T \triangleq (N_k \cup L_k, E)$ ($E$ is the set of tree edges), the feature dimension $p_j$ and threshold $\tau_j$ associated with each non-leaf node $j \in N_k$, and the weight $w^k_j$ associated with each leaf node $j \in L_k$. These comprise the learnable parameters $\theta_k$ of $f_k(\cdot;\theta_k)$, i.e.,

$$\theta_k = \left( T,\; w^k,\; \{(p_j,\tau_j)\}_{j \in N_k} \right). \qquad (6)$$

Footnotes:
3. https://www.kaggle.com/competitions
4. https://github.com/dmlc/xgboost
5. https://github.com/Microsoft/LightGBM

[Figure 1: An example of GBDT: Does the person like computer games?]

To define an ensemble of multiple trees, we also need to know the number of trees K. We cannot directly apply gradient descent to learn the above parameters of GBDT because: (1) some of the above variables are discrete and some could have an exponential number of choices, including, for example, the number of trees, the structure of each tree, the feature dimension choice associated with each non-leaf node, and the weights at the leaf nodes; and (2) the overall decision tree process is not differentiable, so straightforward naive gradient descent seems inapplicable.

2.2 Overview of the GBDT algorithm

The basic idea of GBDT training is additive training, or boosting. As mentioned above, boosting is a meta-algorithm that trains multiple models one after another and, in the end, combines them additively to produce a prediction. Boosting often aims to convert multiple weak learners (which might be only slightly better than a random guess) into a strong learner that can achieve error arbitrarily close to zero.
Boosting has many forms and instantiations, including AdaBoost [6], random forests [5, 2], gradient boosting [3, 4], etc. Note that bagging [1] is not boosting, since there is no interdependence between the trainings of the different models; rather, each model in bagging is trained on a separate bootstrap data sample.

GBDT training shares ideas with coordinate descent in that only one part of the model is optimized at a time, while the other parts are fixed. In the coordinate descent algorithm (which you implemented for Lasso), each outer-loop iteration requires one pass over all the feature dimensions; in each iteration of the inner loop, starting from the first dimension, it optimizes only one dimension of the weight vector, fixing and conditioning on all the other dimensions. Each tree in GBDT is analogous to a dimension in coordinate descent, but the optimization process differs: GBDT starts from the first tree and optimizes only one tree at a time, fixing and conditioning on all the previously produced trees. One core difference from coordinate descent is that GBDT training does not have the outer loop: it makes only one pass over all the trees, as if coordinate descent optimized each coordinate only once in succession. This, therefore, is a greedy strategy, as we discussed in class. After the training of one tree is finished, we add the new tree to our growing ensemble and then repeat the above process. The algorithm stops when we have added t_max trees to the ensemble.
In the optimization of each tree, we start from a root node, find the best decision rule (a feature dimension and a threshold) and split the node into two children, then go to each child and recursively find the best decision rule on each child node, continuing until some stopping criterion is fulfilled (as will be explained shortly below). In the following, we first elaborate how to add trees one after the other, and then provide details on how to optimize a single tree given a set of previous trees (which might be empty, so this also explains how to start with the first tree).

2.3 Growing the forest: How to add a new tree?

Assume that there will be K trees in the end. Therefore, we will obtain a sequence of (partially) aggregated predictions $\{\hat{y}^k_i\}_{k=1}^{K}$ from the K trees as follows:

$$\hat{y}^0_i = 0,$$
$$\hat{y}^1_i = f_1(x_i) = \hat{y}^0_i + f_1(x_i),$$
$$\hat{y}^2_i = f_1(x_i) + f_2(x_i) = \hat{y}^1_i + f_2(x_i),$$
$$\cdots$$
$$\hat{y}^k_i = \sum_{k'=1}^{k} f_{k'}(x_i) = \hat{y}^{k-1}_i + f_k(x_i),$$
$$\cdots$$
$$\hat{y}^K_i = \sum_{k=1}^{K} f_k(x_i) = \hat{y}^{K-1}_i + f_K(x_i).$$

According to Eq. (4), fixing all the previous $k-1$ trees, the objective used to optimize tree-$k$ is

$$F_k(\theta_k) = \sum_{i=1}^{n} \ell(y_i, \hat{y}^k_i) + \Omega(\theta_k) = \sum_{i=1}^{n} \ell(y_i, \hat{y}^{k-1}_i + f_k(x_i)) + \Omega(\theta_k). \qquad (7)$$

Let's simplify the first term using Taylor's expansion, ignoring higher-order terms:

$$f(x+\Delta) \approx f(x) + f'(x)\Delta + \tfrac{1}{2} f''(x)\Delta^2. \qquad (8)$$

After applying Taylor's expansion to $\ell(y_i, \hat{y}^{k-1}_i + f_k(x_i))$, we have

$$\ell(y_i, \hat{y}^{k-1}_i + f_k(x_i)) \approx \ell(y_i, \hat{y}^{k-1}_i) + g_i f_k(x_i) + \tfrac{1}{2} h_i f_k^2(x_i), \qquad (9)$$

where $g_i$ and $h_i$ denote the first- and second-order derivatives of $\ell(y_i, \hat{y}^{k-1}_i)$ w.r.t. $\hat{y}^{k-1}_i$, i.e.,

$$g_i \triangleq \frac{\partial \ell(y_i, \hat{y}^{k-1}_i)}{\partial \hat{y}^{k-1}_i}, \qquad h_i \triangleq \frac{\partial^2 \ell(y_i, \hat{y}^{k-1}_i)}{(\partial \hat{y}^{k-1}_i)^2} = \frac{\partial g_i}{\partial \hat{y}^{k-1}_i}. \qquad (10)$$

The second term $\Omega(\theta_k)$ in Eq. (7) is a regularization term that penalizes the complexity of tree-$k$. It depends on the number of leaf nodes and the L2 regularization of $w^k$. With GBDTs, it is defined as

$$\Omega(\theta_k) \triangleq \gamma |L_k| + \tfrac{1}{2} \lambda \|w^k\|_2^2. \qquad (11)$$
We plug Eq. (9) and Eq. (11) into Eq. (7), and after ignoring constants, we get

$$F_k(\theta_k) + \text{const.} \approx \sum_{i=1}^{n} \left[ g_i f_k(x_i) + \tfrac{1}{2} h_i f_k^2(x_i) \right] + \gamma |L_k| + \tfrac{1}{2}\lambda \|w^k\|_2^2$$
$$= \sum_{i=1}^{n} \left[ g_i w^k_{q_k(x_i)} + \tfrac{1}{2} h_i \big(w^k_{q_k(x_i)}\big)^2 \right] + \gamma |L_k| + \tfrac{1}{2}\lambda \|w^k\|_2^2$$
$$= \sum_{j \in L_k} \left[ \Big(\sum_{i \in I_j} g_i\Big) w^k_j + \tfrac{1}{2}\Big(\lambda + \sum_{i \in I_j} h_i\Big) (w^k_j)^2 \right] + \gamma |L_k|, \qquad (12)$$

where $I_j$ represents the set of all the data points assigned to leaf $j \in L_k$, i.e., $I_j \triangleq \{i \in [n] : q_k(x_i) = j\}$. Eq. (12) gives us the objective for optimizing a tree conditioned on all the previously obtained trees.

2.4 Growing a tree: How to optimize a single tree?

Now we can start to optimize a single tree $f_k(\cdot;\theta_k)$. Look at the objective in Eq. (12): it is a sum of $|L_k|$ independent simple scalar quadratic functions of $w^k_j$, one for each $j \in L_k$! Minimizing a quadratic function is easy; as with least squares regression, the solution has a nice closed form. Hence, the $w^k_j$ minimizing $F_k(\theta_k)$ is

$$w^k_j = -\frac{G_j}{H_j + \lambda}, \qquad G_j \triangleq \sum_{i \in I_j} g_i, \qquad H_j \triangleq \sum_{i \in I_j} h_i. \qquad (13)$$

We can plug the optimal $w^k_j$ into Eq. (12) and obtain an updated objective

$$F_k\!\left(\theta_k;\; \Big\{w^k_j = -\tfrac{G_j}{H_j+\lambda}\Big\}_{j \in L_k}\right) = \gamma |L_k| - \frac{1}{2} \sum_{j \in L_k} \frac{G_j^2}{H_j + \lambda}. \qquad (14)$$

However, there are still two groups of unknown parameters in $\theta_k$: the tree structure $T$ and the decision rules $\{(p_j,\tau_j)\}_{j \in N_k}$. In the following, we elaborate how to learn these parameters by additive training of a single tree. We start from the root node and determine the associated decision rule $(p_j,\tau_j)$; this rule should minimize the updated objective in Eq. (14), where $L_k$ contains the left and right children of the root node. Then the same process of determining $(p_j,\tau_j)$ is recursively applied to the left and right nodes, until a stopping criterion (described below) is fulfilled.

First, let's determine the number of all possible choices of $(p_j,\tau_j)$ for node $j \in N_k$. Since there are m features, we have m different choices for $p_j$.
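As a concrete instance of Eq. (10) and Eq. (13): for the squared loss used in regression (Problem 2.5.3), the derivatives are $g_i = 2(\hat{y}^{k-1}_i - y_i)$ and $h_i = 2$, and the optimal leaf weight follows directly. A minimal sketch (function names are illustrative, not the notebook's API):

```python
# g and h for the squared loss l(y, yhat) = (y - yhat)^2, and the leaf weight
# w_j = -G_j / (H_j + lambda) from Eq. (13).
import numpy as np

def g_squared(y, y_prev):
    # g_i = dl/dyhat, evaluated at the previous ensemble prediction yhat^{k-1}
    return 2.0 * (y_prev - y)

def h_squared(y, y_prev):
    # h_i = d^2 l / dyhat^2 is the constant 2 for the squared loss
    return 2.0 * np.ones_like(y_prev)

def leaf_weight(g, h, lam):
    # Eq. (13), with G_j and H_j obtained by summing over the points in the leaf
    return -g.sum() / (h.sum() + lam)

y = np.array([1.0, 2.0, 4.0])
y_prev = np.zeros(3)                 # first tree: the previous prediction is 0
w = leaf_weight(g_squared(y, y_prev), h_squared(y, y_prev), lam=0.0)
print(w)  # with lambda = 0, this is exactly the mean of the targets (7/3)
```

With lambda = 0 and squared loss, the leaf weight reduces to the mean residual in the leaf, which is a useful sanity check for your implementation.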
For each feature dimension $p_j \in [m]$, we can sort all the $n_j \triangleq |I_j|$ data points assigned to node $j$ by their feature values on dimension $p_j$, so there are at most $n_j$ choices of threshold $\tau_j$ when we base them on the training data points. (There can be fewer than $n_j$ if different data points have duplicated feature values.) In particular, if the sequence of sorted feature values is $(v_1, v_2, v_3, \ldots, v_{n_j})$, the $t$-th choice of threshold can be any value between $v_t$ and $v_{t+1}$; we usually use the midpoint $\tfrac{1}{2}(v_t + v_{t+1})$ as the $t$-th candidate threshold. Therefore, there are $m \times n_j$ choices for $(p_j, \tau_j)$, i.e., each of the m choices for $p_j$ has $n_j$ choices of threshold $\tau_j$.

For each candidate decision rule $(p_j,\tau_j)$, we can compute the improvement it brings to the objective in Eq. (14). Before splitting node $j$ into a left child $j(L)$ and a right child $j(R)$, the objective is

$$F_k\!\left(\theta_k;\; \{w^k_j\}_{j \in L_k}\right) = \gamma |L_k| - \frac{1}{2} \sum_{j' \in L_k \setminus j} \frac{G_{j'}^2}{H_{j'} + \lambda} - \frac{1}{2} \cdot \frac{G_j^2}{H_j + \lambda}. \qquad (15)$$

After splitting, the leaf nodes change to $j(L)$ and $j(R)$, and the objective becomes

$$F_k\!\left(\theta_k;\; w^k_{j(L)}, w^k_{j(R)}, \{w^k_{j'}\}_{j' \in L_k \setminus j}\right) = \gamma(|L_k|+1) - \frac{1}{2} \sum_{j' \in L_k \setminus j} \frac{G_{j'}^2}{H_{j'} + \lambda} - \frac{1}{2}\cdot\frac{G_{j(L)}^2}{H_{j(L)} + \lambda} - \frac{1}{2}\cdot\frac{G_{j(R)}^2}{H_{j(R)} + \lambda}. \qquad (16)$$

Hence, the improvement (usually called the "gain") is

$$F_k\!\left(\theta_k;\; \{w^k_j\}_{j \in L_k}\right) - F_k\!\left(\theta_k;\; w^k_{j(L)}, w^k_{j(R)}, \{w^k_{j'}\}_{j' \in L_k \setminus j}\right) \qquad (17)$$
$$= \frac{1}{2}\left[ \frac{G_{j(L)}^2}{H_{j(L)} + \lambda} + \frac{G_{j(R)}^2}{H_{j(R)} + \lambda} - \frac{G_j^2}{H_j + \lambda} \right] - \gamma \qquad (18)$$
$$= \frac{1}{2}\left[ \frac{G_{j(L)}^2}{H_{j(L)} + \lambda} + \frac{G_{j(R)}^2}{H_{j(R)} + \lambda} - \frac{(G_{j(L)} + G_{j(R)})^2}{H_{j(L)} + H_{j(R)} + \lambda} \right] - \gamma. \qquad (19)$$

Therefore, the best decision rule $(p_j,\tau_j)$ on node $j \in N_k$ is the one (out of the $m \times n_j$ possible rules) maximizing the gain, which corresponds to the decision rule that minimizes the updated objective in Eq. (14). That is, we wish to perform the following optimization:

$$\max_{(p_j,\tau_j)} \frac{1}{2}\left[ \frac{G_{j(L)}^2}{H_{j(L)} + \lambda} + \frac{G_{j(R)}^2}{H_{j(R)} + \lambda} - \frac{(G_{j(L)} + G_{j(R)})^2}{H_{j(L)} + H_{j(R)} + \lambda} \right] - \gamma. \qquad (20)$$
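The gain in Eq. (19) can be computed directly from the gradient and Hessian sums of the two candidate children, which is the core scoring function of the split search. A minimal sketch (names illustrative):

```python
# Split gain from Eq. (19), given gradient/Hessian sums for the candidate
# left and right children; the parent sums are G_L + G_R and H_L + H_R.
def split_gain(G_L, H_L, G_R, H_R, lam, gamma):
    def score(G, H):
        return G * G / (H + lam)
    return 0.5 * (score(G_L, H_L) + score(G_R, H_R)
                  - score(G_L + G_R, H_L + H_R)) - gamma

# A balanced split with opposite gradients gives a large positive gain:
print(split_gain(G_L=-3.0, H_L=2.0, G_R=3.0, H_R=2.0, lam=1.0, gamma=0.1))
# A split that separates nothing (identical children) only costs gamma:
print(split_gain(G_L=2.0, H_L=2.0, G_R=2.0, H_R=2.0, lam=0.0, gamma=0.0))
```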
We start from the root node, apply the above criterion to find the best decision rule, split the root into two child nodes, and recursively apply the criterion to find the decision rules on the child nodes, the grandchildren, and so on. We stop splitting when a stopping criterion is satisfied. In particular, we stop splitting a node if any of the following events happens:

1. the tree has reached a maximal depth d_max;
2. the improvement achieved by the best decision rule for the node (Eq. (20)) goes negative (or is still positive but falls below a small positive threshold; in the following experiments you can try this, but please report results based on the "goes negative" criterion);
3. the number of samples assigned to the node is less than n_min.

2.5 Details of Practical Implementation

1. Learning rate η: You might notice that tree growing in GBDT is a greedy process. In practice, to avoid overfitting a single tree and to give more of a chance to new trees, we make the process less greedy: we assign a weight $0 \le \eta \le 1$ to each newly added tree when aggregating its output with the outputs of the previously added trees. Hence, the sequence at the beginning of Problem 2.3 becomes

$$\hat{y}^0_i = 0,$$
$$\hat{y}^1_i = \eta f_1(x_i) = \hat{y}^0_i + \eta f_1(x_i),$$
$$\hat{y}^2_i = \eta f_1(x_i) + \eta f_2(x_i) = \hat{y}^1_i + \eta f_2(x_i),$$
$$\cdots$$
$$\hat{y}^k_i = \eta \sum_{k'=1}^{k} f_{k'}(x_i) = \hat{y}^{k-1}_i + \eta f_k(x_i),$$
$$\cdots$$
$$\hat{y}^K_i = \eta \sum_{k=1}^{K} f_k(x_i) = \hat{y}^{K-1}_i + \eta f_K(x_i).$$

Note that this change must be applied both during training and during testing/inference. $\eta$ is usually called the "learning rate" of GBDT, but it is not exactly the same as the variable we usually call the learning rate in gradient descent.

2. Initial prediction $\hat{y}^0_i$: GBDT does not have a bias term b like the linear model y = wx + b. Fortunately, $\hat{y}^0_i$ plays a similar role. Hence, instead of starting from $\hat{y}^0_i = 0$, we start from $\hat{y}^0_i = \frac{1}{n}\sum_{i=1}^{n} y_i$, i.e., the average of the ground truth on the training set.
For classification, it is also fine to use this initialization (the average of many 1s and 0s), but do not forget to convert the label data type from "int" to "float" when computing the average in this case.

3. Choices of loss function ℓ(·,·): ℓ(·,·) is a sample-wise loss. In the experiments, you should use the least-square loss $\ell(y,\hat{y}) = (y-\hat{y})^2$ for regression problems. For binary classification problems, we use 0/1 encoding of the labels y (y is either 0 or 1) and logistic regression (the GBDT output $\hat{y}$ is the logit in this case: a real number that is the input to the logistic function producing the class probability), i.e.,

$$\ell(y,\hat{y}) = y \log\big(1 + \exp(-\hat{y})\big) + (1-y)\log\big(1 + \exp(\hat{y})\big). \qquad (21)$$

The prediction of binary logistic regression, i.e., the class probabilities, is

$$\Pr(\text{class}=1) = \frac{1}{1+\exp(-\hat{y})}, \qquad \Pr(\text{class}=0) = 1 - \Pr(\text{class}=1). \qquad (22)$$

To produce a 0/1 prediction, we apply a threshold of 0.5 to the probability, i.e.,

$$\text{class} = \begin{cases} 1, & \Pr(\text{class}=1) > 0.5 \\ 0, & \Pr(\text{class}=1) \le 0.5 \end{cases} \qquad (23)$$

4. Hyper-parameters: There are six hyper-parameters in GBDT: λ and γ in the regularization Ω(·); d_max and n_min in the stopping criterion for optimizing a single tree; the maximal number of trees t_max in the stopping criterion for growing the forest; and the learning rate η. We will not give you exact values for these hyper-parameters, since tuning them is an important skill in machine learning. Instead, we give you ranges to tune over (note that larger d_max and t_max require more computation): λ ∈ [0, 10], γ ∈ [0, 1], d_max ∈ [2, 10], n_min ∈ [1, 50], t_max ∈ [5, 50], η ∈ [0.1, 1.0]. In RFs, we do not have the learning rate, but there is another hyper-parameter: the size m′ of the random subset of features from which you find the best feature and the associated decision rule for a node. You can use m′ ∈ [0.2m, 0.5m].
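For the logistic loss in Eq. (21), the derivatives needed to reuse the tree-growing machinery reduce to the compact forms g = p − y and h = p(1 − p), with p = sigmoid(ŷ). A sketch (function names are illustrative, not the notebook's pred()/g()/h()):

```python
# Derivatives of the logistic loss in Eq. (21), and the 0/1 prediction rule
# of Eqs. (22)-(23). With p = sigmoid(yhat): g = p - y, h = p * (1 - p).
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def g_logistic(y, y_prev):
    return sigmoid(y_prev) - y

def h_logistic(y, y_prev):
    p = sigmoid(y_prev)
    return p * (1.0 - p)

def predict_class(y_hat):
    # Eq. (23): threshold the class-1 probability at 0.5
    return (sigmoid(y_hat) > 0.5).astype(int)

y_hat = np.array([-2.0, 0.0, 3.0])
print(predict_class(y_hat))  # [0 0 1]
```

Note that h is always in (0, 0.25], so the denominator H_j + λ in Eq. (13) stays positive even with λ = 0.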
5. Stopping criteria: There are two types of stopping criteria in GBDT/RF training: (1) we stop adding new trees once we have t_max trees; and (2) we stop growing a single tree once any of the three criteria given at the end of Problem 2.4 is fulfilled.

6. Acceleration: We encourage you to apply acceleration methods after you have made sure the code works correctly. Multiprocessing is effective for acceleration; however, do not make the number of worker processes too large, or it will actually become slower. You can also try numba (a Python compiler), with care.

7. Code template: We provide a template RF and GBDT implementation in an IPython notebook, "GBDT.ipynb". You may change the existing code in it for your convenience. The notebook also includes the code for loading the three datasets, so you do not need to download them manually. Note that most code for regression and classification can be shared; the only difference is the loss function (pred(), g() and h() in the notebook), so you do not need to implement everything twice: the difference is just a couple of lines and a loss-function option argument for your GBDT class.

2.6 Questions

1. [4 points] What is the computational complexity of optimizing a tree of depth d, in terms of m and n?

2. [4 points] Which operation is the most computationally expensive in GBDT training? Can you suggest a method to improve its efficiency (please do not suggest parallel or distributed computing here, since we will discuss that in the next question)? Give a short description of your method.

3. [8 points] Which parts of GBDT training can be computed in parallel? Briefly describe your solution, and use it in your implementation. (Hint: you might need "from multiprocessing import Pool" and "from functools import partial". We also talked about multiprocessing in the recitation session.)
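One common pattern for the parallelism hinted at above is to distribute the per-feature split search across a process pool. In this sketch, the split-search function is a placeholder, not the real gain computation:

```python
# Sketch of distributing the per-feature split search across a process pool,
# following the Pool/partial hint. best_gain_for_feature is a placeholder.
from multiprocessing import Pool
from functools import partial

def best_gain_for_feature(feature_index, data):
    # Placeholder: a real version would scan sorted thresholds on this feature
    # dimension and return the best (gain, feature_index, threshold) triple.
    return (sum(data) / (feature_index + 1), feature_index, 0.0)

def parallel_split_search(data, n_features, workers=2):
    scan = partial(best_gain_for_feature, data=data)   # bind the shared data
    with Pool(workers) as pool:
        results = pool.map(scan, range(n_features))    # one task per feature
    return max(results)                                # rule with the largest gain

if __name__ == "__main__":
    print(parallel_split_search([1.0, 2.0, 3.0], n_features=4))
```

The `if __name__ == "__main__"` guard matters on platforms that spawn (rather than fork) worker processes, since each worker re-imports the module.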
4. [20 points] Implement GBDT for the regression task, and test its performance on the Boston house price dataset used in Homework 2. Report the training and test RMSE. How does the performance of GBDT compare to least squares regression and ridge regression?

5. [20 points] Implement GBDT for the binary classification task, and test its performance on the Credit-g dataset. Report the training and test accuracy. Also try your implementation on the breast cancer diagnostic dataset, and report the training and test accuracy.

6. [4 points] According to the results of the three experiments, how does the performance of random forests compare to GBDT? Can you give some explanations?

References

[1] Leo Breiman. Bagging predictors. Machine Learning, 24(2):123-140, 1996.
[2] Leo Breiman. Random forests. Machine Learning, 45(1):5-32, 2001.
[3] Jerome H. Friedman. Stochastic gradient boosting. Computational Statistics and Data Analysis, 38:367-378, 1999.
[4] Jerome H. Friedman. Greedy function approximation: A gradient boosting machine. Annals of Statistics, 29:1189-1232, 2000.
[5] Tin Kam Ho. Random decision forests. In Proceedings of the 3rd International Conference on Document Analysis and Recognition, volume 1, pages 278-282, 1995.
[6] Robert E. Schapire. The strength of weak learnability. Machine Learning, 5(2):197-227, 1990.

$25.00

[SOLVED] CSCI-SHU 360 Machine Learning Homework 3

In this problem, you will implement the gradient descent algorithm and the mini-batch stochastic gradient descent algorithm for multi-class logistic regression from scratch (which means that you cannot use built-in logistic regression modules in scikit-learn or any other package). Then you will try your implementation on a hand-written digits dataset to recognize the hand-written digits in given images. We provide an IPython notebook, "logistic regression digits.ipynb", for you to complete. You need to complete the scripts below each "TODO" (please search for every "TODO"), and submit the completed ipynb file. In your write-up, you also need to include the plots and answers to the questions required in this section.

In this problem, you will implement a logistic regression model for multi-class classification. Given the features of a sample x, multi-class logistic regression produces the class probability (i.e., the probability of the sample belonging to class k)

$$\Pr(y = k \mid x; W, b) = \frac{\exp(xW_k + b_k)}{\sum_{j=1}^{c} \exp(xW_j + b_j)} = \frac{\exp(z_k)}{\sum_{j=1}^{c} \exp(z_j)}, \quad \forall\, k = 1, 2, \cdots, c, \qquad (1)$$

where c is the number of possible classes, the model parameter W is a d × c matrix, b is a c-dimensional vector, and we call the c values in z = xW + b the c logits associated with the c classes. The binary logistic regression model (c = 2) is a special case of the above multi-class logistic regression model (you can figure out why yourself with a few derivations). The predicted class $\hat{y}$ of x is

$$\hat{y} = \arg\max_{k=1,2,\cdots,c} \Pr(y = k \mid x; W, b). \qquad (2)$$

For simplicity of implementation, we can extend each x by adding an extra dimension with value 1, i.e., x ← [x, 1], and accordingly add an extra row to W, i.e., W ← [W; b]. With the extended representations of x and W, the logits become z = xW. Logistic regression solves the following optimization for maximum likelihood estimation:
$$\min_{W} F(W), \quad \text{where } F(W) = \frac{1}{n}\sum_{i=1}^{n} -\log\big[\Pr(y = y_i \mid x = X_i; W)\big] + \frac{\eta}{2}\|W\|_F^2, \qquad (3)$$

where we use a regularization similar to the ℓ2-norm in ridge regression, i.e., the Frobenius norm $\|W\|_F^2 = \sum_{j=1}^{c} \|W_{\cdot,j}\|_2^2 = \sum_{j=1}^{c}\sum_{i=1}^{d} (W_{i,j})^2$. Note that $W_{\cdot,j}$ is the $j$-th column of W.

1. [15 points] Derive the gradient of F(W) w.r.t. W, i.e., $\nabla_W F(W)$, and write down the gradient descent rule for W. Hint: compute the gradient w.r.t. each $W_j$ for every class j first.

2. [10 points] Under "2 batch gradient descent (GD) for Logistic regression", implement the batch gradient descent algorithm with a constant learning rate. To avoid numerical problems when computing the exponential in the probability $\Pr(y = k \mid x; W, b)$, you can use a modification of the logits $z'$:

$$z' = z - \max_j z_j. \qquad (4)$$

Explain why such a modification avoids potential numerical problems, and show that the overall result is unchanged by this trick. When the change of the objective F(W) compared to its value in the previous iteration is less than $\epsilon = 10^{-4}$, i.e., $|F_t(W) - F_{t-1}(W)| \le \epsilon$, stop the algorithm. Record the value of F(W) after each iteration of gradient descent. Run the implemented algorithm to train a logistic regression model on the randomly split training set. We recommend η = 0.1. Try three different learning rates [5.0e-3, 1.0e-2, 5.0e-2], report the final value of F(W) and the training/test accuracy in all three cases, and draw the three convergence curves (i.e., $F_t(W)$ vs. iteration t) in a single 2D plot.

3. [5 points] Compare the convergence curves: what are the advantages and disadvantages of large and small learning rates?

4. [5 points] Under "4 stochastic gradient descent (SGD) for Logistic regression", implement mini-batch stochastic gradient descent (SGD) for logistic regression. You can reuse some code from the previous gradient descent implementation. Tuning hyper-parameters is critical to getting good models.
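The logit-shift trick in Eq. (4) can be sketched as follows. Subtracting max_j z_j keeps every exponent non-positive (so exp never overflows), and the softmax output is unchanged because the common factor exp(-max_j z_j) cancels between numerator and denominator. The function name is illustrative:

```python
# Numerically stable softmax via the logit shift z' = z - max_j z_j (Eq. (4)).
import numpy as np

def softmax_rows(Z):
    Z_shift = Z - Z.max(axis=1, keepdims=True)  # largest shifted logit is 0
    expZ = np.exp(Z_shift)
    return expZ / expZ.sum(axis=1, keepdims=True)

Z = np.array([[1000.0, 1001.0, 1002.0]])  # naive exp(1000.0) would overflow
P = softmax_rows(Z)
print(P)  # finite probabilities; each row sums to 1, largest logit wins
```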
For the mini-batch SGD algorithm, the hyperparameters consist of 1) the initial learning rate; 2) the learning rate schedule, i.e., how we change the learning rate over time; 3) the number of epochs to train (an epoch corresponds to training the model with every data point once; though theoretically SGD samples data points with replacement, in practice the widely adopted approach is to randomly shuffle the dataset to form mini-batches, let the model learn from all data points once, and repeat until convergence); and 4) the mini-batch size. We suggest the following automatic tuning for the learning rate schedule and the number of epochs. For the learning rate schedule, record the objective F(W) over epochs, and if the objective has not become better than it was 10 epochs ago, halve the learning rate. For the number of epochs, or the stopping criterion, continue training until F(W) has not gotten better than it was 20 epochs ago. You can start with an initial learning rate of 1.0e−2 and a mini-batch size of 100 for this problem. You can discard the last mini-batch of every epoch if it is not full. Please remember to record the value of F(W) after each epoch and the final training and test accuracy. Run your code for different mini-batch sizes: [10, 50, 100]. Report the final value of F(W) and the final training/test accuracy, and draw the three convergence curves (F_t(W) vs. epoch t) in a 2D plot.

5. [5 points] Compare the convergence curves: do they (logistic regression with the three different batch sizes) show the same convergence speed when the same initial learning rate is used? For different batch sizes, you may need to tune the initial learning rate. In general, the rule of thumb is to scale the learning rate linearly with the batch size. Please draw the new convergence curves after tuning the learning rate in a 2D plot. What is the difference compared to the old convergence curves?
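A sketch of how the epoch loop with this halving schedule might look. The helper names (`softmax`, `objective`, `sgd_logreg`) and the smoke-test data are ours; the actual assignment should be completed inside the notebook's TODO scaffolding.

```python
import numpy as np

def softmax(Z):
    """Row-wise softmax with the z' = z - max_j z_j stabilization."""
    Z = Z - Z.max(axis=1, keepdims=True)
    E = np.exp(Z)
    return E / E.sum(axis=1, keepdims=True)

def objective(W, X, Y, eta):
    """F(W): mean cross-entropy plus (eta/2)*||W||_F^2, per Eq. (3)."""
    P = softmax(X @ W)
    n = X.shape[0]
    return -np.log(P[np.arange(n), Y] + 1e-12).mean() + 0.5 * eta * (W ** 2).sum()

def sgd_logreg(X, Y, c, lr=1e-2, eta=0.1, batch=100, max_epochs=200, seed=0):
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W = np.zeros((d, c))
    hist = [objective(W, X, Y, eta)]
    for epoch in range(max_epochs):
        idx = rng.permutation(n)                  # reshuffle each epoch
        for s in range(0, n - batch + 1, batch):  # drop a non-full last batch
            b = idx[s:s + batch]
            P = softmax(X[b] @ W)
            P[np.arange(len(b)), Y[b]] -= 1.0     # dL/dz for softmax cross-entropy
            W -= lr * (X[b].T @ P / len(b) + eta * W)
        hist.append(objective(W, X, Y, eta))
        if epoch >= 10 and hist[-1] >= hist[-11]:  # no progress vs. 10 epochs ago:
            lr *= 0.5                              #   halve the learning rate
        if epoch >= 20 and hist[-1] >= hist[-21]:  # no progress vs. 20 epochs ago:
            break                                  #   stop training
    return W, hist

# Tiny smoke run on two separable blobs, with the bias column appended
# per the extended representation x <- [x, 1].
rng = np.random.default_rng(1)
X0 = np.vstack([rng.normal(-2.0, 1.0, (100, 2)), rng.normal(2.0, 1.0, (100, 2))])
X0 = np.hstack([X0, np.ones((200, 1))])
Y0 = np.array([0] * 100 + [1] * 100)
W, hist = sgd_logreg(X0, Y0, c=2, lr=1e-2, batch=50)
```

Recording `hist` per epoch is exactly what the convergence-curve plots above need.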
Can you give some mathematical explanations based on the SGD you implemented?

Before you start: this problem could be time-consuming, so please start early! It may take several hours to train your model depending on your implementation. We provide an ipython notebook "lasso.ipynb" for you to complete. In your terminal, please go to the directory where this ipynb file is located, and run the command "jupyter notebook". A local webpage will be automatically opened in your web browser. Click the above file to open the notebook. You may also use an IDE such as vscode. You need to complete everything below each of the "TODO"s that you find (please search for every "TODO"). Once you have done that, please submit the completed ipynb file as part of your included .zip file. In your write-up, you also need to include the plots and answers to the questions required in this session. Please include the plots and answers in the pdf file in your solution.

Recall that for lasso, we aim to solve:

argmin_{θ, θ_0} F(θ, θ_0), where F(θ, θ_0) = (1/2) Σ_{i=1}^{n} (⟨x^{(i)}, θ⟩ + θ_0 − y^{(i)})² + λ Σ_{j=1}^{m} |θ_j|,  (5)

where λ is the hyperparameter that controls the regularization. Here, x^{(i)} is an m-dimensional data point with the corresponding label y^{(i)}, θ is the weight vector of m dimensions, and θ_0 is a scalar offset. Note that the objective is slightly different from the one given in the lecture, with an extra 1/2 in front of the MSE.

Remarks: Do not include θ_0 in the computation of precision/recall/sparsity. However, do not forget to include it when you compute the prediction produced by the lasso model, because it is one part of the model.

1. [15 points] Implement the coordinate descent algorithm to solve the lasso problem in the notebook. We provide a function "DataGenerator" to generate synthetic data in the notebook. Please read the details of the function to understand how the data are generated.
In this problem, you need to use n = 50, m = 75, σ = 1.0, θ_0 = 0.0 as input arguments to the data generator. Do not change the random seed for all the problems afterwards. Stopping criteria of the outer loop: stop the algorithm when either of the following is fulfilled: 1) the number of steps exceeds 100; or 2) no element of θ changes by more than ϵ between two successive iterations of the outer loop, i.e., max_j |θ_j(t) − θ_j(t−1)| ≤ ϵ, where the recommended value is ϵ = 0.01 and θ_j(t) is the value of θ_j after t iterations. At the beginning of the lasso, use the given initialization function "θ, θ_0 = Initialw(X, y)" to initialize θ and θ_0 by least squares regression or ridge regression. You can try different values of λ to make sure that your solution makes sense to you (hint: DataGenerator gives the true θ and θ_0). Solve lasso on the generated synthetic data using the given parameters and report the indices of the non-zero weight entries. Plot the objective value F(θ, θ_0) vs. the coordinate descent step. The objective value should always be non-increasing.

2. [5 points] Implement an evaluation function in the notebook to calculate the precision and recall of the non-zero indices of the lasso solution with respect to the non-zero indices of the true vector that generates the synthetic data. Precision and recall are useful metrics for many machine learning tasks. For this problem specifically,

precision = |{non-zero indices in θ̂} ∩ {non-zero indices in θ*}| / |{non-zero indices in θ̂}|;  (6)

recall = |{non-zero indices in θ̂} ∩ {non-zero indices in θ*}| / |{non-zero indices in θ*}|,  (7)

where θ* is the θ of the true model weights, while θ̂ is the θ of the lasso solution. You also need to report the sparsity (the number of non-zero entries) of θ̂, and the RMSE on the training data (you may check the definition from HW2 Prob 5).
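A compact sketch of the coordinate-descent update for Sub-Problem 1 and the support precision/recall of Eqs. (6)-(7). The helper names are ours, the notebook's "Initialw" initializer is replaced here by a simple zero/mean initialization, and the synthetic data below is a stand-in for DataGenerator.

```python
import numpy as np

def soft_threshold(c, lam):
    """Proximal map of lam*|t|: shrink |c| by lam and keep the sign."""
    return np.sign(c) * np.maximum(np.abs(c) - lam, 0.0)

def lasso_cd(X, y, lam, eps=0.01, max_steps=100):
    """Coordinate descent for F = 0.5*sum_i(<x_i, theta> + theta0 - y_i)^2 + lam*||theta||_1."""
    n, m = X.shape
    theta, theta0 = np.zeros(m), y.mean()
    col_sq = (X ** 2).sum(axis=0)                      # a_j = sum_i x_ij^2
    for _ in range(max_steps):
        theta_prev = theta.copy()
        theta0 = (y - X @ theta).mean()                # exact minimizer in theta0
        for j in range(m):
            # residual with feature j's own contribution removed
            r = y - theta0 - X @ theta + X[:, j] * theta[j]
            theta[j] = soft_threshold(X[:, j] @ r, lam) / col_sq[j]
        if np.max(np.abs(theta - theta_prev)) <= eps:  # outer-loop stopping rule
            break
    return theta, theta0

def support_metrics(theta_hat, theta_star, tol=1e-10):
    """Precision/recall of the non-zero index sets; theta0 is excluded by the caller."""
    shat = set(np.flatnonzero(np.abs(theta_hat) > tol))
    sstar = set(np.flatnonzero(np.abs(theta_star) > tol))
    hit = len(shat & sstar)
    prec = hit / len(shat) if shat else 0.0
    rec = hit / len(sstar) if sstar else 0.0
    return prec, rec, len(shat)                        # len(shat) = sparsity of theta_hat

# Demo on synthetic data with a 3-sparse true weight vector.
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 20))
theta_true = np.zeros(20)
theta_true[[2, 7, 11]] = [3.0, -2.0, 4.0]
y = X @ theta_true + 0.1 * rng.standard_normal(50)
theta_hat, theta0_hat = lasso_cd(X, y, lam=5.0)
prec, rec, nnz = support_metrics(theta_hat, theta_true)
```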
Note that a solution can have high precision with low recall (e.g., the solution contains only one correct non-zero index while the true vector contains many) and vice versa. Report the precision and recall of your lasso solution from the previous problem.

3. [10 points] Vary λ and solve the lasso problem multiple times. Choose 50 evenly spaced λ values starting with λ_max = ‖(y − ȳ)X‖_∞ (ȳ is the average of the elements in y, and ‖a‖_∞ = max_j a_j), and ending with λ_min = 0. Plot the precision vs. λ and recall vs. λ curves on a single 2D plot. Briefly explain the plotted pattern and curves. On top of this, try to have fun with λ and play with this hyperparameter: explore, discover, and tell us what you have discovered. Draw a "lasso solution path" for each entry of θ in a 2D plot. In particular, use λ as the x-axis; for each entry θ_i in θ achieved by lasso, plot the curve of θ_i vs. λ for all the values of λ you tried, similar to the plot we showed in class from the Murphy text (in your case, there are 50 points on the curve). Draw such curves for all the m entries of θ within a 2D plot, use the same color for the 5 features in DataGenerator used to generate the data, and use another very noticeably distinct color for the other features. If necessary, set proper ranges for the x-axis and y-axis, so you can see sufficient detail. Now change the noise's standard deviation to σ = 10.0 when using "DataGenerator" to generate synthetic data, and draw the lasso solution path again. Compare the two solution path plots with different σ, and explain their difference. Be complete and clear.

4. [5 points] Use the synthetic data generation code with different parameters: (n = 50, m = 75), (n = 50, m = 150), (n = 50, m = 1000), (n = 100, m = 75), (n = 100, m = 150), (n = 100, m = 1000) (keeping other parameters the same as in Sub-Problem 1).
Vary λ in the same way as in the previous question (Sub-Problem 3), and find the λ value that can generate both good precision and recall for each set of synthetic data points. For each case, draw the "lasso solution path" defined in Sub-Problem 3.

5. [25 points] This question is challenging, requiring major changes to your previous implementation as well as significant training time. Run lasso to predict review stars on Yelp by selecting important features (words) from review comments. We provide the data in hw3 data.zip. You can unzip the file and use the provided function "DataParser" in the notebook to load the data. There are three files: "star data.mtx", "star labels.txt", and "star features.txt". The first file stores a matrix market matrix, and DataParser reads it into a scipy csc sparse matrix, which is your data matrix X. The second file contains the labels, which are the stars of comments on Yelp; this is your y. The third file contains the names of the features (words). For the last two txt files, you can open them in an editor and take a look at their contents. The sparse data X has size 45000 × 2500, and is split into the training set (the first 30000 samples), the validation set (the following 5000 samples) and the test set (the last 10000 samples) by "DataParser". Each column corresponds to a feature, i.e., a word appearing in the comments. Your mission is to solve lasso on the training set, tune the λ value to find the best RMSE on the validation set, and evaluate the performance of the obtained lasso model on the test set.

Important to read before you start: here, we are dealing with a sparse data matrix. Most numpy operations for dense matrices that you used for implementing lasso in Sub-Problem 1 cannot be directly applied to sparse matrices here. You can still use the framework you built in Sub-Problem 1, but you need to replace some dense matrix operations (multiply, dot, sum, slicing, etc.)
by using sparse matrix operations from "scipy.sparse" (please refer to https://docs.scipy.org/doc/scipy/reference/sparse.html for details of sparse matrix operations). The sparse matrix format here aims to help you make the algorithm more efficient at handling sparse data. Do not try to directly transform the sparse matrix X to a dense one by using "X.todense()", since it will waste too much memory. Instead, try to exploit the advantages of the different sparse matrix types (csc, coo, csr) and avoid their disadvantages, which are listed under each sparse matrix type in the above link. You can change the format of a sparse matrix X to another one by using (for example) "X.tocsc()" if necessary, but do not do it too often. For some special sparse matrix operations, it might be more efficient to write them by yourself. We provide an example "cscMatInplaceEleMultEveryRow" in the notebook. You can use it, or modify it for your own purpose. This will be good practice for thinking about how to write an efficient ML algorithm.

Try to avoid building new objects inside the loop, or computing anything from scratch. You can initialize them before the loop starts, and use the lightest way to update them in the loop. Note that any operation that seems "small" inside the loop can lead to expensive computations, considering the total number of times it will be executed. You can use "if" to avoid unnecessary operations on zeros. Do not loop over matrices or vectors if not necessary: use matrix or batch operations provided by numpy or scipy instead. Try to use the most efficient operation when there are many choices reaching the same result. If you write inefficient code here, running it will take an extremely long time. During debugging, timing each step or operation in the loop will help you figure out which step takes longer, and you can then focus on how to accelerate it. Running this experiment may take 1-3 hours, depending on your implementation, so please start early.
Before you leave it running by itself, make sure that the timing results indicate a reasonable finishing time, and remember to save intermediate results to avoid restarting from the very beginning because of program crashes. You need to finish:

• Explain how you modified your implementation to make the code more efficient compared to the "naive" implementation you did for Sub-Problem 1. Compare the computational costs of every coordinate descent iteration in terms of n (number of data points) and m (number of dimensions).
• Plot the training RMSE (on the training set) vs. λ values and the validation RMSE (on the validation set) vs. λ values on a 2D plot. Use the definition of λ_max in Sub-Problem 3 and run experiments on multiple values of λ. You can reduce the number of different λ values to 20. You can also increase the minimal λ to be slightly larger than 0, such that 0 ≤ λ_min ≤ 0.1 λ_max. These two changes will save you some time.
• Plot the lasso solution path defined in Sub-Problem 3.
• Report the best λ value achieving the smallest validation RMSE you find on the validation set, and report the corresponding test RMSE (on the test set).
• Report the top-10 features (words in comments) with the largest magnitude in the lasso solution w when using the best λ value, and briefly explain if/why they are meaningful.
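A few of the sparse-matrix idioms suggested above, sketched with scipy.sparse on a random stand-in matrix (not the Yelp data):

```python
import numpy as np
import scipy.sparse as sp

# CSC stores non-zeros column by column, so accessing one feature column is
# cheap; CSR is the row-oriented counterpart; COO is best for construction.
# Converting formats costs a full pass, so pick the right one up front.
X = sp.random(1000, 50, density=0.01, format="csc", random_state=0)

# Non-zeros of column j without densifying the matrix:
j = 3
rows = X.indices[X.indptr[j]:X.indptr[j + 1]]   # row indices of column j's non-zeros
vals = X.data[X.indptr[j]:X.indptr[j + 1]]      # the matching values

# Per-column squared norms (the a_j terms of coordinate descent) without
# todense(): elementwise square, then column-wise sum.
col_sq = np.asarray(X.multiply(X).sum(axis=0)).ravel()
```

Quantities like `col_sq` never change across iterations, so computing them once before the loop is exactly the kind of saving the advice above is about.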


[SOLVED] CSCI-SHU 360 Machine Learning Homework 1

Imagine you and a friend are playing the slot machines in a casino. Having played on two separate machines for a while, you decide to swap machines between yourselves, and measure for differences in "luck". The wins/losses for you and your friend on each machine are tabulated below.

Machine 1:  You 40 wins, 60 losses;  Friend 30 wins, 70 losses.
Machine 2:  You 210 wins, 830 losses;  Friend 14 wins, 70 losses.

Assuming that the outcomes of playing a slot machine are independent of its history and that of the other machine, answer the following questions.

1. (3 points) Estimate the winning probability of you and your friend for each of the machines. Compare your winning probability with your friend's on the different machines. Who is more likely to win?
2. (3 points) Estimate the overall winning probability of you and your friend in the casino (assume that there are only two slot machines in the casino). Who is more likely to win?
3. (4 points) Compare your conclusions from (1) and (2). Can you explain them mathematically? I.e., write down the equations to compute the winning probabilities and compare the terms.

[Figure 1: Problem 2]

On the 2D plane, we have four vectors (all with their starting point at the origin): a_1 = (1, 1), a_2 = (1, −1), b_1 = (−0.8, 1.6) and b_2 = (2.6, −0.2). See Fig. 1.

1. [5 points] Write down one 2 × 2 matrix W that transforms a_1 to b_1, and a_2 to b_2.
2. [5 points] Write down one rotation matrix V, which rotates clockwise by α degrees such that tan(α) = 3. Then, write down one matrix Σ, which scales the y-axis by 2 while keeping the x-axis unchanged. Finally, write down one rotation matrix U, which rotates counter-clockwise by β degrees, such that tan(β) = 1/3. Multiply the three matrices together, namely UΣV; what do you discover?
3. [10 points] Compute the eigenvalues and the corresponding eigenvectors of WᵀW. Now consider the unit circle.
Suppose every point on the unit circle gets transformed by W; what shape do you get after the transformation (consider the broken-up transformations in the previous question)? What relationship do you find between the eigenvalues/eigenvectors and the transformed shape, and why?
4. [5 points] Compute the determinant of W. What's the area of the shape transformed from the unit circle? What is the relationship between the determinant and the area of a transformed shape? Based on that, can you use one sentence to explain why the determinant of a product of two matrices is equal to the product of the determinants of the two matrices, or in other words, det(AB) = det(A)det(B)?

1. [3 points] For a discrete random variable X of finite values, show that

(E[X³])² ≤ E[X⁶].  (1)

You only need the materials taught in class.
2. [3 points] Prove the above in a different way. Again, you only need the materials taught in class.
3. [4 points] For two PSD matrices A, B, both of shape n × n, show that λA + (1 − λ)B is also PSD for 0 ≤ λ ≤ 1.

General instructions for programming problems (if you are not familiar with python): please install anaconda python (python version 3.x; 3.8 or above is recommended) by following the instructions on https://www.anaconda.com/download/, and then install sklearn, numpy, matplotlib, seaborn, pandas and jupyter notebook in anaconda, for example by running the command "conda install seaborn". Note that some of the above packages may have already been installed in anaconda, depending on which version of anaconda you just installed. Please check the following tutorials if you are not familiar with numpy: http://cs231n.github.io/python-numpy-tutorial/ and https://docs.scipy.org/doc/numpy-1.15.0/user/quickstart.html

In this problem, you will estimate a multivariate Gaussian distribution from 2D points sampled from a multivariate Gaussian distribution, and learn how to use python and matplotlib to generate simple plots in an ipython notebook.
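The estimation step this problem starts with (the sample mean and covariance of 2D points) is a two-liner in numpy. A sketch, where the generating mean and covariance are our own stand-ins for whatever the notebook's first cell uses:

```python
import numpy as np

# Stand-in for the notebook's generated data: 1000 samples from a 2D Gaussian.
rng = np.random.default_rng(0)
true_mu = np.array([1.0, -2.0])
true_cov = np.array([[2.0, 0.8], [0.8, 1.0]])
X = rng.multivariate_normal(true_mu, true_cov, size=1000)

mu_hat = X.mean(axis=0)            # sample mean, shape (2,)
cov_hat = np.cov(X, rowvar=False)  # sample covariance (rows are observations), shape (2, 2)
```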
If you are not familiar with jupyter notebook, please check a quick tutorial at https://jupyter-notebook-beginner-guide.readthedocs.io/en/latest/execute.html. We provide an ipython notebook "2d gaussian.ipynb" for you to complete. In your terminal, please go to the directory where this ipynb file is located, and run the command "jupyter notebook". A local webpage will be automatically opened in your web browser. Click the above file to open the notebook. You need to complete everything below each of the "TODO"s that you find (please search for every "TODO"). Once you have done that, please submit the completed ipynb file as part of your included .zip file. In your write-up, you also need to include the plots and answers to the questions required in this session. Please include the plots and answers in the pdf file in your solution.

1. [2 points] Run the first cell in the ipython notebook to generate 1000 points X (a 1000 × 2 matrix; each point has two coordinates) sampled from a multivariate Gaussian distribution. Estimate the mean and covariance of the sampled points and report them in your write-up.
2. [3 points] Plot the histogram for the x-coordinates of X and the y-coordinates of X, respectively. You can use the plt.hist() function from matplotlib.
3. [5 points] From the histogram, can you tell whether the x-coordinates of the 1000 points (the first column of X) follow some Gaussian distribution? If so, compute the mean and the variance. How about the y-coordinates? If so, compute the mean and the variance.
4. [5 points] Sample 1000 numbers from a 1D Gaussian distribution with the mean and variance of the x-coordinates you got from subproblem 3. Sample another 1000 numbers from a 1D Gaussian distribution with the mean and variance of the y-coordinates. You can use np.random.normal from numpy. Generate a new 2D scatter plot of 1000 points with the first 1000 numbers as x-coordinates and the second 1000 numbers as y-coordinates.
Compared to the original 1000 points, what is the difference in their distributions? What causes the difference?
5. [5 points] Plot a line segment with x = [−10, 10] and y = 2x + 2 on the 2D-Gaussian plot. The np.linspace() function may be helpful. Project X onto the line y = 2x + 2 and plot the projected points in the 2D space.
6. [5 points] Draw the histogram of the x-coordinates of the projected points. Are the x-coordinates of the projected points sampled from some Gaussian distribution? If so, compute the mean and the variance.

For this problem, we want to use the K nearest neighbor algorithm to classify Iris flowers.
1. [5 Points] Load the Iris data using sklearn.datasets. Calculate how many elements there are for every class.
2. [5 Points] Build a KNeighborsClassifier with k = 1 to predict the class. Train it on the whole dataset. For classification problems, different goodness-of-fit metrics are used. For this exercise, you can use accuracy, which is defined in the formula given below. Calculate the accuracy of the KNN classifier on the iris dataset. Does this result give you meaningful information?

Accuracy = #correctly predicted / M  (2)

3. [5 Points] Split the dataset into two parts using sklearn's train test split. Use the following arguments: (a) X, y: the dataset; (b) test size = 0.5 (use 50% of the dataset for testing); (c) shuffle = True (randomly shuffle the dataset before making a cut); (d) random state = 0 (random seed; this ensures consistent results). Use the split to find the optimal value of k. Please try different values of k from 1 to 50, fit the model on the training data, and calculate the model's accuracy on the training data and the testing data, respectively. Plot the training accuracy and testing accuracy against the value of k. Which k value is the best?
4.
[5 Points] You observed a flower and measured the following characteristics: (a) sepal width = 5.0; (b) petal width = 4.1; (c) sepal length = 3.8; (d) petal length = 1.2. Use your prediction model to classify this plant. What's the predicted class?

For this question, use the data in clust data.csv. We will attempt to cluster the data using k-means. But what k should we use?
1. [5 Points] Apply k-means to this data 15 times, changing the number of centers from 1 to 15. Each time use nstart = 10 and store the within-cluster sum-of-squares/inertia value from the resulting object. The inertia measures how variable the observations are within a cluster. This value will be lower with more centers, no matter how many clusters there truly are naturally in the given data. Plot this value against the number of centers. Look for an "elbow", the number of centers where the improvement suddenly drops off. Based on this plot, how many clusters do you think should be used for this data?
2. [2 Points] Re-apply k-means for your chosen number of centers. How many observations are placed in each cluster? What is the value of the inertia?
3. [3 Points] Visualize this data. Plot the data using the first two variables and color the points according to the k-means clustering. Based on this plot, do you think you made a good choice for the number of centers? Briefly explain your conclusion.
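The inertia sweep described in Sub-Problem 1 can be sketched with sklearn's KMeans, whose n_init parameter plays the role of nstart; synthetic blobs stand in for clust data.csv here:

```python
import numpy as np
from sklearn.cluster import KMeans

# Three well-separated blobs stand in for the real data.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(c, 0.3, size=(60, 2))
               for c in ([0, 0], [5, 5], [0, 5])])

inertias = []
for k in range(1, 16):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias.append(km.inertia_)   # within-cluster sum of squares

# Plotting range(1, 16) vs. inertias shows a steep drop until k reaches the
# true number of clusters (3 here), after which the improvement levels off:
# that kink is the "elbow".
```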


[SOLVED] CSCI-SHU 360 Machine Learning Homework 2

Show with detailed derivations that the linear regression loss function is convex in the parameters w: L(w) = ‖y − Xw‖₂². Here y is an n-dimensional vector, X is an n × d matrix and w is a d-dimensional vector. X is of full rank. There are multiple ways to show it, but we force you to take the following approach: compute ∇L(w) using the directional derivative, and then compute ∇²L(w).

In this problem, we will investigate the Gaussian distribution in high-dimensional space, and develop intuitions and awareness about the curse of dimensionality, a critical concept that everyone who wishes to pursue study in machine learning should understand. For a random variable x of m dimensions (i.e., x ∈ R^m) drawn from a multivariate Gaussian distribution, recall that the Gaussian density function takes the form:

p(x) = (1 / |2πC|^{1/2}) exp(−(1/2)(x − µ)ᵀ C⁻¹ (x − µ)),  (1)

where µ ∈ R^m is an m-dimensional mean vector and C ∈ R^{m×m} is an m × m symmetric positive definite covariance matrix. When C is a diagonal matrix, the covariance between different dimensions is zero, which is known as an axis-aligned Gaussian; and when C = σ²I, I being the m × m identity matrix, we get a spherical Gaussian. We already saw this in Homework 1, where the m = 2 (2D) Gaussian in this case has spherical (circle-shaped) contours. A spherical Gaussian in m dimensions thus has the following equation form:

p(x) = (1 / (2πσ²)^{m/2}) exp(−‖x‖² / (2σ²)).  (2)

We start by examining some basic geometric properties of the sphere in m-dimensional space. A sphere is generally a collection of points such that the distance of any point to the center of the sphere (we always center the sphere on the origin 0 for simplicity) is equal to r, the radius.
In other words, we define an m-dimensional sphere as S_{m−1}(r) = {x ∈ R^m : ‖x‖₂ = r}, the set of points in m-dimensional space that are at distance r from the origin (note that S_{m−1}(r) is the equation for the surface of the sphere, although in some fields, such as physics, the sphere is defined as the surface and interior, as in {x ∈ R^m : ‖x‖₂ ≤ r}). We also use V_m(r) for the volume of an m-dimensional sphere. S_{m−1}(r) represents the surface area of the m-dimensional sphere (meaning, e.g., that an m = 2 dimensional sphere of radius r has surface area S_1(r); this convention is used since the surface area is a curved (m − 1)-dimensional manifold embedded in the m-dimensional ambient space).

Please make sure you answer every question:
1. [1 point] Before we move to m dimensions, let's talk about the 2D and 3D cases. Write down the equations, in terms of the radius r, of S_{m−1}(r) for m ∈ {2, 3} and V_m(r) for m ∈ {2, 3}.
2. [5 points] Intuitively explain the following equation:

S_{m−1}(r) = (d/dr) V_m(r).  (3)

Why does this equation make sense, and why should it be true? To help convince yourself and improve your intuition, you may verify the equations for S_1(r), V_2(r), S_2(r) and V_3(r) from the previous question.
3. [4 points] As you may have guessed, V_m(r)'s only dependence on r is via the m-th power of r, or specifically r^m. Suppose that for a unit sphere (r = 1) in m-dimensional space the surface area is S̄_{m−1}. Write S_{m−1}(r) in terms of r and S̄_{m−1}.
4. [5 points] Now consider all the points on the sphere S_{m−1}(r) (which, because of our definition of S_{m−1}(r), really means the surface). We wish to integrate over all of those points weighted by the Gaussian probability density p(x) of each point, where p(x) is defined as given in Equation (2). That is, we integrate over all points x ∈ S_{m−1}(r) weighted by p(x). Indeed, this is an integration, but you can avoid doing the mathematical integration by utilizing the results from the previous questions.
Write the equation ρ_m(r) for the integrated density of sampled points from the Gaussian distribution lying on the surface of S_{m−1}(r).
5. [5 points] For large m, show that ρ_m(r) has a single maximum value at r̂ such that r̂ ≈ √m σ.
6. [10 points] For large m, consider a small value ϵ ≪ r̂ (the symbol "≪" means "much less than"), and show that

ρ(r̂ + ϵ) ≈ ρ(r̂) e^{−ϵ²/σ²}.  (4)

Hint: during your derivation, first get the expression simplified and close to the desired form. Then use a Taylor expansion to get the approximation.
7. [3 points] The previous problem shows that r̂ is the radius where most of the probability mass is in a high-dimensional Gaussian; moreover, as we move away from this radius, say going from r̂ to r̂ + ϵ, the total mass becomes smaller exponentially quickly in ϵ. Also, note that since r̂ ≈ √m σ, for large m we have σ ≪ r̂. In low dimensions, σ (the standard deviation) is usually used to indicate where much of the probability mass is; indeed, p(x) > p(x′) whenever ‖x‖ < ‖x′‖, with the highest density value being p(0) at the origin 0. When we get to high dimensions, however, most of the mass is far away from the σ-neighborhood around the origin. Taken together, this means that most of the probability mass, in a high-dimensional Gaussian, is concentrated in a thin skin (e.g., think of the skin of an m-dimensional apple or the synthetic leather layer of an m-dimensional soccer ball) of large radius. If we only get a finite number of samples from a high-dimensional Gaussian distribution, therefore, where do most of the points reside? At what radius do they reside? For a low-dimensional Gaussian distribution, where do most points reside?
8. [7 points] The conclusion from the previous questions may seem highly counterintuitive, but so can be the curse of dimensionality. Calculate and compare the probability density at the origin and at one point on the sphere S_{m−1}(r̂).
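The thin-shell concentration at radius ≈ √m σ can be probed numerically; a sketch with σ = 1, drawing 100 samples for each dimension m from 1 to 40:

```python
import numpy as np

# For each dimension m, draw 100 samples from N(0, I_m) and record the
# mean and standard deviation of their radii ||x||_2.
rng = np.random.default_rng(0)
means, stds = [], []
for m in range(1, 41):
    radii = np.linalg.norm(rng.standard_normal((100, m)), axis=1)
    means.append(radii.mean())
    stds.append(radii.std())

# Plotting the means (with the stds as error bars) against m shows the mean
# radius growing like sqrt(m) while the spread stays roughly constant:
# the samples concentrate in a thin shell of radius about sqrt(m)*sigma.
```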
The curse of dimensionality comes from the extremely high growth rate of volume as the dimensionality of the space increases (there's just a lot of room in high dimensions). Write a python script that samples from an m-dimensional Gaussian. For each m ∈ {1, 2, …, 40}, produce 100 samples, compute the mean and standard deviation of the radii of the samples, and plot these as a function of m. Is your plot consistent with the above? Why or why not? Fully understand what you see, and clearly explain to us that you understand it and how you justify it.

3 Ridge Regression [20 points]

Recall that linear regression solves

min_w ‖Xw − y‖₂²,  (5)

where X is the data matrix, every row of X corresponds to a data point, y refers to the vector of labels, and w is the weight vector we aim to optimize. Specifically, X is an n × d data matrix with n data samples (rows) and d features (columns), and y is an n-dimensional column vector of labels, one for each sample. Ridge regression is very similar, and is defined as

min_w ‖Xw − y‖₂² + (η/2) ‖w‖₂²,  (6)

where we add an additional regularization term (η/2)‖w‖₂² to encourage the weights to be small; it has other benefits as well, which we will discuss in class.

Please make sure you answer every question:
1. [5 points] Describe (with drawings and an intuitive description) one setting for (X, y) where standard linear regression is preferred over ridge regression. The drawing should show: (1) the data points (X, y); (2) the expected linear regression solution (e.g., a line); (3) the expected ridge regression solution (also, e.g., a line). You need to explain the reason why standard linear regression is preferred. You need not do any actual calculation here.
2. [5 points] Describe (with drawings and an intuitive description) one setting for (X, y) where ridge regression is preferable to linear regression. Your answer should fulfill the same requirements as part 1.
3.
[5 points] Solve for the closed-form solution of ridge regression. To get the closed-form solution, you can set the gradient of the objective function F(w) in the above minimization problem to zero and solve the resulting equation for w. If you are not familiar with how to compute the gradient (or such derivatives), please refer to Section 2.4 of the Matrix Cookbook (https://www.math.uwaterloo.ca/~hwolkowi/matrixcookbook.pdf). The Matrix Cookbook is extremely helpful for linear algebra.
4. [5 points] Now consider two cases: (1) where the columns (or features) of X are more numerous than the rows (or samples); and (2) where the columns (or features) of X are highly correlated (an extreme case is where many features are identical to each other). For each of the above: (a) Can you still compute the closed-form solution of vanilla linear regression? (b) With the closed-form solution of the previous question, and compared to the solution for standard linear regression, do you discover other benefits of ridge regression?

4 Bonus: Locality Sensitive Hashing (LSH) [20 points]

We haven't talked about LSH in detail in class, and this problem serves as a tutorial for LSH. This problem requires a lot of reading and thinking, though the math involved is not particularly hard. As this is a bonus problem, we suggest you leave it to the end if you have extra time and are interested in the topic. Locality sensitive hashing refers to a special property of hash functions and is closely related to approximate nearest neighbor search (NNS), i.e., finding close points to the query instead of exactly the nearest neighbor. For simplicity, suppose the design matrix X (which is n × m) consists of only binary features. This means that every data point (i.e., every row of the design matrix) has the form x_i ∈ {0, 1}^m. Define d(x_i, x_j) as the Hamming distance between two data points x_i and x_j, i.e.,

d(x_i, x_j) = Σ_{a ∈ {0, 1, …, m−1}} |x_i[a] − x_j[a]|.
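The Hamming distance above, plus a quick empirical check of how often a hash that reads a single uniformly random coordinate (the family H introduced below) collides on two vectors, might look like this. A sketch; the function names are ours:

```python
import numpy as np

def hamming(xi, xj):
    """d(xi, xj): number of coordinates where two binary vectors differ."""
    return int(np.sum(xi != xj))

def hash_collision_rate(xi, xj, trials=20000, seed=0):
    """Empirical Pr(h(xi) = h(xj)) when h reads one uniformly random
    coordinate a, i.e. h(x) = x[a] (a simulation, not a derivation)."""
    rng = np.random.default_rng(seed)
    a = rng.integers(0, len(xi), size=trials)   # one random coordinate per trial
    return float(np.mean(np.asarray(xi)[a] == np.asarray(xj)[a]))

xi = np.array([1, 0, 1, 1, 0])
xj = np.array([1, 1, 1, 0, 0])
d = hamming(xi, xj)                 # coordinates 1 and 3 differ, so d = 2
rate = hash_collision_rate(xi, xj)  # close to (m - d) / m = 3/5 here
```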
Hamming distance simply counts the number of positions where two binary vectors differ, and it is a proper distance metric (see https://en.wikipedia.org/wiki/Metric_(mathematics) for the full definition of a distance metric).
1. [5 points] Imagine you have the following magical oracle. Given a query point q ∈ {0, 1}^m, and two parameters r > 0 and c ≥ 1:
(a) If ∃x ∈ X such that d(x, q) ≤ r, the oracle returns to you some point x′ ∈ X such that d(x′, q) ≤ cr.
(b) If ∄x ∈ X such that d(x, q) ≤ cr, the oracle returns nothing.
(c) Otherwise (∃x ∈ X such that d(x, q) ≤ cr but ∄x ∈ X such that d(x, q) ≤ r), the oracle is not stable, and can either return some point x′ such that d(x′, q) ≤ cr or return nothing.
Suppose you want to find exactly the nearest neighbor point in X of the query point q. Using as few calls to the oracle as possible, how do you appropriately set the values r and c in order to get back the nearest neighbor of q?
2. [2 points] Unfortunately, the magical oracle is not free. Suppose we set r to be slightly larger than the distance to the nearest neighbor, i.e., r = min_{x∈X} d(x, q) + ϵ. If we set a very large c, the oracle may return any data point, which is cheap but not useful. However, if we set a small c, the oracle becomes expensive but returns a point with distance at most cr, which may serve as a good approximation to the true nearest neighbor. By setting the value of c, we are controlling the trade-off between the quality of the approximate nearest neighbor and the oracle's running time. We will now show how to implement the oracle using locality sensitive hash functions. Consider a family of hash functions H, where each function h is associated with a random integer a from {0, 1, …, m − 1}, and given the input x_i, h(x_i) = x_i[a]. This hash function is very simple, and merely returns coordinate a of the data point x_i. Suppose we randomly pick a hash function h from H.
For two inputs x_i and x_j, if d(x_i, x_j) ≤ r (r > 0), what is a lower bound on the probability that h maps x_i and x_j to the same value, i.e. Pr(h(x_i) = h(x_j))? We name this lower bound p1. Similarly, if d(x_i, x_j) ≥ cr (c ≥ 1), what is an upper bound on the probability that h maps x_i and x_j to the same value? We name this upper bound p2.

3. [2 points] LSH refers to the following properties. A family of hash functions is (r, c, p1, p2)-LSH (1 ≥ p1 ≥ p2 > 0, c > 1, r > 0) if:
(a) Pr(h(x_i) = h(x_j)) ≥ p1 when d(x_i, x_j) ≤ r;
(b) Pr(h(x_i) = h(x_j)) ≤ p2 when d(x_i, x_j) ≥ cr.
Here h is a hash function randomly sampled from the family. Intuitively, when two data points are close (d(x_i, x_j) ≤ r), the hash function should map them to the same output value with (relatively) high probability, at least p1. If two data points are far away (d(x_i, x_j) ≥ cr), the hash function should map them to the same output value with (relatively) low probability, at most p2. The very simple hash functions from H introduced above indeed satisfy the LSH property (you can verify this by comparing your answers to the two previous questions). With the dataset X, we can first hash all data points using a single function h randomly sampled from H. Then, given a query point q, we hash q with the same function h and retrieve all data points in X that hash to the same value as h(q) (if any). Finally, we iterate over the retrieved points (if not empty) and return the point closest to the query point q. Since p1 ≥ p2, the hash function is more likely to hash close points into the same value than far-away points. However, the difference between p1 and p2 might not be significant enough. One simple trick to make close points more probable relative to far-away points is to sample multiple hash functions from H, and declare that two data points have the same hashed value only if all the sampled hash functions give the same value.
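For the single-coordinate family H, the collision probability can be checked directly: a uniformly random coordinate agrees on exactly m − d(x_i, x_j) of the m positions, so Pr(h(x_i) = h(x_j)) = 1 − d(x_i, x_j)/m. A minimal sketch (the function name is my own):

```python
def collision_prob(xi, xj):
    # Exact Pr(h(xi) == h(xj)) over a uniformly random coordinate a:
    # the hash collides on precisely the m - d(xi, xj) agreeing coordinates.
    m = len(xi)
    d = sum(abs(p - q) for p, q in zip(xi, xj))  # Hamming distance
    return 1 - d / m
```

Plugging the two distance regimes d ≤ r and d ≥ cr into this expression gives the bounds asked for above.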
In other words, we define a new hash function g via k randomly sampled hash functions from H, concatenating their outputs into a vector of length k:

g(x_i) = (h_1(x_i), h_2(x_i), h_3(x_i), . . . , h_k(x_i)).   (7)

Give a lower bound on the probability that g(x_i) = g(x_j) if d(x_i, x_j) ≤ r, in terms of p1 and k. Give an upper bound on the probability that g(x_i) = g(x_j) if d(x_i, x_j) ≥ cr, in terms of p2 and k. (Note that g(x_i) = g(x_j) only if they are equal on every coordinate of the output vectors. You can also think of the binary output vector of g as a binary encoding of an integer, so the output of g becomes an integer value.)

4. [2 points] As we increase the value of k, the close data points become more probable relative to the far-away data points. However, we also get a lower probability of having any two data points hashed to the same value. For a large value of k, the probability can be negligible: no two points may get hashed to the same value, and as a result our algorithm may always return nothing. To alleviate this problem, we may use l instances of the g function, where each g has k independently sampled hash functions from H. We then hash q with every g function, and collect all data points (if any) that share the same hashed value as g(q). Finally, we iterate over the collected data points from all g functions, and return the one with the least distance. Give a lower bound on the probability that ∃b ∈ {0, 1, . . . , l−1} such that g_b(x_i) = g_b(x_j) if d(x_i, x_j) ≤ r. Give an upper bound on the probability that ∃b ∈ {0, 1, . . . , l−1} such that g_b(x_i) = g_b(x_j) if d(x_i, x_j) ≥ cr.

5. [7 points] Again, assume we set r appropriately so that there exists some data point x ∈ X with d(x, q) ≤ r. Let ρ = ln(p1)/ln(p2), l = n^ρ, and k = ln(n)/ln(1/p2). Consider the following two events:
(a) For some x′ ∈ X with d(x′, q) ≤ r, ∃b ∈ {0, 1, . . . , l−1} such that g_b(x′) = g_b(q).
(b) There are at most 4l items x in X with d(x, q) ≥ cr and, for some b, g_b(x) = g_b(q).
Show that the first event happens with probability at least 1 − e^{−1} (note that lim_{k→∞}(1 − 1/k)^k = e^{−1} and (1 − 1/k)^k < e^{−1}). Show that the second event happens with probability at least 3/4 (HINT: use Markov's inequality, https://en.wikipedia.org/wiki/Markov%27s_inequality). Give a lower bound on the probability that both events happen.

6. [2 points] The two events from the previous question happen with constant probability. We can then boost the probability of success by running multiple independent instances, so that the probability that the two events fail for all instances is negligible. For this problem, assume that the previous two events happen with certainty. Recall that we construct l hash functions, each of which consists of k sampled functions from H. Given a query q, we collect all data points in X that share the same hashed value as the query point for any g_b. Now we want to iterate over the collected points and report one point as the nearest neighbor. If we only want to return a point that has distance at most cr to the query point q, how many points do we need to check? Are we guaranteed that there is a point with distance at most cr, and why?

5 Programming Problem: Linear Regression [30 points]

In this problem, you will implement the closed-form solvers of linear regression and ridge regression from scratch (which means that you cannot use built-in linear/ridge regression modules in scikit-learn or any other packages). Then you will try your implementation on a small dataset, the Boston housing price dataset, to predict house prices in Boston ("MEDV") based on some related feature attributes. We provide an IPython notebook "linear regression boston.ipynb" for you to complete. In your terminal, go to the directory where this file is located and run the command "jupyter notebook".
A local webpage will automatically open in your web browser; click the above file to open the notebook. You need to complete the scripts below the "TODOs" (please search for every "TODO"), and submit the completed ipynb file (inside your .zip file). In your write-up, you also need to include the plots and answers to the questions required in this session. The first part of this notebook serves as a quick tutorial on loading the dataset, using pandas to get a summary and statistics of the dataset, using seaborn and matplotlib for visualization, and some commonly used functionalities of scikit-learn. You can explore more functionalities of these tools by yourself. You will use these tools in future homeworks.

1. [5 points] Below "1 how does each feature relate to the price" in the ipynb file, we show a 2D scatter plot for each feature, where each point corresponds to a sample and the two coordinates are the feature value and the house price of the sample. Please find the top-3 features that are most (linearly) related to the house price ("MEDV").

2. [5 points] Below "2 correlation matrix", we compute the correlation matrix with pandas and visualize it using a seaborn heatmap. Please find the top-3 features that are most (linearly) related to the house price ("MEDV") according to the correlation matrix. Are they the same as the ones in the previous question (sub-question 1)?

3. [10 points] Below "3 linear regression and ridge regression", please implement the closed-form solvers of linear regression and ridge regression (linear regression with L2 regularization). You can use numpy here. Recap: linear regression solves

min_w F(w) = ‖Xw − y‖₂²,   (8)

while ridge regression solves

min_w F(w) = ‖Xw − y‖₂² + (η/2)‖w‖₂²,   (9)

where X is an n × d data matrix with n data samples and d features, y is an n-dim vector storing the prices of the n data samples, and F(w) is the objective function.
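A minimal numpy sketch of the two closed-form solvers (function names are my own; the notebook's TODO signatures may differ):

```python
import numpy as np

def linreg_closed_form(X, y):
    # Solves min_w ||Xw - y||^2 via the normal equations X^T X w = X^T y.
    return np.linalg.solve(X.T @ X, X.T @ y)

def ridge_closed_form(X, y, eta=1.0):
    # Solves min_w ||Xw - y||^2 + (eta/2)||w||^2; the regularized Gram
    # matrix X^T X + (eta/2) I is positive definite, hence always invertible.
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + (eta / 2) * np.eye(d), X.T @ y)
```

Using `np.linalg.solve` on the normal equations avoids forming an explicit matrix inverse, which is both faster and numerically safer.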
Run linear regression and ridge regression on the randomly split training set, and report the obtained coefficients w. For ridge regression, it is recommended to try different η values.

4. [5 points] Below "4 evaluation", implement the prediction function and the root mean square error

RMSE = √( (1/n) Σ_{i=1}^{n} (y_i − ŷ_i)² ),   (10)

where ŷ_i is the predicted price and y_i is the true price of sample i. Apply the implementation and report the RMSE of linear regression and ridge regression on the training set and test set. Comparing the training RMSE of linear regression and ridge regression, what do you find? How about the comparison of their test RMSE? Can you explain the difference?

5. [5 points] Below "5 linear models of top-3 features", train a linear regression model and a ridge regression model using the top-3 features you obtained in sub-question 2, and then report the RMSE on the training set and test set. Compared with the RMSE of using all 13 features: what is the difference? What does this indicate?
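Equation (10) translates directly into a one-line helper (the name is my own):

```python
import numpy as np

def rmse(y_true, y_pred):
    # RMSE = sqrt( (1/n) * sum_i (y_i - yhat_i)^2 )
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))
```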


[SOLVED] Assignment 2: classification of textual data comp 551

In this assignment, you will implement logistic regression and multiclass regression and evaluate these two algorithms against Decision Trees on two distinct textual datasets. The goal is to gain experience implementing these algorithms from scratch and to get hands-on experience evaluating their performance.

1 Task 1: Data preprocessing

Your first task is to turn the text data into tabular format, with the selected words as features and the text documents as the training or test examples. We will use two datasets in this project, described below.

1.1 IMDB Reviews

The IMDB Reviews data can be downloaded from here: http://ai.stanford.edu/~amaas/data/sentiment/. To train your model, use only the reviews in the "train" folder. Report the performance of your model on the reviews in the "test" folder. Carefully read the README file to get a clear understanding of the data format. Briefly, imdb.vocab contains the vocabulary with one word per row. The row indices of the words are used as the feature indices that appear in the training and test documents in the "labeledBow.feat" files.

Task 1.1 The entire vocabulary size is 89526, which is also the total feature size. This is too big for training our custom logistic regression. As a preprocessing step, you will need to decide which features to use. First, you may filter out words that appear in less than 1% of the documents and words that appear in more than 50% of the documents, which are the rare words and "stopwords", respectively. Stopwords are commonly used words that are not important to our tasks. Second, you need to choose the top D ∈ [100, 1000] features by their absolute regression coefficients with the rating scores (1-10), using the Simple Linear Regression we covered in Module 4.1.
In other words, although we will eventually use logistic regression for binary classification on this data, we will perform linear regression (with the rating score as the target variable) in order to find important features. For this step, you must implement the Simple Linear Regression model from scratch (i.e. you cannot use the linear regression model from sklearn). Examine the top features with the most positive simple regression coefficients and the top features with the most negative coefficients. Do they make sense for calling a movie good and bad, respectively?

1.2 20 news groups: a multi-class labelled textual dataset

The 20-news-group dataset can be loaded directly using sklearn.datasets.fetch_20newsgroups (https://scikit-learn.org/stable/modules/generated/sklearn.datasets.fetch_20newsgroups.html). Use the default train subset (subset='train', and remove=(['headers', 'footers', 'quotes']) in sklearn.datasets) to train the multiclass prediction models, and report the final performance on the test subset. Note: you need to start with the text data and convert the text to feature vectors. Please refer to https://scikit-learn.org/stable/tutorial/text_analytics/working_with_text_data.html for a tutorial on the steps needed for this.

Task 1.2 For the sake of this assignment, it is OK to work with a partial dataset of only 5 categories out of the 20 available in the dataset. You may choose your favourite 5 categories. One tip: choosing 5 distinct categories (e.g., comp.graphics, misc.forsale, rec.sport.baseball, sci.med, talk.politics.guns) may make your code easier to debug, because they are easy to distinguish by their corresponding key words. Similar to Task 1.1, you can filter out rare words, stopwords, and words that are not relevant to any of the 5 class labels. Since we are dealing with discrete class labels, we will use something different from simple regression to select features externally.
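The from-scratch Simple Linear Regression needed in Task 1.1 reduces, per word column, to the slope of a univariate least-squares fit. A minimal sketch (the helper name is my own):

```python
import numpy as np

def simple_lr_coef(x, y):
    # Slope b of the univariate least-squares fit y ≈ a + b*x:
    # b = cov(x, y) / var(x), computed on centered data.
    x, y = np.asarray(x, float), np.asarray(y, float)
    xc, yc = x - x.mean(), y - y.mean()
    return float(xc @ yc / (xc @ xc))
```

One would then score every surviving word column against the rating vector and keep the top D by absolute coefficient.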
For example, you may use Mutual Information (MI) (https://scikit-learn.org/stable/modules/generated/sklearn.metrics.mutual_info_score.html) to select the top M ∈ [10, 100] feature words per class and take the union of all top feature words to train your multiclass model. You may choose other ways to select feature words, as long as you report what you did in your final report. One thing to keep in mind is that our custom multiclass regression may be slow, and without regularization you may want to keep the number of features fairly low. For instance, with 100 feature words per class, we can still have up to 500 features in total for 5 categories.

2 Task 2: Implement Logistic and Multiclass classifiers

You should follow the equations presented in the lecture slides in Modules 4.2 and 4.3, and you must implement the models from scratch (i.e., you cannot use scikit-learn or any other pre-existing implementations of these methods). However, you are free to base your implementations on the code provided in the Colab notebook as you see fit. In particular, your two main tasks are to:

1. Implement and evaluate Logistic Regression on the IMDB data
2. Implement and evaluate Multiclass Regression on the 5-class prediction task from the 20-news-group data

You are free to implement these models in any way you want, but you must use Python and you must implement the models from scratch (i.e., you cannot use SciKit Learn or similar libraries). Using the numpy package, however, is allowed and encouraged. Regarding the implementation, we recommend the following approach:

• Implement the Logistic Regression and Multiclass Regression models as Python classes. You should use the constructor of each class to initialize the model parameters as attributes, as well as to define other important properties of the model.
• Your model class for each algorithm should have (at least) two functions:
– A fit function, which takes the training data (i.e., X and y), as well as other hyperparameters (e.g., the learning rate and/or number of gradient descent iterations), as input. This function should train your model by modifying the model parameters.
– A predict function, which takes a set of input points (i.e., X) as input and outputs predictions (i.e., ŷ) for these points.
• For evaluating binary classification, you should use the Receiver Operating Characteristic (ROC) curve and the area under the ROC curve (AUROC), as we covered in class. For evaluating multi-class prediction, you should compute classification accuracy.
• As a comparison, use Decision Trees from sklearn for both the binary and multi-class classification tasks.

Task 2.1 Check the gradients computed by your implementations using small perturbations, monitor the cross-entropy as a function of iteration, and report your findings on both datasets.

Task 3: Run experiments

The goal of this project is to have you explore linear classification and examine feature importance via the linear coefficients. Evaluate the performance using AUROC for binary classification and accuracy for multi-class classification. You are welcome to perform any experiments and analyses you see fit, but at a minimum you must complete the following experiments in the order stated below:

1. Report the top 10 features with the most positive coefficients and the top 10 features with the most negative coefficients on the IMDB data, using simple linear regression on the movie rating scores.
2. Implement from scratch and conduct (a) binary classification on the IMDB Reviews and (b) multi-class classification on the 20-news-group dataset.
3. On the same plot, draw the ROC curves and report the AUROC values of logistic regression and Decision Trees on the IMDB data binary classification task.
4. Report the multiclass classification accuracy of multiclass regression and Decision Trees on the 5 chosen classes from the 20-news-group data.
5. Further, with a plot, compare the accuracy of the two models as a function of the size of the dataset (by controlling the training size). For example, you can randomly select 20%, 40%, 60% and 80% of the available training data, train your model on each subset, and evaluate the trained model on the held-out test set.

Note: The above experiments are the minimum requirements that you must complete; however, this project is open-ended. For this part, you might try different learning rates, investigate different stopping criteria for the gradient descent, try linear regression for predicting ratings in the IMDB data, or try different text embedding methods as alternatives to bag of words. You are also welcome and encouraged to try any other model covered in the class (including regularized regression such as Ridge and LASSO), and you are free to implement them yourself or use any Python library that has their implementation (e.g. scikit-learn). You are encouraged to explore more classes (greater than 5). In scenarios involving a higher number of classes, it may be challenging to distinguish closely related or similar classes. Rather than focusing solely on the most probable class, consider examining the top k (e.g. k = 3) predicted classes. A correct prediction is then determined by whether the true label is within the top k predicted labels: assign a score of 1 if the correct label is among the top k predictions and 0 otherwise. Of course, you do not need to do all of these things, but look at them as suggestions and try to demonstrate curiosity, creativity, rigour, and an understanding of the course material in how you run your chosen experiments and how you report on them in your write-up.
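A minimal from-scratch sketch of the recommended fit/predict interface for logistic regression trained by full-batch gradient descent, plus a rank-based AUROC helper (class and function names and hyperparameter defaults are my own, illustrative choices):

```python
import numpy as np

class LogisticRegressionGD:
    """Binary logistic regression trained by full-batch gradient descent."""

    def __init__(self, lr=0.1, n_iters=2000):
        self.lr, self.n_iters, self.w = lr, n_iters, None

    def _design(self, X):
        X = np.asarray(X, float)
        return np.column_stack([np.ones(len(X)), X])  # prepend a bias column

    def fit(self, X, y):
        X, y = self._design(X), np.asarray(y, float)
        self.w = np.zeros(X.shape[1])
        for _ in range(self.n_iters):
            p = 1.0 / (1.0 + np.exp(-X @ self.w))       # sigmoid
            self.w -= self.lr * X.T @ (p - y) / len(y)  # cross-entropy gradient
        return self

    def predict_proba(self, X):
        return 1.0 / (1.0 + np.exp(-self._design(X) @ self.w))

    def predict(self, X):
        return (self.predict_proba(X) >= 0.5).astype(int)

def auroc(y_true, scores):
    # AUROC as P(score of a random positive > score of a random negative),
    # counting ties as 1/2; equivalent to the area under the ROC curve.
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

The multiclass variant keeps the same fit/predict shape but replaces the sigmoid with a softmax over a weight matrix; the gradient-check suggested in Task 2.1 can compare `X.T @ (p - y) / len(y)` against finite differences of the cross-entropy.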
Deliverables

You must submit two separate files to MyCourses (using the exact filenames and file types outlined below):

1. assignment2_group-k.ipynb: Your data processing, classification and evaluation code should all be in one single Jupyter Notebook. Your notebook should reproduce all the results in your report. The TAs may run your notebook to confirm your reported findings.
2. assignment2_group-k.pdf: Your (max 5-page) assignment write-up as a pdf (details below).

where k is your group number.

Project write-up

Your team must submit a project write-up that is a maximum of 8 pages (single-spaced, 11pt font or larger; minimum 0.5 inch margins; an extra page for references/bibliographical content can be used). We highly recommend that students use LaTeX to complete their write-ups. You have some flexibility in how you report your results, but you must adhere to the following structure and minimum requirements:

Abstract (100-250 words) Summarize the project task and your most important findings. For example, include sentences like "In this project we investigated the performance of linear classification models on two benchmark datasets" and "We found that the logistic/multiclass regression approach achieved worse/better accuracy than Decision Trees and was significantly faster/slower to train."

Introduction (5+ sentences) Summarize the project task, the two datasets, and your most important findings. This should be similar to the abstract but more detailed. You should include background information and citations to relevant work (e.g., other papers analyzing these datasets).

Datasets (5+ sentences) Very briefly describe the datasets and how you processed them. Describe any new features you came up with in detail. Present the exploratory analysis you have done to understand the data, e.g. class distribution.
Results (7+ sentences corresponding to 7 figures) Describe the results of all the experiments mentioned in Task 3 (at a minimum) as well as any other interesting results you find. At a minimum you must have these 7 plots:

1. A horizontal bar plot showing the top 20 features (10 most positive and 10 most negative) from the simple linear regression on the IMDB data, with the coefficients as the x-axis and the feature names (i.e., words) as the y-axis.
2. A convergence plot showing how the logistic and multiclass regression converge given a reasonably chosen learning rate.
3. A single plot containing the two ROC curves of logistic regression and sklearn-DT (Decision Trees) on the IMDB test data.
4. A bar plot that shows the AUROC of logistic regression and DT on the test data (y-axis) as a function of the 20%, 40%, 60%, 80%, and 100% training data (x-axis).
5. A bar plot that shows the classification accuracies of multiclass regression and DT on the test data (y-axis) as a function of the 20%, 40%, 60%, 80%, and 100% training data (x-axis).
6. A horizontal bar plot showing the top 20 features (10 most positive and 10 most negative) from the logistic regression on the IMDB data, with the coefficients as the x-axis and the feature names (i.e., words) as the y-axis.
7. A heatmap showing the top 5 most positive features (as rows) for each class (as columns) in the multi-class classification on the 4 chosen classes from the 20-news-group dataset. Your heatmap should therefore be 20-by-4.

Discussion and Conclusion (5+ sentences) Summarize the key takeaways from the project (Do the top features make sense?) and possibly directions for future investigation.

Statement of Contributions (1-3 sentences) State the breakdown of the workload across the team members.
– Did you follow the guidelines for the project write-up?
• Correctness (40 points)
– Are your models implemented correctly?
– Are your reported accuracies close to the reference solutions?
– Do your selected top features actually improve performance compared to randomly chosen features on the IMDB data?
– Do you observe the correct trends in the experiments (e.g., when comparing training sizes)?
• Writing quality (25 points)
– Is your report clear and free of grammatical errors and typos?
– Did you go beyond the bare minimum requirements for the write-up (e.g., by including a discussion of related work in the introduction)?
– Do you effectively present numerical results (e.g., via tables or figures)?
• Originality / creativity (15 points)
– Did you go beyond the bare minimum requirements for the experiments?
– Note: Simply adding in a random new experiment will not guarantee a high grade on this section! You should be thoughtful and organized in your report.

Final remarks

You are expected to display initiative, creativity, scientific rigour, critical thinking, and good communication skills. You don't need to restrict yourself to the requirements listed above – feel free to go beyond, and explore further. You can discuss methods and technical issues with members of other teams, but you cannot share any code or data with other teams.


[SOLVED] Assignment 1: getting started with machine learning comp 551

In this assignment you will implement two classification techniques — K-Nearest Neighbour (KNN) and Decision Trees (DTs) — and compare these two algorithms on two distinct health datasets. The goal is to get started with programming for Machine Learning: how to properly store the data, run the experiments, and compare different methods. You will also gain experience implementing these algorithms from scratch and get hands-on experience comparing the performance of different models.

2 Task 1: Acquire, preprocess, and analyze the data

Your first task is to acquire the data, analyze it, and clean it (if necessary). We will use two fixed datasets in this assignment, outlined below.

• Dataset 1: NHANES age prediction.csv (National Health and Nutrition Health Survey 2013-2014 (NHANES) Age Prediction Subset): https://archive.ics.uci.edu/dataset/887/national+health+and+nutrition+health+survey+2013-2014+(nhanes)+age+prediction+subset
• Dataset 2: Breast Cancer Wisconsin (Original) dataset: https://archive.ics.uci.edu/dataset/15/breast+cancer+wisconsin+original

The essential subtasks for this part of the assignment are:

1. Load the datasets into NumPy or Pandas objects in Python.
2. Clean the data. Are there any missing or malformed features? Are there any other data peculiarities that need to be dealt with? You could remove any examples with missing or malformed features and note this in your report; this is a straightforward way to handle the issue by simply eliminating missing values. You can use the X.isna() or X.isnull() functions to check for missing data in the dataset, and use X = X.dropna() to eliminate them. You are welcome to explore other possible ways (e.g. imputing missing values with the average of the observed values for the same features).
3. Compute basic statistics on the data to understand it better. For example, what is the mean of each feature for the positive group, and the mean of each feature for the negative group?
If you rank the squared differences of the group means, are the top features known to be associated with the target variable?

3 Task 2: Implementing KNN and DT

You are free to implement these models as you see fit, but you should follow the equations presented in the lecture slides, and you must implement the models from scratch (i.e., you CANNOT use SciKit Learn or any other pre-existing implementations of these methods). However, you are free to use relevant code given at the course GitHub https://github.com/yueliyl/comp551-notebooks. Specifically, your two main sub-tasks in this part are to:

1. Implement KNN with an appropriate distance function.
2. Implement DT with an appropriate cost function.

You are free to implement these models in any way you want, but you must use Python and you must implement the models from scratch (i.e., you cannot use SciKit Learn or similar libraries). Using the NumPy or Pandas packages, however, is allowed and encouraged. Regarding the implementation, we recommend the following approach (but again, you are free to do what you want):

• Implement both models as Python classes, following the Object Oriented Programming (OOP) paradigm. Use the constructor of each class to initialize the model parameters as attributes, as well as to define other important properties of the model.
• Each of your model classes should have (at least) two functions:
– A fit function, which takes the training data (i.e., X and y) — as well as other hyperparameters (e.g., the value of K in KNN and the maximum tree depth in DT) — as input. This function should train your model by modifying the model parameters.
– A predict function, which takes a set of input points (i.e., X) as input and outputs predictions (i.e., ŷ) for these points.
• In addition to the model classes, you should also define a function evaluate_acc to evaluate model accuracy.
This function should take the true labels (i.e., y) and predicted labels (i.e., ŷ) as input, and it should output the accuracy score.

4 Task 3: Running experiments

The goal of this assignment is to have you compare different features and models. Split each dataset into training and test sets. Use the test set to estimate performance in all of the experiments, after training the model on the training set. Evaluate the performance using accuracy and Area Under the Receiver Operating Characteristic curve (AUROC). For computing the AUROC, you are allowed to use the sklearn.metrics.roc_curve function. You are welcome to perform any experiments and analyses you see fit (e.g., to compare different features), but at a minimum you must complete the following experiments in the order stated below and describe your findings for each of them:

1. Compare the accuracy and AUROC of the KNN and DT algorithms on the two datasets.
2. Test different K values and see how they affect the training and test accuracy of KNN.
3. Similarly, check how the maximum tree depth affects the performance of DT on the provided datasets.
4. Try out different distance/cost functions for both models.
5. Plot the ROC curves for KNN and DT on the test data. Both ROC curves need to be on the same plot to enable comparison of the methods' performance. Also show the AUROC of each method in the figure legend. You may use evaluation code from the ModelEvaluationAndSelection.ipynb colab (but not the scikit-learn implementations of KNN and DT).
6. Describe how you obtained the key features used in KNN (e.g., external feature selection by correlation with the labels).
7. For DT, you can compute a rough feature importance score for each feature d by counting the number of non-leaf nodes where feature d is used. Report the top 5 most important features. Are they the same as those from the simple mean-difference approach described in subtask 3 of Section 2? If not, why?
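A minimal sketch of the recommended class shape for KNN with (squared) Euclidean distance, together with evaluate_acc (everything beyond the fit/predict/evaluate_acc interface named above is my own illustrative choice):

```python
import numpy as np

class KNN:
    """K-Nearest Neighbour classifier with squared Euclidean distance."""

    def __init__(self, k=3):
        self.k = k

    def fit(self, X, y):
        # KNN is a lazy learner: "training" just memorizes the data.
        self.X_train = np.asarray(X, float)
        self.y_train = np.asarray(y, int)
        return self

    def predict(self, X):
        preds = []
        for x in np.asarray(X, float):
            dists = np.sum((self.X_train - x) ** 2, axis=1)
            neighbours = self.y_train[np.argsort(dists)[: self.k]]
            preds.append(np.bincount(neighbours).argmax())  # majority vote
        return np.array(preds)

def evaluate_acc(y_true, y_pred):
    # Fraction of predictions that match the true labels.
    return float(np.mean(np.asarray(y_true) == np.asarray(y_pred)))
```

Swapping the distance line (e.g. for Manhattan distance) is the natural hook for experiment 4; a weighted KNN would weight the vote by inverse distance instead of using `bincount`.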
Note: The above experiments are the minimum requirements that you must complete; however, this assignment is open-ended. Here are a few suggestions:

• You can split the data into training, validation and test sets, use the validation set to select the best K and the best tree depth, and evaluate the best choice on the test set.
• You may perform K-fold cross-validation.
• You may also improve model performance by implementing weighted KNN, as we discussed in class.
• For DT, you can improve the feature importance score for each feature d with a weighted sum based on the reduction of cost (e.g., the Gini index at node j):

R_d = Σ_{j=1}^{J} Δg_j · I[v_j = d],  where Δg_j = (g_j − g_j(left)) + (g_j − g_j(right)).

You do not need to do all of these things, but you should demonstrate creativity, rigour, and an understanding of the course material in how you run your chosen experiments and how you report on them in your write-up.

5 Deliverables

You must submit two separate files to MyCourses (using the exact filenames and file types outlined below):

1. assignment1_group-k.ipynb: Your data processing, classification and evaluation code should all be in one single Jupyter Notebook. Your notebook should reproduce all the results in your report. Please ensure that all the original training and output results are saved in your notebook, and that these are the same results provided in your write-up. The TAs may run your notebook to confirm your reported findings.
2. assignment1_group-k.pdf: Your (max 5-page) assignment write-up as a pdf (details below).

where k is your group number.

5.1 Assignment write-up

Your team must submit an assignment write-up that is a maximum of five pages (single-spaced, 11pt font or larger; minimum 0.5 inch margins; an extra page for references/bibliographical content can be used). We highly recommend that students use LaTeX to complete their write-ups.
This first assignment has relatively strict requirements, but as the course progresses your assignment write-ups will become more and more open-ended. You have some flexibility in how you report your results, but you must adhere to the following structure and minimum requirements:

Abstract (100-250 words) Summarize the assignment task and your most important findings. For example, include sentences like "In this assignment we investigated the performance of two machine learning models on two benchmark datasets" and "We found that the Decision Tree approach achieved worse/better accuracy than K-Nearest Neighbour."

Introduction (5+ sentences) Summarize the assignment task, the two datasets, and your most important findings. This should be similar to the abstract but more detailed. You should include background information and citations to relevant work (e.g., other papers analyzing these datasets).

Methods (4+ sentences) Briefly describe the general algorithmic concepts (not the code) of the machine learning methods you implemented (i.e., KNN and DT) in your own words. Your description can be paraphrased from, but not identical to, those in the textbooks.

Datasets (5+ sentences) Very briefly describe the datasets and how you processed them. Present the exploratory analysis you have done to understand the data, e.g. class distribution.

Results (7+ sentences, possibly with figures or tables) Describe the results of all the experiments mentioned in Task 3 (at a minimum) as well as any other interesting results you find. (Note: figures or tables would be an ideal way to report these results.)

Discussion and Conclusion (5+ sentences) Summarize the key takeaways from the assignment and possible directions for future investigation.

Statement of Contributions (1-3 sentences) State the breakdown of the workload across the team members.
6 Evaluation
The assignment is out of 100 points, and the evaluation breakdown is as follows:
• Completeness (20 points)
– Did you submit all the materials?
– Did you run all the required experiments?
– Did you follow the guidelines for the assignment write-up?
• Correctness (40 points)
– Are your models implemented correctly?
– Are your reported accuracies close to our solution?
– Do you observe the correct trends in the experiments (e.g., how the accuracy and AUROC change as the K value of KNN or the maximum depth of DT increases)?
– Do you observe the correct impact of different distance/cost functions on model performance?
– Do you find meaningful features with high feature importance scores based on the trained DT?
• Writing quality (30 points)
– Is your report clear and free of grammatical errors and typos?
– Did you go beyond the bare minimum requirements for the write-up (e.g., by including a discussion of related work in the introduction)?
– Do you effectively present numerical results (e.g., via tables or figures)?
• Originality / creativity (10 points)
– Did you go beyond the bare minimum requirements for the experiments?
– Note: Simply adding in a random new experiment will not guarantee a high grade on this section! You should be thoughtful and organized in your report. That is, the distinctive ideas that you came up with should blend into your whole story. For instance, explaining the motivations behind them would be a great starting point.
7 Final remarks
You are expected to display initiative, creativity, scientific rigour, critical thinking, and good communication skills. You don't need to restrict yourself to the requirements listed above – feel free to go beyond, and explore further. You can discuss methods and technical issues with members of other teams, but you cannot share any code or data with other teams.

$25.00 View

[SOLVED] Csci 677 assignment 6 this is a programming assignment to experiment with adversarial attacks on a classification network

This is a programming assignment to experiment with adversarial attacks on a classification network and defense by adversarial training, as studied in class. You are asked to attack the ResNet-9 network that you constructed in HW4 for classifying CINIC-10 data into 10 different categories. We will consider a "white box" attack, so the architecture and the parameters of the network are accessible to the attacker. In this assignment, we consider attacks that are "untargeted": the prediction of the modified image is diverted from the original prediction made by the same model. You are asked to implement the Iterative Gradient Sign Method (IGSM) attack. You may adjust the parameters of the attack; the goal is to achieve strong attack success with fewer changes, but we are not searching for an optimum. We suggest starting with ϵ = 0.12 and α = 0.006 for IGSM, and 20 iterations (note that the suggested numbers assume that the image values range between 0 and 1). In addition to the attack, we ask you to implement a simple adversarial training pipeline, which is similar to the training process that you implemented in HW4. The key difference is that you replace some of the original training images, in a minibatch, with attack images generated by the designated attack. The replacement ratio is a hyper-parameter ranging between 0 and 1. By default you can set it at 0.75, meaning that 75% of training images are replaced with attack images. You can adjust it for variations in the experiments. For adversarial training, we recommend you fine-tune your model for 10 epochs.
Implementation Hints: The following provides some hints for implementation. You are not required to follow these. You may also find related code online which you can use for guidance.
1) To find the gradients of the outputs y with respect to the inputs x:

loss = loss_fn(outputs, y)
loss.backward()
x_grad = x.grad.detach()

2) To get only the sign you can use:

x_grad_sign = x_grad.sign()

3) To make inputs containing the gradients, you need to make them leaf variables with requires_grad set to True. You can do:

inp = inp_tensor.detach().clone()
inp = inp.requires_grad_(True)

4) For adversarial training, we provide the pseudo code for your reference:

# Load your model weights here
for epoch in range(nb_epochs):
    for batch_id in range(nb_batches):
        x_batch, y_batch = generator.get_batch()
        nb_adv = int(np.ceil(self.ratio * x_batch.shape[0]))
        # Sample image ids to be replaced
        # Generate attack images and replace the original images
        # Forward pass and backward pass as in HW4

Note that .grad works only for leaf variables, that is, only for the input. As such, this should be enough for the assignment. However, in case you feel you need to look at intermediate gradients, you could use a hook function as follows:

def save_grad_hook(module, grad_input, grad_output):
    setattr(module, 'grad_back', grad_output)
layer_xyz.register_backward_hook(save_grad_hook)

Experiments
For the experiments, we ask you to do the following.
• Evaluate the model you trained in HW4 (you can retrain one if you did not save the model) with benign images and IGSM-attacked images; we have given suggested attack parameters, but you may adjust them to get a strong attack while keeping the manipulation magnitude low. We are not looking for you to find an optimum, but please report results for two sets of parameters.
• Second, starting from your pre-trained weights, fine-tune your model through the adversarial training pipeline and report accuracy again on benign images and IGSM-attacked images. Note that you need to generate attacks on the adversarially trained model again instead of reusing the originally computed attack images. For adversarial training, take IGSM as your attack function and test your model with its attacks.
Submission
Please include your code and the following in your submission document.
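The core IGSM update can be illustrated without any deep-learning framework. The sketch below is my own toy illustration (not provided code): it uses a single scalar "pixel" and an analytic gradient function in place of a network's backward pass, but the logic per step — move α in the sign of the loss gradient, project back into the ε-ball around the original input, and clip to the valid [0, 1] range — is the same logic the real attack applies elementwise to image tensors.

```python
def sign(v):
    # sign of a scalar: -1, 0 or 1
    return (v > 0) - (v < 0)

def igsm(x0, grad_fn, eps=0.12, alpha=0.006, iters=20):
    """Iterative Gradient Sign Method on one scalar pixel (toy).

    grad_fn(x) stands in for the gradient of the loss w.r.t. the input;
    the untargeted attack ascends the loss."""
    x = x0
    for _ in range(iters):
        x = x + alpha * sign(grad_fn(x))
        x = min(max(x, x0 - eps), x0 + eps)  # project into the eps-ball
        x = min(max(x, 0.0), 1.0)            # keep a valid image value
    return x

# toy loss L(x) = (x - 1)^2 has gradient 2*(x - 1); ascending the loss
# pushes x away from 1, until the eps-ball boundary at x0 - eps is hit
adv = igsm(0.5, lambda x: 2 * (x - 1))
```

With the suggested parameters, α times the iteration count (0.006 × 20 = 0.12) exactly reaches the ε budget, so the toy example ends on the ball boundary at 0.38.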
Quantitative Results:
• Report the accuracy of your model without defense on benign images and IGSM attack images.
• Report the accuracy of your model after adversarial training on benign images and IGSM attack images.
Qualitative Results:
• Visualize IGSM attack images of your no-defense model (2-5 samples).
• Visualize IGSM attack images of your adversarially trained model (2-5 samples).


[SOLVED] Homework assignment #5 csci677 this is a programming assignment on fine-tuning two object detection models.

This is a programming assignment on fine-tuning two object detection models. We have essentially provided the needed code and pointers to helpful tutorial material. The following describes the basic library to use, the dataset to adapt to, the evaluation code and the experiments you should conduct.
a) Library: We ask you to experiment with fine-tuning two different object detection models. One is a Faster RCNN with FPN; the other is a DETR model.
Faster RCNN: We will use the Detectron2 software (https://github.com/facebookresearch/detectron2). It is high-level wrapper code for object detection powered by PyTorch. Though Detectron2 supports multiple computer vision tasks, including instance segmentation, we will only use its object detection functionality in this assignment. To install Detectron2, use the following code in Colab:

!pip install pyyaml==5.1
!pip install detectron2 -f https://dl.fbaipublicfiles.com/detectron2/wheels/cu102/torch1.9/index.html

NOTE: You will need to restart your Colab runtime once to finish the installation. The task is to fine-tune an object detection model. The pretrained model can be found in the Detectron2 model zoo: https://github.com/facebookresearch/detectron2/blob/main/MODEL_ZOO.md
You are asked to experiment with the Faster RCNN "R50-FPN, 3x" model, ID# 137849458. Detectron2 provides a config dictionary so that you can easily set up your experiment. The following is an example; you may need to substitute filenames and parameters as needed.
cfg = get_cfg()
cfg.merge_from_file(model_zoo.get_config_file("COCO-Detection/faster_rcnn_R_50_FPN_3x.yaml"))
cfg.OUTPUT_DIR = 'MyVOCTraining'
cfg.DATASETS.TRAIN = ("voc_2012_train",)
cfg.DATASETS.TEST = ()
cfg.DATALOADER.NUM_WORKERS = 1
cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url("COCO-Detection/faster_rcnn_R_50_FPN_3x.yaml")  # Let training initialize from model zoo
cfg.SOLVER.IMS_PER_BATCH = 1
cfg.SOLVER.BASE_LR = 0.00025  # pick a good LR
cfg.SOLVER.MAX_ITER = 3000
cfg.MODEL.ROI_HEADS.BATCH_SIZE_PER_IMAGE = 128
cfg.MODEL.ROI_HEADS.NUM_CLASSES = 20

You can choose a model with cfg.merge_from_file() and set initial weights with cfg.MODEL.WEIGHTS. You can choose the training/testing dataset with cfg.DATASETS.TRAIN/TEST. You can also find another example in Detectron2's official tutorial at https://colab.research.google.com/drive/16jcaJoc6bCFAQ96jDe2HwtXj7BMD_-m5#scrollTo=7unkuuiqLdqd .
DETR Model: We will use the official code for DETR at https://github.com/facebookresearch/detr. To set up the DETR model in Colab, run:

!git clone https://github.com/facebookresearch/detr.git

You can use the following command to fine-tune DETR on a COCO 2017 format dataset:

!python main.py --batch_size 2 --resume https://dl.fbaipublicfiles.com/detr/detr-r50-e632da11.pth --coco_path <coco_path> --output_dir <output_dir> --lr <lr> --lr_backbone <lr_backbone> --epochs 2

You need to pick a good LR for fine-tuning. A good idea is to pick an LR smaller than the default LR used for training from scratch. You will find the default LR in main.py. You can also fine-tune the model for more epochs.
To avoid evaluating on the entire dataset during training, please comment out the following lines in main.py:

# lines 214-217
test_stats, coco_evaluator = evaluate(
    model, criterion, postprocessors, data_loader_val, base_ds, device, args.output_dir
)

# and lines 223-236
if args.output_dir and utils.is_main_process():
    with (output_dir / "log.txt").open("a") as f:
        f.write(json.dumps(log_stats) + "\n")
    # for evaluation logs
    if coco_evaluator is not None:
        (output_dir / 'eval').mkdir(exist_ok=True)
        if "bbox" in coco_evaluator.coco_eval:
            filenames = ['latest.pth']
            if epoch % 50 == 0:
                filenames.append(f'{epoch:03}.pth')
            for name in filenames:
                torch.save(coco_evaluator.coco_eval["bbox"].eval,
                           output_dir / "eval" / name)

To visualize the detection results, you can check: https://colab.research.google.com/github/facebookresearch/detr/blob/colab/notebooks/detr_attention.ipynb
b) Dataset: We will use the Pascal VOC dataset for object detection (the pretrained models are trained on the MSCOCO dataset). Even though the dataset is the same for the two detectors, they use different formats, so you have to process and load it differently. For the Detectron2 model, you will need to download the dataset to Colab, but you don't need to prepare the dataset by yourself. To download the dataset, you can use the following code in Colab:

!wget http://host.robots.ox.ac.uk/pascal/VOC/voc2012/VOCtrainval_11-May-2012.tar
!tar -xvf VOCtrainval_11-May-2012.tar

Detectron2 uses "datasets" as the default directory for data. Change the folder name to "datasets/VOC2012" by using:

!mv VOCdevkit datasets

For the DETR model, dataset setup is more complex.
DETR only supports the COCO 2017 dataset format, which follows this file structure:

path_to_coco/
  annotations/
    instances_train2017.json
    instances_val2017.json
  train2017/   (images for training)
  val2017/

while the VOC dataset format is:

VOCdevkit/
  VOC2012/
    Annotations/   (annotation xml files for each image)
    ImageSets/
    JPEGImages/
    SegmentationClass/
    SegmentationObject/

To convert the VOC dataset to COCO format, you need to convert the xml annotation files in VOC into a single json file. There are several important details regarding the data format conversion as well. We provide a notebook for the data format conversion at https://colab.research.google.com/drive/10IPTFxi0eu41xI3xS_iZjjkWVRnuBgtA?usp=sharing which includes all the details.
c) Evaluation: Detectron2 provides a high-level evaluator for Pascal VOC; you can use it simply as: from detectron2.evaluation import PascalVOCDetectionEvaluator. The evaluator returns Average Precision based on IoU (intersection-over-union); you only need to report AP50, which is the primary metric for Pascal VOC. To evaluate DETR, you can simply run:

!python main.py --batch_size 2 --resume <checkpoint> --eval --no_aux_loss --coco_path ../VOC_coco_format/

d) Your work:
1. Build a pipeline for fine-tuning object detection on Pascal VOC. You can use the code given above and refer to available tutorials if helpful.
2. Fine-tune the network and show qualitative and quantitative results of the model you have trained. For qualitative results, show the object detection results for some images; you can use the Detectron2 high-level API in detectron2.utils.visualizer.
3. Show the evolution of the loss functions: Detectron2 saves a Tensorboard record by default. To use the Tensorboard file, you can either 1) download the record and run Tensorboard on your local machine or 2) use the Colab (Jupyter) integrated Tensorboard to read the curves:

%load_ext tensorboard
%tensorboard --logdir PATH_TO_YOUR_RECORD_FOLDER

4. Repeat step 2 for the DETR model.
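The heart of that conversion — turning a VOC `<object>`/`<bndbox>` entry into a COCO-style annotation dict — can be sketched using only the standard library. This is a simplified illustration (the provided notebook handles images, category lists and the full JSON layout); the function name and the `category_ids` mapping are my own placeholders. The key detail is that VOC stores corner coordinates while COCO stores [x, y, width, height].

```python
import xml.etree.ElementTree as ET

def voc_objects_to_coco(xml_str, image_id, category_ids):
    """Convert the <object> entries of one VOC annotation (XML string)
    into COCO-style annotation dicts."""
    root = ET.fromstring(xml_str)
    anns = []
    for obj in root.iter("object"):
        name = obj.findtext("name")
        b = obj.find("bndbox")
        xmin, ymin = float(b.findtext("xmin")), float(b.findtext("ymin"))
        xmax, ymax = float(b.findtext("xmax")), float(b.findtext("ymax"))
        anns.append({
            "image_id": image_id,
            "category_id": category_ids[name],
            "bbox": [xmin, ymin, xmax - xmin, ymax - ymin],  # COCO xywh
            "area": (xmax - xmin) * (ymax - ymin),
            "iscrowd": 0,
        })
    return anns

toy = """<annotation><object><name>dog</name>
<bndbox><xmin>10</xmin><ymin>20</ymin><xmax>110</xmax><ymax>220</ymax></bndbox>
</object></annotation>"""
anns = voc_objects_to_coco(toy, image_id=1, category_ids={"dog": 12})
```

Collecting such dicts for every image, plus "images" and "categories" lists, yields the instances_train2017.json-style file DETR expects.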
It is optional to repeat step 3 for the DETR model. To enable Tensorboard for DETR, you need to change the main.py file from the official repo.
SUBMISSION: Your submission should contain the following:
1. The code you have written (in the report PDF, a .py file or an .ipynb file; the first is preferred for grading), including brief descriptions/comments for each function/training block.
2. Qualitative and quantitative results of the two models and a comparison between the two. Include training curves for the two methods. Also compare the performance of the models before and after fine-tuning.
3. As the models may be slow to train, the number of training epochs may be limited; however, you are still encouraged to try some variations if possible.


[SOLVED] Csci 677 assignment 4 this is a programming assignment to create, train and test a cnn for the task of image classification.

This is a programming assignment to create, train and test a CNN for the task of image classification. To keep the task manageable, we will use a small dataset and two small networks. You are asked to construct and experiment with two relatively small CNNs. The first is a LeNet-5 network; the second is ResNet-9. Details of both are given below. The figures include filter sizes and number of channels for both. Please define the network architecture like:

class LeNet5(nn.Module):
    def __init__(self, num_classes=10):
        super(LeNet5, self).__init__()
        # TODO
    def forward(self, x):
        # TODO

A sketch of LeNet-5 is shown in Figure 1; the network is a bit different than described in the original, 1998, paper. All convolutions are across all channels of the layer a filter is applied to. All activation units should be ReLU. Note that the architecture does not include a softmax layer, but you may add one if you prefer. Figure 1: LeNet-5 Architecture
ResNet-9: A sketch of ResNet-9 is shown in Figure 2. Figure 2: ResNet-9 Architecture
Framework: PyTorch is the required framework for this assignment. You are asked to define your model layer-by-layer using available functions from PyTorch. You are free to use the PyTorch documentation or other sources for help, but do not import pre-defined LeNet or ResNet definitions.
Dataset: We will use the CINIC-10 dataset. It consists of 10 mutually exclusive classes with a total of 270,000 images equally split amongst three subsets: train, validate, and test. Each image is a 32 × 32 RGB image. You can download the dataset from https://drive.google.com/file/d/1ZEWAU7k0lTEmabjEmtrd9SGsmNGHEVIC/view?usp=sharing. After extracting the downloaded file, it should contain 3 data folders: test, train, and valid, each containing 10 subfolders indicating the 10 classes. Each subfolder consists of 9,000 images. To load the CINIC data, you may refer to the demo code provided below:

from torchvision import datasets, transforms
from torch.utils.data import DataLoader

class CINIC10Dataset:
    def __init__(self, batch_size=64, root='.'):
        self.transform = transforms.Compose([
            transforms.ToTensor(),
            transforms.Normalize(( , , ), ( , , ))  # fill in dataset mean/std
        ])
        self.batch_size = batch_size

        self.train_dataset = datasets.ImageFolder(root=f'{root}/train',
                                                  transform=self.transform)
        self.train_dataloader = DataLoader(self.train_dataset, batch_size=batch_size,
                                           shuffle=True, num_workers=2)

        self.valid_dataset = datasets.ImageFolder(root=f'{root}/valid',
                                                  transform=self.transform)
        self.valid_dataloader = DataLoader(self.valid_dataset, batch_size=batch_size,
                                           shuffle=False, num_workers=2)

        self.test_dataset = datasets.ImageFolder(root=f'{root}/test',
                                                 transform=self.transform)
        self.test_dataloader = DataLoader(self.test_dataset, batch_size=batch_size,
                                          shuffle=False, num_workers=2)

Note: You should normalize the images to zero mean and unit variance for pre-processing. First normalize your image pixel values to the (0,1) range, calculate the dataset mean/std values, and then normalize the images to be zero mean and unit variance.
Training: Train the networks using the given training data for 20 epochs. For the main experiment setting, we suggest starting with a mini-batch size of 128 and the ADAM optimizer with learning rate α = 0.001, β1 = 0.9 and β2 = 0.999. You are free to experiment with other learning rates and other optimizers. Please use cross entropy loss for your main experiment. Record the error after each step (i.e. after each batch) so you can monitor it and plot it to show results. During training, you should test on the validation set at some regular interval, say every 5 epochs, to check whether the model is overfitting.
Note: To plot the loss function or accuracy, you can use pylab, matplotlib or tensorboard to show the curve.
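The note above asks you to compute the dataset mean/std before filling in transforms.Normalize. As a framework-free sketch of what that computation does (in practice you would vectorize it with NumPy or torch over the full training set), assuming images are nested lists indexed [image][channel][row][col] with values already scaled to [0, 1]:

```python
def channel_mean_std(images):
    """Per-channel mean and std over a set of images given as nested
    lists [N][C][H][W] with pixel values in [0, 1]."""
    n_ch = len(images[0])
    s = [0.0] * n_ch   # running sum per channel
    sq = [0.0] * n_ch  # running sum of squares per channel
    count = 0          # pixels per channel seen so far
    for img in images:
        for c, plane in enumerate(img):
            for row in plane:
                for v in row:
                    s[c] += v
                    sq[c] += v * v
        count += len(img[0]) * len(img[0][0])
    mean = [si / count for si in s]
    std = [(sqi / count - m * m) ** 0.5 for sqi, m in zip(sq, mean)]
    return mean, std

# toy check: one 1-channel 1x2 "image" with pixels 0.0 and 1.0
mean, std = channel_mean_std([[[[0.0, 1.0]]]])
```

The resulting per-channel (mean, std) triples are exactly the two tuples transforms.Normalize expects.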
Test Results: Test the trained network on the test data to obtain classification results and show the results in the form of a confusion matrix and classification accuracies for each class. For the confusion matrix you could either write the code on your own, or use scikit-learn to compute the confusion matrix. (See https://scikit-learn.org/stable/modules/generated/sklearn.metrics.confusion_matrix.html for more details.)
Variations: The LeNet-5 network defined above does not have normalization or regularization implemented. Similar to the main experiment, conduct additional experiments using batch normalization and L2 regularization of the trainable weights (independently). Also compare the LeNet-5 and ResNet results. We have also not asked for the use of augmentations; you are encouraged to experiment with them, but this is not required, as our dataset is relatively simple.
SUBMISSION: For your submission, include 1) your source code and 2) a report. Please follow these instructions for preparing your submission.
1. For your main experiment setting, show the evolution of training losses and validation accuracy over multiple steps (training log + curves) for the two networks.
2. Show the confusion matrix and per-class classification accuracy for this setting.
3. Show some examples of failed cases, with some analysis if feasible.
4. Compare your results for the two networks and any variations you may have tried.
For the source code, we encourage you to submit the code of the main experiment setting along with the variations of the settings mentioned above. For your report, you should include the results of both the main experiment setting and those with different experiment variations.
Hints: The following are some general hints on structuring your code; it is not required to follow this template.
1. You need to create:
• Dataset and dataloader: The Dataset class does preprocessing for the raw data and returns the specific example with the given ID.
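If you choose to write the confusion-matrix code yourself, the computation is small. A minimal plain-Python sketch (with hypothetical toy labels) follows:

```python
def confusion_matrix(y_true, y_pred, n_classes):
    """cm[i][j] counts samples of true class i predicted as class j."""
    cm = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm

def per_class_accuracy(cm):
    """Diagonal over row sums: fraction of each class predicted correctly."""
    return [row[i] / sum(row) if sum(row) else 0.0
            for i, row in enumerate(cm)]

# toy labels: one of the two class-0 samples is misclassified as class 1
cm = confusion_matrix([0, 0, 1, 1], [0, 1, 1, 1], n_classes=2)
acc = per_class_accuracy(cm)
```

For the assignment you would call this with n_classes=10 on the test-set predictions; scikit-learn's sklearn.metrics.confusion_matrix produces the same matrix.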
It can cooperate with the DataLoader for selecting the examples in the batches. Please check https://pytorch.org/docs/stable/data.html# for details.
• Loss Function: calculates the loss, given the outputs of the model and the ground-truth labels of the data.
• Model: This is the main model. Data points pass through the model in the forward pass. In the backward pass, using the backpropagation algorithm, gradients are stored. Please write your own LeNet-5 model instead of using the pre-built one in torchvision.
• Optimizer: There are several optimization schemes; standard ones are available in PyTorch. We suggest using ADAM, though you could also try plain SGD. You call these to update the model (after the gradients have been stored).
• Evaluation: Compute the predictions of the model and compare with the ground-truth labels of your data. In this homework, we are interested in the top-1 prediction accuracy. A demo code snippet for evaluation is provided below for reference.

import torch

def compute_accuracy(model, dataloader, device='cuda'):
    model.eval()  # Set the model to evaluation mode
    correct_predictions = 0
    total_predictions = 0

    with torch.no_grad():
        for images, labels in dataloader:
            images, labels = images.to(device), labels.to(device)

            outputs = model(images)

            _, predicted_labels = torch.max(outputs, 1)

            correct_predictions += (predicted_labels == labels).sum().item()
            total_predictions += labels.size(0)

    # Calculate accuracy
    accuracy = correct_predictions / total_predictions * 100
    return accuracy

• Training Loop: This is the main function that will be called after the rest are initialized.
2. There is an official PyTorch tutorial online; please refer to it for constructing the above parts: https://pytorch.org/tutorials/beginner/blitz/cifar10_tutorial.html
3. Apart from the above, it is suggested to log your results.
You could use TensorBoard (compatible with both PyTorch and TensorFlow) or simply use the 'logging' module to keep track of loss values and metrics.
4. Note on creating batches: It is highly recommended to use the DataLoader class of PyTorch, which uses multiprocessing to speed up mini-batch creation.


[SOLVED] Csci 677 assignment 3 this is a programming assignment consisting of two parts: first is to apply colmap software

This is a programming assignment consisting of two parts: the first is to apply the COLMAP software for inferring structure from motion (and camera poses) from a given set of images, and the second is to use the SFM output to train a Gaussian Splatting (GS) model for rendering novel views. Details of the two parts are given below. In both cases, the code is essentially made available, so your effort is mostly in evaluating the results based on some variations and in describing the results. The GS code requires use of a GPU; if you do not have a computer with a suitable GPU, you may use Google Colab (instructions will be provided in class). Please do not use language tools such as ChatGPT to write the code or the report for you. In class, we studied COLMAP as one of the available SFM software packages that can take a set of images from cameras of unknown poses, infer these camera poses, and also reconstruct 3D positions of a sparse set of points. Code for COLMAP is available at https://github.com/colmap/colmap. Code is available as binaries for Windows, Linux and MacOS environments or can be built from the available source files; use of the binaries is acceptable for the assignment. The GitHub page also provides good documentation for using the software package. We provide a set of 20 images from the "South Building" dataset here. We ask you to process the data in two sets: one using just the first 10 images, the other using all 20. The software will output files containing camera poses and reconstructed 3D points. We suggest that for the camera model, you choose "SIMPLE_PINHOLE". For matching, try the "exhaustive" and "sequential" options. Visualize the results using the available commands and evaluate them qualitatively. The software offers several visualization tools, such as showing the detected feature points, matched points and camera locations. Use your own steps to best show the quality of the results.
Besides the qualitative evaluation, also consider the reprojection error (which is the quantity that the method optimizes); you can get this error from the GUI by following "Extras, Show model statistics". Compare the results using the two given matching modes and also with the two different-sized datasets. You may find it easier to use your own computer for this part of the assignment, though you may also use Colab if you prefer. In this part, we ask you to learn a GS model for the images used in the SFM part above, using the camera poses and sparse reconstruction points, and to use the learned model to infer images from other viewpoints. You may use the matching option that gives the best SFM results. You can find the code and other useful information for the original GS implementation at https://github.com/graphdeco-inria/gaussian-splatting. The code requires a GPU to run efficiently. We do not expect students to have a GPU-enabled personal computer, so we recommend using Colab. Fortunately, https://github.com/camenduru/gaussian-splatting-colab provides us with directly usable code for Colab. Once a GS model is trained, it can be used to render the scene from arbitrary viewpoints, and we encourage you to do so and evaluate visually. For more precise evaluation, we can use the images in our dataset by separating them into training and test sets. Please use the following settings.
• Use the SFM model from the first 10 images, but exclude "image #5" from the training set for the GS model. Use the camera pose for "image #5" to render using the GS model and compare with the actual image. You should also compute quantitative metrics; the software provides an option to compute these. Note that the test is a little bit imprecise because "image #5" will have been used in COLMAP to add to the set of sparse 3D points used for initialization.
• Repeat the above experiment, but using 20 images this time, with images #5, #10 and #15 used as test images (and excluded from the training set).
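For intuition about what the reported reprojection error measures, here is a toy sketch under strong simplifying assumptions of mine (SIMPLE_PINHOLE intrinsics, identity camera pose, hypothetical points). COLMAP averages this kind of residual over all observations using the poses and points it estimates.

```python
def reprojection_error(points3d, points2d, f, cx, cy):
    """Mean Euclidean distance between observed 2D keypoints and the
    projections of their 3D points under a SIMPLE_PINHOLE camera
    (one focal length f, principal point (cx, cy), identity pose)."""
    total = 0.0
    for (x, y, z), (u, v) in zip(points3d, points2d):
        pu = f * x / z + cx  # perspective projection, u axis
        pv = f * y / z + cy  # perspective projection, v axis
        total += ((pu - u) ** 2 + (pv - v) ** 2) ** 0.5
    return total / len(points3d)

# the second 3D point projects to (570, 240); the observed keypoint is
# (573, 244), so its residual is sqrt(3^2 + 4^2) = 5 pixels
err = reprojection_error([(0.0, 0.0, 2.0), (1.0, 0.0, 2.0)],
                         [(320.0, 240.0), (573.0, 244.0)],
                         f=500.0, cx=320.0, cy=240.0)
```

A lower mean residual (in pixels) generally indicates a more consistent reconstruction, which is useful when comparing the exhaustive and sequential matching runs.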
Observe how the results change with more training images, if at all.
• Repeat both of the above experiments with different numbers of iterations and note the dependence, if any, of the quality of rendering on this number. The software package also offers options to change several other parameters. You may experiment with these for better understanding, but they are not required for the assignment and will not be given extra credit.
What to Submit
As the code for this assignment is mostly provided, you should focus on displaying and analyzing results (both qualitatively and quantitatively where possible). Hints of what to include are in the descriptions above. You should submit a single PDF file including your source code and report.


[SOLVED] Csci 677 assignment 2 panorama stitching this is a programming assignment. you are asked to write a program that stitches

This is a programming assignment. You are asked to write a program that stitches multiple images into a panorama by computing homographies between consecutive images. All required functions are available in OpenCV version >= 4.4.0. Please follow the instructions below to complete this process. You may find several online tutorials for this task; you are free to consult those, but please follow the steps we have outlined below. Please do not use language tools such as ChatGPT to write code for you.
1.1 Data Preparation
We provided an example set of images here, but we encourage you to also take photos with your own camera or smartphone. Take two or more photos of the same scene, ensuring that there is enough overlap between consecutive photos for matching features and that the viewpoint change is approximately a rotation only.
1.2 Feature Detection
Load the images using cv.imread(). Convert them to grayscale images. Create a SIFT feature detector. Detect the keypoints on both images and display them with size and orientation using cv.drawKeypoints(). Here's an example of how you can use these functions:

img = cv.imread('home.jpg')
gray = cv.cvtColor(img, cv.COLOR_BGR2GRAY)
sift = cv.SIFT_create()
kpts, dpts = sift.detectAndCompute(gray, None)

You can also follow the tutorial here.
1.3 Feature Matching
Create a brute force matcher with cv.BFMatcher(). Use bf.knnMatch to find matches among the descriptors you just detected on the two images. This function returns the top-k matches for each descriptor; weak matches are then filtered out using the ratio test between the best and the second-best matches. Set k=2 for the ratio test, but you can experiment with other k values to achieve the best matching results. Display the resulting matches between the two images using cv.drawMatchesKnn(). Here's an example code snippet:

bf = cv.BFMatcher()
matches = bf.knnMatch(src_dpts, dst_dpts, k=2)

You can also follow the tutorial here.
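The ratio-test filtering mentioned above can be sketched independently of OpenCV. In the sketch below, each entry of `knn_matches` is a hypothetical (best_distance, second_best_distance) pair standing in for the two cv.DMatch objects that bf.knnMatch returns per query descriptor with k=2; the 0.75 threshold is a common choice, not a required one.

```python
def ratio_test(knn_matches, ratio=0.75):
    """Keep a match only when its best distance is clearly smaller than
    its second-best distance (Lowe's ratio test); returns kept indices."""
    return [i for i, (d1, d2) in enumerate(knn_matches)
            if d1 < ratio * d2]

# pairs 0 and 2 are unambiguous; pair 1 (30 vs 32) is ambiguous and dropped
good = ratio_test([(10.0, 50.0), (30.0, 32.0), (5.0, 100.0)])
```

With real cv.DMatch objects the same logic reads `if m.distance < ratio * n.distance` for each `(m, n)` pair.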
1.4 Compute Homography
Using the matched features, compute the homography matrix for each pair of consecutive images with RANSAC. Print out the homography matrix. You can use cv.findHomography() for this. Example usage:

H, mask = cv.findHomography(src_pts, dst_pts, cv.RANSAC, 5.0)

You can follow the tutorial here.
1.5 Stitch into a Panorama
Before stitching the images to compose a panorama, you need to determine the size of the final stitched image. Since the panorama is larger than each individual image, we need to define a rectangle that covers all warped images. We provide the code snippet below for reference. In this code, cv.perspectiveTransform() is used to apply a perspective transformation to a set of points, allowing us to calculate the minimum and maximum coordinates (min_x, min_y, max_x, max_y) to define the size of the output stitched image.

min_x = min_y = max_x = max_y = 0.0
for i in range(count):
    # Get the height and width of the original images
    h, w, p = images[i].shape
    # Create a list of points to represent the corners of the images
    corners = np.array([[0, 0], [w, 0], [w, h], [0, h]], dtype=np.float32)
    # Calculate the transformed corners
    transformed_corners = cv.perspectiveTransform(
        corners.reshape(-1, 1, 2), cumulative_homography[i]
    )
    # Find the minimum and maximum coordinates to determine the output size
    min_x = min(transformed_corners[:, 0, 0].min(), min_x)
    min_y = min(transformed_corners[:, 0, 1].min(), min_y)
    max_x = max(transformed_corners[:, 0, 0].max(), max_x)
    max_y = max(transformed_corners[:, 0, 1].max(), max_y)

# Calculate the width and height of the stitched image
output_width = int(max_x - min_x)
output_height = int(max_y - min_y)
# Define an offset for translation due to negative coordinates
offset_matrix = np.array([[1, 0, -min_x],
                          [0, 1, -min_y],
                          [0, 0, 1]], dtype=np.float32)
# Define the output image
output = np.zeros((output_height, output_width, 3), dtype=np.uint8)

Now, you can proceed to stitch the images.
First, select an image as an anchor and transform the other images onto this anchor image. The transformation between any image and the anchor image is the composition of a series of homographies. Compute the transformations and map all other images onto the anchor image. You can use the cv.warpPerspective() function to warp an individual image img to the anchor image's perspective using a homography matrix H. Example usage:

warped_img = cv.warpPerspective(img, H, (width, height))

Note that, by default, this function does bilinear interpolation to calculate the pixel color values for the warped pixels (there is an option to use nearest neighbor instead, but we recommend using the default). As you warp each image, place it onto the panorama canvas (i.e., the empty "output" we defined before). The basic logic is to layer the images such that later images are placed on top of earlier ones. This means that as you add each warped image to the panorama, it will overlap and blend with the previous images. After you obtain the panorama, display it along with each transformed image.
What to Submit
As this assignment is mostly implemented using built-in functions, you are expected to display some intermediate results to show the internal workflow of the program. A sample visualization is shown here.
1. SIFT features: show the detected features overlaid on the images. Also give the number of features detected in each image.
2. Graphically show the top-10 scoring matches found by the matcher before the RANSAC operation. Provide statistics of how many matches are found for each image pair.
3. Show the total number of inlier matches after the homography estimations. Also show the top-10 matches that have the minimum error between the projected source keypoint and the destination keypoint. (Hint: check the mask value returned by the function estimating the homography.)
4. Output the computed homography matrix.
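The "composition of a series of homographies" can be made concrete with a small sketch (plain Python, no OpenCV). It assumes, as a convention of mine, that pairwise_H[i] maps image i+1 into image i's frame, and it only chains images to the right of the anchor; images on the other side would be chained with inverted homographies.

```python
def matmul3(A, B):
    # 3x3 matrix product on nested lists
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def cumulative_homographies(pairwise_H, anchor):
    """Map each image into the anchor image's frame by chaining the
    pairwise homographies: H[anchor] is the identity, and
    H[i+1] = H[i] @ pairwise_H[i] maps image i+1 -> anchor."""
    I = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    H = {anchor: I}
    for i in range(anchor, len(pairwise_H)):
        H[i + 1] = matmul3(H[i], pairwise_H[i])
    return H

# toy check with pure translations: image 1 sits 5 px right of image 0,
# image 2 sits 3 px right of image 1, so image 2 -> image 0 shifts by 8 px
T = lambda tx: [[1.0, 0.0, tx], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
H = cumulative_homographies([T(5.0), T(3.0)], anchor=0)
```

In the real pipeline these cumulative matrices (premultiplied by the offset_matrix from the canvas-sizing snippet) are what you pass to cv.warpPerspective for each image.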
(The built-in homography finder also applies a non-linear optimization step at the end; you can ignore or disable this step if you wish.)

5. Show the final panorama along with each transformed image.

You should submit a single PDF file including your source code (which should be well-commented) and your report. The report should include:

1. A brief description of the programs you implemented.
2. The results of the intermediate steps listed above.
3. Your observations regarding the results you obtained throughout this process.
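Composing the per-pair homographies into the cumulative transforms used for warping boils down to 3x3 matrix multiplication. Below is a minimal C++ sketch of that composition; the names Mat3, compose, and accumulate are our own, and we assume pairwise[i] maps image i+1 into image i's frame with image 0 as the anchor (your image ordering may differ):

```cpp
#include <array>
#include <cstddef>
#include <vector>

using Mat3 = std::array<std::array<double, 3>, 3>;

// 3x3 matrix product: A * B
Mat3 compose(const Mat3& A, const Mat3& B) {
    Mat3 C{};  // value-initialized to all zeros
    for (std::size_t r = 0; r < 3; ++r)
        for (std::size_t c = 0; c < 3; ++c)
            for (std::size_t k = 0; k < 3; ++k)
                C[r][c] += A[r][k] * B[k][c];
    return C;
}

// Assuming pairwise[i] maps image i+1 into image i's frame and image 0 is
// the anchor, cumulative[i] maps image i into the anchor's frame.
std::vector<Mat3> accumulate(const std::vector<Mat3>& pairwise) {
    const Mat3 I{{{1, 0, 0}, {0, 1, 0}, {0, 0, 1}}};
    std::vector<Mat3> cumulative{I};  // the anchor maps to itself
    for (const Mat3& H : pairwise)
        cumulative.push_back(compose(cumulative.back(), H));
    return cumulative;
}
```

A quick sanity check: composing two pure translations should give a translation whose offsets are the sums of the inputs.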

$25.00 View

[SOLVED] Csci104 assignment 4

Spring 2025

Tree Traversals

Write a recursive (no loops allowed) routine to determine if ALL paths from leaves to root are the same length. The tree nodes are instances of the following struct:

```cpp
struct Node {
    int key;
    Node *left, *right;
};
```

However, for this application the key (integer) in the node is not utilized and can be ignored. The function is prototyped in equal-paths.h and should be implemented in the corresponding .cpp file:

```cpp
// Prototype
bool equalPaths(Node* root);
```

Examples: see the images below of trees with equal paths, which should return true, and trees that do not have all equal paths, which should return false.

● You MAY define helper functions if needed in equal-paths.cpp.
● You CANNOT use a container (like a set, vector, or map) to do your work. Do your work during your traversal (this is the learning goal of the problem).
● We have also provided a VERY RUDIMENTARY test program, equal-paths-test.cpp, that you may use and modify as desired to test certain configurations and debug your code. It will not be graded.

Test your Code

Run equal-paths-test.cpp for basic testing. Open a new terminal:

cd hw4
make equal-paths-test
./equal-paths-test

Run more in-depth tests:

cd hw4_tests/equalpaths-test
make equalpaths_tests
./equalpaths_tests

Prepare your Code for Submission

Open a new terminal and cd hw4, then:

Ensure you suppressed all debugging messages:

grep -n "cout" equal-paths.h
grep -n "cout" equal-paths.cpp

Ensure you placed all includes in the proper place (for example):

#ifndef RECCHECK
#include
#endif

Make sure code compiles without warnings:

make equal-paths-test

Ensure no Valgrind errors:

valgrind --tool=memcheck --leak-check=yes ./equal-paths-test

Add, commit, and push equal-paths.cpp and equal-paths.h and any other files with changes (i.e., git status should be clean).

Next

BSTs and Iterators

Important Perspective: Remember that BSTs are an implementation of the set and map ADTs. While a set only has keys, a map has a value associated with each key.
But in any map or set, it is the key that is used to organize and look up data in the structure, and thus the key must be unique. In this homework you will implement a binary search tree and then extend it to build an AVL tree.

We are providing for you a half-finished file bst.h (in the resources repository) which implements a simple binary search tree. We are also providing a complete print_bst.h file that allows you to visually see your tree, for help in debugging. HOWEVER, to use this print function you must have a working iterator implementation. If the tree doesn't print correctly, you need to verify your iterator works and also that your insertions/removals haven't destroyed parts of the tree. This file is already #include'd into your bst.h and is invoked by simply calling the public print() member function on your tree (e.g., if you are in main() and have a BST object named b, then just call b.print();).

You will need to complete the implementation for all seven functions that have TODO next to their declaration in bst.h. We provide additional clarifications for the following functions, where n is the number of nodes in the tree and h is the height of the tree:

1. void insert(const std::pair<const Key, Value>& keyValuePair): This function will insert a new node into the tree with the specified key and value. There is no guarantee the tree is balanced before or after the insertion. If the key is already in the tree, you should overwrite the current value with the updated value. Runtime is O(h).

2. void remove(const Key& key): This function will remove the node with the specified key from the tree. There is no guarantee the tree is balanced before or after the removal. If the key is not in the tree, this function will do nothing. If the node to be removed has two children, swap with its predecessor (not its successor) in the BST removal algorithm. If the node to be removed has exactly one child, you can promote the child. You may NOT just swap key/value pairs.
You must swap the actual nodes by changing pointers, but we have given you a helper function to do this in the BST class: swapNode(). Runtime of removal should be O(h).

3. void clear(): Deletes all nodes inside the tree, resetting it to the empty tree. Runtime is O(n).

4. Node* internalFind(const Key& key): Returns a pointer to the node with the specified key. Runtime is O(h).

5. Node* getSmallestNode(): Returns a pointer to the node with the smallest key. This function is used by the iterator. Runtime is O(h).

6. bool isBalanced() const: Returns true if the BST is an AVL tree (that is, for every node, the height of its left subtree is within 1 of the height of its right subtree). It is okay if your algorithm is not particularly efficient, as long as it is not O(n^2). This function may help you debug your AVL tree in the next part of this problem, but it is mainly given as practice in writing recursive tree traversal algorithms. Think about how a pre- or post-order traversal can help.

7. Constructor and destructor: Your destructor will probably just call the clear function. The constructor should take constant time.

8. You will need to implement the unfinished functions of the iterator class. Note: You do NOT need to check whether the iterator is about to dereference a NULL pointer in operator*() or operator->() of the iterator. Just let it fault. It is up to the user to ensure the iterator is not equal to the end() iterator.

Notes:

● Remember that a BST (as well as any map implementation) should always be organized via the key of the key/value pair.
● The iterator class you write is mainly for clients to call and use to access the key/value pairs and traverse the tree. You should not use it as a helper to traverse the tree internally. Instead, use Node* or AVLNode* pointers directly along with internalFind, successor, predecessor, etc.
● In this class we make use of static member functions. You can search online for what this means, but here is a brief summary.
A static member function cannot be called on an object (bst.static_member_func()) and DOES NOT have a this pointer. Thus, in that function you cannot try to access this->root_. Essentially, it is like a global-level function shared by all instances of BSTs. It is a member of the class, so if someone passes in an actual BST it CAN access private data (i.e., the BST's root_ member). We have made predecessor() a static member function and we suggest you make a static successor() function. That will be useful so that the iterator class can just call successor() when we want to increment the iterator. We use static for successor because the iterator class doesn't have a BST pointer/reference, so it couldn't call successor if it were a normal member function.

● Very Important Warning: Please do not remove, modify, or rename any of the members (public, protected, OR private) in bst.h. You may add protected or private helper functions. Protected helper functions may be useful if the AVLTree class you will code in the latter problem will also benefit from that function. If you do not heed this warning, our tests won't work, and you'll lose points.

● Reminder: If the tree doesn't print correctly in your test program(s), you need to verify your iterator works (e.g., successor()) and also that your insertions/removals haven't destroyed parts of the tree.

Related Videos

A video walkthrough is available and demonstrates techniques that can be used to debug either your BST or AVL tree.
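As a hedged illustration of the successor() helper suggested above, here is one common in-order successor algorithm on a simplified node with a parent pointer. This Node struct is our own stand-in; the course's templated Node class in bst.h differs, and there successor() would be a static member of the class rather than a file-scope function:

```cpp
#include <cstddef>

// Simplified stand-in for the course's Node class (illustrative only).
struct Node {
    int key;
    Node* parent;
    Node* left;
    Node* right;
};

// In-order successor: the leftmost node of the right subtree if one exists;
// otherwise the first ancestor reached from a left child.
static Node* successor(Node* n) {
    if (n == nullptr) return nullptr;
    if (n->right != nullptr) {
        Node* cur = n->right;
        while (cur->left != nullptr) cur = cur->left;
        return cur;
    }
    Node* cur = n;
    while (cur->parent != nullptr && cur == cur->parent->right)
        cur = cur->parent;
    return cur->parent;  // nullptr when n held the largest key
}
```

Notice how this is exactly the walk the iterator's operator++ needs, which is why making successor() callable without a BST object is convenient.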
Test your Code

Open a new terminal. Run bst-test.cpp for basic testing:

cd hw4
make bst-test
./bst-test

Run more in-depth tests:

cd hw4_tests/bst_tests
make bst_tests
./bst_tests

Prepare your Code for Submission

Open a new terminal and cd hw4, then:

Ensure you suppressed all debugging messages:

grep -n "cout" bst.h

Make sure code compiles without warnings:

make bst-test

Ensure no Valgrind errors:

valgrind --tool=memcheck --leak-check=yes ./bst-test

Add, commit, and push bst.h and any other files with changes (i.e., git status should be clean).

Next

AVL Trees

We are providing you a half-finished file avlbst.h (in the homework-resources repository) for implementing an AVL tree. It builds on the file you completed for the previous question. Complete this file by implementing the insert() and remove() functions for AVL trees. You are strongly encouraged to use private/protected helper functions.

When you compile code that includes avlbst.h you will need to add the option -std=c++11 to the g++ compilation command. This is because of the use of the override keyword on various virtual functions. You can read about it online, but it mainly provides some additional compiler checks to ensure the signature of a virtual function in the derived class matches the one you are attempting to "override" in the base class (i.e., if your base virtual function is a const member but you forget to add const to the derived one, and thus are creating a whole new member function, the compiler will catch the error).

Related Videos

A video walkthrough is available and demonstrates techniques that can be used to debug either your BST or AVL tree.
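The core of AVL insert()/remove() rebalancing is a pair of rotation helpers. Below is a minimal left-rotation sketch on a bare node type of our own (a hypothetical AVLNodeLite; the course's AVLNode with parent pointers and balance fields needs more bookkeeping than shown here):

```cpp
struct AVLNodeLite {
    int key;
    AVLNodeLite* left;
    AVLNodeLite* right;
};

// Left rotation about x: x's right child y becomes the subtree root,
// x becomes y's left child, and y's old left subtree moves to x->right.
AVLNodeLite* rotateLeft(AVLNodeLite* x) {
    AVLNodeLite* y = x->right;   // y must exist for a left rotation
    x->right = y->left;
    y->left = x;
    return y;                    // new root of this subtree
}
```

A right rotation is the mirror image; in the real tree you would also fix parent pointers and update balances after each rotation.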
Test your Code

Open a new terminal and cd hw4, then run more in-depth tests:

cd hw4_tests/avl_tests
make avl_tests
./avl_tests

Prepare your Code for Submission

Ensure you suppressed all debugging messages:

grep -n "cout" avlbst.h

Make sure code compiles without warnings:

make bst-test

Ensure no Valgrind errors:

valgrind --tool=memcheck --leak-check=yes ./bst-test

Add, commit, and push avlbst.h and any other files that have changes (i.e., git status should be clean).

Next


[SOLVED] Csci104 assignment 3

Spring 2025

2. Testing

We have provided a test suite for your llrec functions, your heap, and your stack. They work similarly to the previous assignment:

cd hw3_tests
cmake .
make

You only need to run cmake once; after that, if you make changes to your code you only need to run make. Change directory to llrec_tests, stack_tests, or heap_tests to access the individual test programs. The test directory is excluded by the .gitignore. If you need to download the tests (Linux/Docker only) elsewhere:

wget https://bytes.usc.edu/files/cs104/sp23/grading/hw3_tests.tar.gz
tar xvfz hw3_tests.tar.gz

Run these commands inside your hw3 directory.

Next

3. Linked List Split/Pivot

In this problem you will practice implementing recursive functions that process linked lists. Skeleton code is provided in the hw3 folder in this assignment. There are two separate recursive functions that we will ask you to write. They are unrelated to each other (just part a and part b of this problem), but each of the two problems below will use the following Node definition:

```cpp
struct Node {
    int val;
    Node *next;
};
```

You may declare and use helper functions as you deem necessary. Remember to handle the cases when an input linked list is empty. Also, most recursive solutions are elegant. If you find yourself writing a lot of code, you likely aren't on the right track. If you had a recursive linked list tracing problem in a prior homework, that might give you an idea of the elegance we are referring to.

Part 1 - Linked List Split/Pivot

Write a recursive function to split the elements of a singly-linked list into two output lists, one containing the elements less than or equal to a given number, the other containing the elements larger than the number. You must MAINTAIN the relative ordering of items from the original list when you split them into the two output lists. The original list should not be preserved.
Your function must be recursive - you will get NO credit for an iterative solution. It must also run in O(n), where n is the length of the input list (and can be done with only one pass/traversal through the list). Here is the function you should implement:

```cpp
void llpivot(Node*& head, Node*& smaller, Node*& larger, int pivot);
```

When this function terminates, the following holds:

● smaller is the pointer to the head of a new singly-linked list containing all elements of head that were less than or equal to the pivot.
● larger is the pointer to the head of a new singly-linked list containing all elements of head that were (strictly) larger than the pivot.
● the linked list head no longer exists (head should be set to NULL).

Note: smaller and larger may be garbage when called (i.e., you can NOT assume they are NULL upon entry). Also, you should not delete or new any nodes, but just change the pointers to form the two other lists.

As an example, suppose the list pointed to by head contained 2 4 8 3. If we used 5 as the pivot and called:

llpivot(head, smaller, larger, 5);

Then:

● head should be an empty list
● smaller should contain 2 4 3
● larger should contain 8

See llrec.h for more details and description, and then place your implementation in llrec.cpp.

Next

4. Linked List Filter

Part 2 - Linked List Filter

Write a recursive function to filter/remove elements of a singly-linked list that meet a specific criterion. The criterion for removal is provided by a comparison (Comp) functor/function object that provides an operator() that takes in an int and returns a bool indicating whether the node should be removed/filtered. Filtered nodes should be deallocated. Your function must be recursive - you will get NO credit for an iterative solution. It must also run in O(n), where n is the length of the input list (and can be done with only one pass/traversal through the list).
```cpp
template <typename Comp>
Node* llfilter(Node* head, Comp pred);
```

As an example, if the list pointed to by head contained 3 6 4 9 and the Comp object's operator() returns true for an ODD integer input, then the function should return a pointer to the list containing 6 4 (since all the odd integers would have been filtered out). Since this is a templated function (to allow for different function object types), you should put your implementation in llrec.h. See llrec.h for more details and description.

Testing

We have provided a skeleton file llrec-test.cpp with a main() and some helper functions that read in values from a file to create a linked list, print a linked list, and deallocate a linked list. Complete llrec-test.cpp to exercise your functions and verify behavior as you see fit. Currently, it only reads in the contents of a file and creates the corresponding linked list. You can then call your functions on that list, print results, etc. to verify the correctness of your implementation.

You must update your Makefile with a target llrec-test that will compile the necessary code in the various source files, including llrec-test.cpp, into an executable named llrec-test. The Makefile will be used by the autochecker, so if it doesn't work then your assignment will not be autograded correctly. Once you have compiled your test program, you can run it and provide an input file. See the example below:

./llrec-test llrec-test1.in

We have provided one input test file, llrec-test1.in, that you can use. Feel free to create other input files and use them as input. (It would be appropriate to add/commit/push those files if you create them.)

Note: We will not grade your llrec-test.cpp or any input files you create. They are SOLELY for your own benefit to test your code. After submission, we will test your code with our own full test suite and assign points based on those tests. We ask that you NOT SHARE your test program or input files with other students.
We want everyone to go through the exercise of considering what cases to test and then implementing those tests.

Related Videos

A video overview of how these functions may work and be organized is available here.

Next

5. Templated Stack

Implement a templated Stack class named Stack. It must:

● Inherit from std::vector, and you need to choose whether public or private inheritance is the best choice. Though composition would generally be preferable, since a Stack is not truly a vector, we want you to practice with templates and inheritance.
● Support the following operations with the given signatures (see header file):

Stack();
size_t size() const;
bool empty() const;
void push(const T& item);
void pop();
T const & top() const;

● If top() or pop() is called when the stack is empty, you must throw std::underflow_error (defined in <stdexcept>).
● All operations must run in O(1) time. Failure to meet this requirement will result in the MAJORITY of credit being deducted for this problem.
● Important Note: To call a base class function that has the same name, you cannot use this->member_func(), since both classes have that function; the call will default to the derived version and lead to infinite recursion. Instead, scope the call (e.g., LList::size()).
● It would probably be a good idea to write a very simple test program (e.g., stack_test.cpp) just to ensure your code can pass some basic tests. We will not grade or require separate stack tests, but there are stack tests in the test suite.

Next

6. Functors

The following is background info that will help you understand how to do the next step. If you saw the following:

int x = f();

You'd think f is a function.
But with the magic of operator overloading, we can make f an object and f() a member function call to operator() of the instance f, as shown in the following code:

```cpp
struct RandObjGen {
    int operator()() { return rand(); }
};

RandObjGen f;
int r = f(); // translates to f.operator()(), which returns a random number by calling rand()
```

An object that overloads operator() is called a functor, and functors are widely used in the C++ STL to provide a kind of polymorphism. We will use functors to make a sort algorithm able to use different sorting criteria (e.g., if we are sorting strings, we could sort either lexicographically/alphabetically or by length of string). To do so, we supply a functor object that implements the desired comparison approach.

```cpp
struct AlphaStrComp {
    bool operator()(const string& lhs, const string& rhs) {
        // Uses string's built-in operator<
        // to do lexicographic (alphabetical) comparison
        return lhs < rhs;
    }
};

struct LengthStrComp {
    bool operator()(const string& lhs, const string& rhs) {
        // Compares by length of the strings
        return lhs.size() < rhs.size();
    }
};

string s1 = "Blue";
string s2 = "Red";
AlphaStrComp comp1;
LengthStrComp comp2;
cout
```
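To make the functor idea concrete, here is a small hedged sketch of a sort routine whose ordering is controlled entirely by the comparator functor passed in. The mySort name and the test data are our own, not the course's; std::sort is used internally, which itself demonstrates the same comparator idiom:

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Sort using whatever ordering the comparator functor defines.
template <typename Comp>
void mySort(std::vector<std::string>& items, Comp comp) {
    // std::sort accepts any callable comparator: functor, function, or lambda.
    std::sort(items.begin(), items.end(), comp);
}

struct AlphaStrComp {
    bool operator()(const std::string& lhs, const std::string& rhs) const {
        return lhs < rhs;                // alphabetical order
    }
};

struct LengthStrComp {
    bool operator()(const std::string& lhs, const std::string& rhs) const {
        return lhs.size() < rhs.size();  // shortest string first
    }
};
```

Calling mySort(v, AlphaStrComp()) and mySort(v, LengthStrComp()) on the same vector produces two different orderings from one sort implementation, which is the polymorphism the section describes.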


[SOLVED] Csci104 assignment 1 2. recursive linked list: split (weight 0.5)

Spring 2025

2. Recursive Linked List: Split (weight 0.5)

Overview

Write a recursive function to split the elements of a sorted (in increasing order), singly-linked list of integers into two sorted, singly-linked lists, where the first list contains all items with an odd value and the second list contains all items with an even value. The original list should not be preserved (see below). Your function must be recursive - you will get NO credit if you use for, while, do while, or goto. If you use helper functions - which you may - then they must all also be recursive.

● You should use the following Node type:

```cpp
struct Node {
    int value;
    Node *next;
};
```

● Here is the function you should implement:

```cpp
void split(Node*& in, Node*& odds, Node*& evens);
```

These are prototyped in split.h, which you can #include in your split.cpp and test file. You MAY NOT change the definitions provided in this file.

● Empty lists are represented by NULL. You may assume that odds and evens are both NULL when split is called from the main function.
● When your split function terminates, in should be set to NULL (the original list is not preserved), odds should point to the head of a linked list containing all items whose value is an odd integer, and evens should point to the head of a linked list containing all items whose value is an even integer.

Obviously, your solution must not leak memory. Use valgrind to verify correct memory handling and cleanup. Hint: by far the easiest way to make this work is to not delete or new any nodes, but just to change the pointers.

Testing

You will need to test the coding questions yourself with your own test programs. This should cause you to:

a.) appreciate the importance of testing
b.) consider the kinds of test cases you should write (i.e., if none of your test cases exercise a particular set of code in your implementation, then you probably need to write more tests)
c.)
understand what common tasks related to testing would be useful to reuse, and why there are testing frameworks like the one we will use in this class, gtest. (Don't worry, we'll cover gtest in lab soon!)

While we will only test your split function, you will probably want to write some main code to actually test it. There is a file called test_split.cpp. That file also includes #include "split.h" to bring in the prototype and Node definition. Then you can write a main that instantiates and fills some linked list cases (up to you to do) and then calls split to test its behavior. You must commit this file to your GitHub repo; however, it will not be graded.

If you get an error that NULL is not defined in this scope when compiling split.cpp or your test file, try adding #include to the .cpp file where you are using NULL.

Your submission should be in a file called split.cpp, and it should only contain your implementation of the function and NO main().

Using Valgrind

If you were to compile a program that takes two arguments:

$ ./program input.txt output.txt

The corresponding Valgrind command would be:

$ valgrind --tool=memcheck --leak-check=yes ./program input.txt output.txt

Run the following commands in a terminal (the working directory should be hw1).

Compile without warnings:

g++ -g -Wall -std=c++11 -c split.cpp
g++ -g -Wall -std=c++11 test_split.cpp split.cpp -o test_split

Ensure no Valgrind errors:

valgrind --tool=memcheck --leak-check=yes ./test_split

Testing with our tests

We have provided a few files in the same directory that we will use to grade this code. These tests use the Gtest framework (more on that in lab), but it is fairly easy to use. To run the test code:

bash ./grade_split.sh

This will compile your split.cpp with our test code and run 12 tests. One time the tests will be run normally, and a second time they will be run with valgrind.
A lot of information will be output, but the last few lines will tell you how many tests failed for both the regular execution and with valgrind. If you want to compile and run the test code yourself:

g++ split.cpp grade_split.cpp -o grade_split `pkg-config --cflags --libs gtest`
./grade_split

Doing it this way will output exactly which tests failed. You can look at the code in grade_split.cpp to see what each test is running. To run a single test you can do:

./grade_split --gtest_filter=Test.Name

Test.Name is the name of the test found in grade_split.cpp. For example, there is a test named Split.AllOddsOneEven; you can run just that single test with:

./grade_split --gtest_filter=Split.AllOddsOneEven

If you need to see whether a single test is failing valgrind (substitute a real test name):

valgrind --tool=memcheck --leak-check=yes ./grade_split --gtest_filter=Test.Name

Once your code is working well, make sure to push it to GitHub:

1. Run git status to see which files have been modified.
2. Run git add on each modified file. Don't add any object files or executables, just the code you've been working on.
3. Run git commit -m where you replace with a short comment like "all done with part 1".
4. Run git push origin main

3. Data structure: unrolled linked list (weight 0.5)

Understanding an Unrolled Linked List

An unrolled linked list is a normal linked list (doubly-linked in this case), but each node/item does not store a single data value; it stores an array of values. The head and tail nodes of the linked list may have arrays that are not fully occupied, so we keep first and last indices to indicate where the first actual data item exists in the array (this index is inclusive) and where the last data item exists (this index is exclusive and points to one beyond the last value). These arrays provide better underlying memory performance in most computers (due to caching effects that you'll learn about in CS 356 or EE 457) and can be more space efficient.
In the image above we see each Item struct has next and prev pointers, as would be typical in a doubly-linked list. Then, rather than a single value, it contains a fixed-size array where multiple items can be placed. To track which items are used, a pair of indices of the form [first, last) is kept, where first is inclusive (the index of the first used slot) and last is the index 1 beyond the last used slot. This approach allows more natural iteration and allows computing the number of items in the range through simple subtraction (i.e., last - first). As an example, first = last = 0 indicates no items are used, and first = 0 and last = 10 indicates 10 elements are occupied (indices 0..9). To track the head Item, tail Item, and size of the linked list (i.e., the number of strings stored in the entire list), the head_, tail_, and size_ members of the ULListStr class are used, respectively.

The unrolled list we implement will store strings. For the sake of this homework, we will only ask you to implement the ability to add or remove a value from the front or back of the list (and not in the middle of the list). Each of these operations should run in O(1) time. Pushing to the front or back should NOT require moving any values. When pushing to the front, only allocate a new Item if the current head Item has no room before the first value. When removing an item, only deallocate an Item when the number of used values in its array reaches 0. This means there should not be "empty" nodes in the list: when no more array entries of an Item are used, deallocate the Item.

1. You need to examine the code provided in ulliststr.h and ulliststr.cpp and add the implementations for push_back, push_front, pop_back, pop_front, back, front, and getValAtLoc in ulliststr.cpp. Below is an example sequence of operations:

ULListStr dat;
dat.push_back(7);
dat.push_front(8);
dat.push_back(9);
cout
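As a hedged sketch of the recursive odds/evens split described in part 2 of this assignment (one possible approach, not the graded reference solution), notice how passing odds->next or evens->next by reference into the recursive call lets each frame link directly onto the tail of the list being built:

```cpp
#include <cstddef>

struct Node {
    int value;
    Node* next;
};

// Recursively move each node of `in` onto the odds or evens list.
// No nodes are allocated or freed; only pointers are rewired.
void split(Node*& in, Node*& odds, Node*& evens) {
    if (in == nullptr) {          // base case: nothing left to split
        odds = nullptr;
        evens = nullptr;
        return;
    }
    Node* head = in;
    in = in->next;                // detach the first node
    if (head->value % 2 != 0) {   // odd value (the % 2 != 0 test also handles negatives)
        odds = head;
        split(in, odds->next, evens);
    } else {
        evens = head;
        split(in, odds, evens->next);
    }
}
```

Because every node is visited once and simply relinked, the function runs in O(n) with no allocation, satisfying the no-leak requirement by construction.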


[SOLVED] Csci104 assignment #2

Spring 2025

In this project you will write code to model a very simplified version of an online retail system such as Amazon. You will read a database of products from certain categories as well as users and their associated information. Your program will then allow a user to interactively search for products based on certain keywords, returning the products that match the search. From those matches, your program will allow a user to add selected items to their "cart", view items in their cart, purchase the items in their cart, and save the updated database of product and user information.

Important: In practice, reading and understanding others' code is just as important as writing your own code. So in this project you will need to read and understand a good bit of provided code. Spend time understanding what it does and how it is structured. One common need when reading large code bases (in the grand scheme this is not that large of a code base) is to find where the classes or functions you see being called or used are defined. Most editors have the ability to do this via some "Find in files" or "Goto definition" feature. In Codio you can select Find > Find in Project to search all your files. Another simple command line tool is grep. At the command prompt you can type:

grep "search phrase" file(s)

to find all occurrences of "search phrase" in the listed files (often replaced with the wildcard, *). Thus:

grep "Product" *

would output the lines of text from any file in the current directory that contain Product. An additional option is -n to show line numbers:

grep -n "Product" *

would include the line numbers in each file where the search phrase occurs.

The Data and Its Format

Your online retail system will sell products of various categories.
All products (no matter the type) will have a:

● Name
● Price
● Quantity in stock

Your system will support 3 categories of products, and each category will supply additional fields of data as indicated below:

● book
  ○ ISBN
  ○ Author
● clothing
  ○ Size
  ○ Brand
● movie
  ○ Genre
  ○ Rating

Note: ISBN, Author, Size, Brand, Genre, and Rating should all be string-type data.

The program will also support a set of known users, each with a:

● username : string
● credit_amount : double
● type : integer (identifying special users like "prime" users; may be used in later HWs)

This information will be stored in, and can be accessed from, a text database file with the following layout:

product_category
name
price
quantity
category-specific-info
product_category
name
price
quantity
category-specific-info
…
username credit_amount type
username credit_amount type
…

An example is shown here of a sample file we will provide, database.txt:

book
Data Abstraction & Problem Solving with C++
79.99
20
978-013292372-9 Carrano and Henry
book
Great Men and Women of Troy
19.50
5
978-000000000-1 Tommy Trojan
clothing
Men's Fitted Shirt
39.99
25
Medium J. Crew
movie
Hidden Figures DVD
17.99
1
Drama PG
aturing 100.00 0
johnvn 50.00
adal 120.00 1

Next

Provided Code

Here is a list of the files in the codebase that we are providing. Please do not alter any of the files marked Complete! below.

● amazon.cpp - Incomplete! Top-level application. Contains main().
● datastore.h - Complete! Abstract base class. You will create a derived class which should support storage of all the data: products and users.
● database.txt - Input file to your program. You may add more products and users to test your code, or write other input files. These are just text files and can be named however you like.
● Makefile - Partially complete! Edit as you add or change code.

Storage

● user.h and user.cpp - Complete! Class to model a User.
● product.h and product.cpp - Complete! Abstract base class.
Models the common aspects of all categories of products. Should support various common operations on all products. Ignore and do not alter isMatch() for this assignment.

Parsing

● db_parser.h and db_parser.cpp - Complete! A Parser class which utilizes specialized product parsers.
● product_parser.h - Complete! This one file contains several class definitions. The base class, ProductParser, is meant to parse the common attributes of a product, and there is one derived parser class per category of product to parse the additional attributes.
● product_parser.cpp - Nearly complete! The code for the base class ProductParser is complete and does not need to be modified. For each of the derived types you will need to complete the makeProduct() member function to instantiate an appropriate product object for the given category.

Utility code

● util.h and util.cpp - Incomplete! You need to complete the code in util.h to find the set intersection and set union. You also need to complete the function that parses a string containing spaces and words into individual words.

Keywords - Your system must build an index mapping keywords to the set of products that match that keyword. A product should match a keyword if the keyword appears in the product name or in one of the following attributes (dependent on the specific type of product):

● Books: the words in the author's name should be searchable keywords, as well as the book's ISBN number.
● Clothing: the words in the brand should be searchable keywords.
● Movies: the movie's genre should be a searchable keyword.

For the product name, book author, and clothing brand, we define a keyword to be any string of 2 or more characters. If such a word has punctuation, it should be split at each punctuation character and the resulting substrings (of 2 or more characters) should be used as keywords. Here are some examples:

○ Men's should yield just the keyword Men
○ J.
would not yield any keyword since the remaining substring J is only 1 character
○ I’ll would yield just ll since that substring is 2 or more characters (this is obviously a poor keyword, but we’ll follow this rule for simplicity)
For other keywords (book ISBN and movie genre) no punctuation or size analysis is necessary and the value should be used verbatim as a keyword. Here is an example:
○ The ISBN 978-000000000-1 should be used exactly as is for the keyword entry
It is suggested you store your keywords in a common case so that searching is easy and case-insensitive.

AND/OR Search
Your system should allow users to search for products by entering one or more keywords at the program menu prompt. An AND search should return all the products that contain ALL the search terms entered. An OR search is defined as all the products that contain ANY of the search terms entered. At the prompt the user will need to write AND or OR as their first word/command followed by any number of search terms separated by spaces. Your search should treat those terms as case-insensitive when it comes to matching. Examples:
● AND Men would be the same as OR Men since there is only 1 term, and would return all products that have the word men (i.e. the book Great Men and Women of Troy and Men’s Fitted Shirt).
● AND hidden Data would return nothing since no products have both those terms.
● OR hidden Data would return both the Hidden Figures DVD and Data Abstraction & Problem Solving with C++ products.
You may choose any reasonable behavior if the search consists only of AND or OR (no keywords).
1. Efficiency – Your search must be implemented “efficiently”. You should not have to iterate over ALL products to find the appropriate matches. Some kind of mapping between keywords and products that match that keyword should be implemented.
2. Hits – Results must be displayed to the user via the displayProducts(vector<Product*>& hits); function provided in amazon.cpp.
Failure to use displayProducts() will result in LARGE deductions since it will make our testing much harder.
3. Adding to Carts – You should support the notion of a “cart” for each user that they can add products to. The ADD username hit_result_index command should cause the product with index hit_result_index from the previous search result to be added to username’s cart (case insensitive). You must maintain the cart in FIFO (first-in, first-out) order, though that doesn’t mean you HAVE TO use the C++ queue class. Currently, we will not support the ability to remove products from a cart. If a product is added to a cart twice, treat them as separate items and store them in your cart twice (i.e. don’t try to store it once with a “quantity” of 2). This implies that each ADD command adds 1 product to the cart. If the username or hit_result_index is either not provided or invalid, print Invalid request to the screen and do not process the command. Note: The results from the last search should be retained until a new search is performed. Thus, the hits from one search can be referenced by many ADD commands.
4. Viewing Carts – You should support the VIEWCART username command, which should print the products in username’s cart (case insensitive) at the current time. The items should be printed with some ascending index number so it is easy to tell how many items are in the cart. If the username provided is invalid, print Invalid username to the screen and do not process the command.
5. Buying the cart – You should support the BUYCART username command, which should cause the program to iterate through the items in username’s cart (case insensitive). If the item is in stock AND the user has enough money, it should be removed from the cart, the in-stock quantity reduced by 1, and the product price debited from the user’s available credit.
If an item is not in stock or the user does not have enough credit, simply leave it in the cart and go on to the next product. Note: Your cart implementation must iterate through the products in the order they were added. If the username provided is invalid, print Invalid username to the screen and do not process the command.
6. Quitting – You should support the QUIT filename command, which should cause a new version of the database, using the format described above, to be saved to a file whose name is filename. It should represent the updated state of the database (i.e. changed product quantities and user credit) to reflect purchases. Note: Within the various sections, users and products may be written in any order (not necessarily matching the order of the input database file).
7. Our code in amazon.cpp and db_parser.cpp makes calls via the DataStore interface by using a base class pointer/reference (DataStore* or DataStore&). It is this class where you will likely want to store products and users, via some kind of container object(s). The parser in DBParser is complete but allows for extensions by “registering” certain “section parsers” and “product parsers”. Section parsers handle everything between the section delimiters in the database file. We create section parsers out in main() and register them with the DBParser, which maintains a map of section name to the given parser. When the parser encounters a particular section it will invoke the appropriate section parser. We have written two section parsers: ProductSectionParser and UserSectionParser. These are complete. Our product parsers will parse all the aspects of the specific category of product into the data members of the class and then call makeProduct(). It is here that you need to instantiate an appropriate product object and return a pointer to it. It will then be added to the data store object.
You only need to fill in the code in the makeProduct() functions for each specific product parser; you do not need to change any other code in product_parser.cpp other than adding appropriate #include statements.
1. Complete parseStringToWords() in util.cpp according to the specification given above for taking a string of many words and splitting it into individual keywords (split at punctuation, with at-least-2-character words).
2. Complete the setIntersection and setUnion functions in util.h. These will help you later on to perform searching. These functions should run in O(n log(n)) and NOT O(n^2). Note that these are templated functions operating on any generic set. As a hint, to declare an iterator for a set you must precede the type with the keyword typename, as in typename set<T>::iterator it. Another very important note about using iterators with C++ containers (e.g. vector, set, map): if you are iterating over a container with iterators, you should NOT modify the contents as you iterate. Consider the scenario where you have an iterator to the beginning item of a vector, and in your loop you erase that element. Behind the scenes, the vector shifts all the data elements up a spot, moving the 1st element into the 0th location. When you increment the iterator you will now move to the next location in the vector, skipping the 1st element (now in the 0th location). Similar or more serious issues can arise when you insert items, etc. as you iterate.
3. Write derived product classes: Book, Clothing, and Movie, implementing keywords() [which returns the appropriate keywords to index the product], displayString() [to create a string that contains the product info], and dump() [which outputs the database format of the product info]. We recommend trying to compile (NOT test, just compile) each of these files as you write them to avoid solving the same compile issue 3 times for each derived class. Remember you can easily compile by using the -c flag (e.g.
$ g++ -g -Wall -std=c++11 -c book.cpp ). Each class should be written in its own .h and .cpp files (i.e. book.h, book.cpp, clothing.h, etc.)
4. Complete each of the specialized product parser implementations of makeProduct() in product_parser.cpp to return a new specific Product for each category. Again, we recommend ensuring this file compiles after you complete it.
5. Implement a derived DataStore class called MyDataStore in mydatastore.h and mydatastore.cpp. It is here that you will implement the core functionality of your program: searching, adding products and users, saving the database, etc. (For search you can use the setIntersection and setUnion functions in util.h.) This class is likely where you should store products and users in some fashion. Again, we recommend compiling this file separately after you write the core functionality. You may need to add to it or modify it later as you work through other design aspects, but make sure it can compile now, even just using empty “dummy” function implementations. This derived class may define non-virtual functions to do other specific commands that the menu supports. It might be a good idea to have one member function in this class that corresponds to and performs each command from the menu options. You should not modify datastore.h.
6. Complete amazon.cpp. It has a pretty good skeleton laid out for you to implement the user interface (text-based menu and command entry) and you only need to modify a few lines in the top area and then add the remaining menu option checks at the bottom. More specifically, you will need to:
● Change the DataStore object instantiation to your derived type
● Add checks for other menu input options, read their “arguments”, and implement the desired behaviors.
● You should not need to modify the parser calls at the top.
1. Update the Makefile as needed. Remember we never compile .h files…those just get #included into the .cpp files that we actually compile.
2.
Be sure you have no memory leaks.
3. You may NOT use any additional algorithms from the <algorithm> library (we use a few in the code provided), nor may you use the auto keyword.

Displaying Products
When you display the products, displayString() will be used to generate the information string. You should follow the format below (angle brackets denote placeholders for the product’s values):
● Books
<name>
Author: <author> ISBN: <isbn>
<price> <quantity> left.
● Clothing
<name>
Size: <size> Brand: <brand>
<price> <quantity> left.
● Movies
<name>
Genre: <genre> Rating: <rating>
<price> <quantity> left.

Test your Program
We strongly recommend writing separate test driver programs (i.e. separate .cpp files with a main()) that perform basic unit tests by calling various functions or instantiating your classes and invoking the various member functions. In this way you can have some confidence that the individual pieces work before you try to put them all together. At the point where you need a database file to parse and act upon, you may use the database.txt file. Feel free to add products and users to database.txt or, better, create your own database text file. Run the program and be sure to test various sequences of commands that exercise the requirements described above.

8. (G)testing your amazon
For this assignment we have provided a fully working gtest-based test suite that exercises your program thoroughly. It is also the test suite used to grade your program. Unfortunately, the tests will only compile in a Linux environment like Codio or Docker. In order to use the test suite you must set it up first:
1. Open a terminal and ensure you are in the proper directory (hw2_tests). On Codio: cd workspace/hw2/hw2_tests. On Docker the command might be: cd hw2/hw2_tests.
2. Run cmake . This will output a lot of information; the last few lines should look like:
-- Found 35 GTests from amazon_tests
-- Configuring done
-- Generating done
-- Build files have been written to: /home/codio/workspace/hw2/hw2_tests
If you do not run cmake . then the tests will not build and/or run correctly.
3.
Now you need to build the test executable by running the make command. If this fails to build, the problem is caused by your code failing to build. Go back to your hw2 directory and ensure that make amazon runs and builds an amazon executable.
4. Once you can successfully run make in the hw2_tests directory, you can run the tests. First change directories to the amazon_tests directory: cd amazon_tests, then run the tests with ./amazon_tests

For this assignment, since the program implements a simple text interface, the program is tested by running your amazon executable with input coming from a file simulating what a user would type (i.e. input redirection). The database file used is also unique to each test. Pay careful attention to the output of the test program, as it tells you exactly what is going on. For example, here is the output of one test:

[ RUN ] ConsoleErrors.AddInvalidHitToCart
This test is executing your program with the following command line:
/usr/bin/valgrind --tool=memcheck -q --leak-check=yes --error-exitcode=113 /home/codio/workspace/hw2/amazon /home/codio/workspace/hw2/hw2_tests/amazon_tests/testFiles/AddInvalidHitToCart/database.txt
Your program’s STDIN was piped from:
/home/codio/workspace/hw2/hw2_tests/amazon_tests/testFiles/AddInvalidHitToCart/input.txt
Your program’s STDOUT was written to:
/home/codio/workspace/hw2/hw2_tests/amazon_tests/testFiles/AddInvalidHitToCart/output.txt

Here we see that the program is running the test called ConsoleErrors.AddInvalidHitToCart. We can see the program is being run with valgrind, and we can see the path to the database file. The file called out as STDIN is the file that represents what the user would type, whereas everything your program outputs goes to the STDOUT file. Remember, to run one test you can do the following (replace Test.Name with the name of a test): ./amazon_tests --gtest_filter=Test.Name
The GTests can be very particular when comparing your output to the reference.
If you think your output matches exactly, but the test says it doesn’t, you need to look for things like extra or missing spaces/tabs/newlines, extraneous characters, debug output, and/or numbers not formatted correctly (e.g. a price should be 15.00, not 15 or 15.0).

Grading
For this assignment the program is graded using the amazon_tests executable, with the score being 100*N/35 (if N tests are passed). Valgrind failures are not counted separately; if a test fails valgrind, it counts as failed.
