Assignment Chef


Assignment catalog

33,401 assignments available

[SOLVED] CS 331: Theory of Computing, Problem Set 8

Problem 1 (20 points). Let A ≤L B mean that A ≤T B with the additional condition that the oracle Turing machine M^B that solves A queries the oracle for B only once, at the very last step. Prove that A ≤L B if and only if A ≤m B (10 points for each direction).

Problem 2 (20 points). Describe two different Turing machines, M1 and M2, such that, when started on any input, M1 outputs ⟨M2⟩ and M2 outputs ⟨M1⟩ (a programming-language analogue is sketched below). Hint: take a look at the program SELF.

Problem 3 (20 points). For notational simplicity, in this problem we identify a Turing machine with its encoding. A computable function f : Σ∗ → Σ∗ is said to be a universal corruptor if for any Turing machine M, f(M) is a Turing machine that behaves differently from M; formally, L(M) ≠ L(f(M)). Prove that no universal corruptor exists: show that for every purported corruptor f, there exists a Turing machine UNTOUCHABLE such that f(UNTOUCHABLE) and UNTOUCHABLE are equivalent.

Problem 4 (20 points). Let SELF_TM = { ⟨M⟩ | L(M) = {⟨M⟩} }. Prove that neither SELF_TM nor its complement is Turing-recognizable (10 points for each). Hint: diagonalization with the help of the Recursion Theorem.

Problem 5 (20 points). Prove that the class P is closed under union and complementation (10 points for each).
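The mutual-reference trick behind Problem 2 can be prototyped in ordinary code. A minimal sketch in Python (an analogue only, not the Turing-machine construction the solution needs): two programs, identical except for a flag, each of which prints the other's source.

    flag = 0
    t = 'flag = %d\nt = %r\nprint(t %% (1 - flag, t))'
    print(t % (1 - flag, t))

Running the flag = 0 program prints the flag = 1 program and vice versa, mirroring M1 outputting ⟨M2⟩ while M2 outputs ⟨M1⟩.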

$25.00

[SOLVED] CS 331: Theory of Computing, Problem Set 7

Problem 1 (10 points). Let K = { ⟨M⟩ | ⟨M⟩ ∉ L(M) }. Prove that K is not Turing-recognizable. Hint: Russell's Paradox.

Problem 2 (20 points). Prove the following statements.
1. L1 ≤m L2 and L2 ≤m L3 imply L1 ≤m L3. (10 points)
2. L1 ≤T L2 implies that L̄1 ≤T L̄2, where L̄ denotes the complement of L. (10 points)
Note: ≤T is Turing reduction (introduced in Chapter 5.1) and ≤m is many-one reduction (introduced in Chapter 5.3).

Problem 3 (10 points). Prove that A_TM ≰m Ā_TM. Hint: proof by contradiction and Problem 2.

Problem 4 (20 points). Which of the following PCP problems has a solution? Justify your answer (dominoes are written top/bottom; a bounded brute-force search is sketched below).
(1) { ab/a, bb/ab, aa/ba, cc/bc, aa/ca, d/cd }
(2) { ab/a, bb/ab, aa/ba, c/bc, aa/ca, d/cd }

Problem 5 (20 points). Does the following PCP problem P have a solution?
( a ab b ccc c b c d dddd ddde e )
Hint: prove that P has a solution if and only if ∃n (3^n mod 4) = 3.

Problem 6 (20 points). Let Σ = {0, 1, ␣} (where ␣ denotes whitespace) be the tape alphabet for all TMs in this problem. Define the busy beaver function BB : N → N as follows: for each value of k, consider all k-state TMs that halt when started with a blank tape, and let BB(k) be the maximum number of steps performed by any such machine before halting. Show that BB is not a computable function. Hint: Halting Problem.
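Small PCP instances like those in Problems 4 and 5 can be explored mechanically. A hedged Python sketch (a bounded search only, since PCP is undecidable in general; the example instance and the max_len cutoff are illustrative, not taken from the assignment):

    from collections import deque

    def pcp_solution(dominoes, max_len=8):
        # breadth-first search over index sequences; a solution is a nonempty
        # sequence whose concatenated tops equal the concatenated bottoms
        queue = deque([((), "", "")])
        while queue:
            seq, top, bot = queue.popleft()
            if seq and top == bot:
                return seq
            if len(seq) < max_len:
                for i, (t, b) in enumerate(dominoes):
                    nt, nb = top + t, bot + b
                    # prune branches where neither row is a prefix of the other
                    if nt.startswith(nb) or nb.startswith(nt):
                        queue.append((seq + (i,), nt, nb))
        return None

    # classic textbook instance: indices (0, 1, 2, 0, 3) spell "abcaaabc" on both rows
    print(pcp_solution([("a", "ab"), ("b", "ca"), ("ca", "a"), ("abc", "c")]))

A returned sequence is a certificate of a solution; returning None only means no solution exists up to the cutoff.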

$25.00

[SOLVED] CS 331: Theory of Computing, Problem Set 6

Problem 1 (20 points). Let Σ = {0, 1, #}. Consider the following language over Σ: L = { w1#w2 | w1, w2 ∈ {0, 1}∗ and w1 < w2 }. Note: w1 < w2 means w1 is less than w2 when both are viewed as integers in binary. For example, 1110#11100 ∈ L while 0011#011 ∉ L. Design a Turing machine (in pseudo-code) that recognizes L.

Problem 2 (20 points). A 2-dimensional Turing machine is a Turing machine with a 2-dimensional tape: an unbounded grid of tape squares over which the head can move in 4 directions: left (L), right (R), up (U) and down (D). The tape space is denoted by N × N. The head starts at (0, 0) and is governed by two restrictions: the head never moves down when it is on the bottom row, i.e., in positions (i, 0) for i ∈ N, and the head never moves left when it is on the leftmost column, i.e., in positions (0, i) for i ∈ N. Prove that 2-dimensional Turing machines are no more powerful than standard Turing machines (one standard addressing trick is sketched after Problem 5). Note: your proof need not be completely formal, but it should give sufficient detail on how a 2-dimensional Turing machine can be simulated by a standard Turing machine step by step.

Problem 3 (20 points). Prove that the class of Turing-recognizable languages is closed under the following operations:
(5 points) union: L = L1 ∪ L2;
(5 points) intersection: L = L1 ∩ L2;
(10 points) concatenation: L = L1 · L2.
Note: your proof should be constructive. That is, given Turing machines TM1 and TM2 that recognize L1 and L2 respectively, your proof should show how to construct a Turing machine TM that recognizes L for each operation.

Problem 4 (20 points). Prove that the following languages are not Turing-decidable:
(10 points) LB = { ⟨M⟩ | M will write "V" somewhere on the tape };
(10 points) LU = { ⟨M⟩ | M halts on all words except one }.

Problem 5 (20 points). We say that a Turing machine is n-bound if its head visits at most n squares on the tape. Is the following language Turing-decidable? Prove or disprove it.
LS = { ⟨M⟩ | M is (|w| · |w|)-bound on any input w }.
Note: |w| denotes the length of w.
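For Problem 2, the usual simulation argument stores each 2-dimensional cell at a single 1-dimensional address. A minimal Python sketch of one standard bijection N × N → N (the Cantor pairing function; choosing it is my illustration, not something the assignment mandates):

    def cantor_pair(x, y):
        # bijection from N x N to N: count the cells diagonal by diagonal
        return (x + y) * (x + y + 1) // 2 + y

    def cantor_unpair(z):
        # invert by recovering which diagonal z lies on
        w = int(((8 * z + 1) ** 0.5 - 1) // 2)
        y = z - w * (w + 1) // 2
        return w - y, y

    assert all(cantor_unpair(cantor_pair(x, y)) == (x, y)
               for x in range(50) for y in range(50))

A standard TM can then keep the whole grid on one tape and translate each of the four head moves into a recomputation of the current cell's address.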

$25.00

[SOLVED] CS 331: Theory of Computing, Problem Set 5

Problem 1 (20 points). Let Σ = {0, 1}. Consider the following language over Σ: L = { w ∈ Σ∗ | w contains twice as many 0's as 1's }.
(10 points) Describe in pseudo-code (as in Example 3.7 in the textbook) a Turing machine M that recognizes L.
(10 points) Formally define M. You may write down a formal definition or draw a diagram like Figure 3.8 in the textbook. You may choose Γ freely.

Problem 2 (20 points). Let Σ = {0, 1} and Γ = {0, 1, ␣, #, x} (where ␣ denotes whitespace). Write a Turing machine M such that given any input w ∈ Σ∗, M halts with tape content w#wᴿ.
(10 points) Describe M in pseudo-code.
(10 points) Formally define M. You may write down a formal definition or draw a diagram like Figure 3.8.
Note: we do not care whether M halts in the accepting state or the rejecting state.

Problem 3 (20 points). Let M = ⟨Σ, Γ, Q, q0, δ, qaccept, qreject⟩ be a Turing machine where Σ = {0, 1}, Γ = {0, 1, ␣} (␣ denotes whitespace), Q = {q0, q1, q2, q3, qaccept, qreject}, and δ is represented by the following table.

δ     0            1            ␣
q0    (q0, 0, R)   (q0, 1, R)   (q1, ␣, L)
q1    (q2, 1, L)   (q1, 0, L)   (q3, 1, L)
q2    (q2, 0, L)   (q2, 1, L)   (qaccept, ␣, R)
q3    −            −            (qaccept, ␣, R)

(10 points) Show the sequence of computation of M when given input 01101 (a simulator sketch follows after Problem 5).
(10 points) What is the functionality of M? Justify your answer.

Problem 4 (20 points). Let L be a language over Σ = {0, 1}. We order finite words in L by length, and words of the same length lexicographically. Prove that L is Turing-decidable if and only if L can be enumerated by an enumerator Turing machine in strictly increasing order. Note: your proof should consist of two parts, each of which is worth 10 points.

Problem 5 (20 points). A 2-PDA is a 6-tuple ⟨Q, Σ, Γ, δ, q0, F⟩, where Q, Σ, Γ, and F are all finite sets: Q is the set of states, Σ is the input alphabet, Γ is the stack alphabet, δ : Q × Σ × Γ × Γ → P(Q × Γ × Γ) is the transition function, q0 ∈ Q is the start state, and F ⊆ Q is the set of accept states. Basically, a 2-PDA is a PDA that operates on two stacks simultaneously. For example, (q′, b1, b2) ∈ δ(q, σ, a1, a2) means the machine goes from q to q′ when it reads the letter σ and the symbol a1 (resp. a2) is on top of stack 1 (resp. stack 2); it then replaces a1 (resp. a2) with b1 (resp. b2). Prove that any Turing machine can be simulated by a 2-PDA. Hint: can a Turing machine configuration be represented by two stacks?
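Problem 3(a) is easy to check mechanically. A small Python simulator for the transition table above (a sketch, assuming ␣ is written as "_"; entries marked − in the table are simply left undefined):

    def run(delta, w, state="q0", blank="_"):
        tape, pos = dict(enumerate(w)), 0
        while state not in ("qaccept", "qreject"):
            sym = tape.get(pos, blank)
            state, out, move = delta[(state, sym)]
            tape[pos] = out
            pos += 1 if move == "R" else -1
        return "".join(tape[i] for i in sorted(tape)).strip(blank)

    delta = {("q0", "0"): ("q0", "0", "R"), ("q0", "1"): ("q0", "1", "R"),
             ("q0", "_"): ("q1", "_", "L"), ("q1", "0"): ("q2", "1", "L"),
             ("q1", "1"): ("q1", "0", "L"), ("q1", "_"): ("q3", "1", "L"),
             ("q2", "0"): ("q2", "0", "L"), ("q2", "1"): ("q2", "1", "L"),
             ("q2", "_"): ("qaccept", "_", "R"), ("q3", "_"): ("qaccept", "_", "R")}
    print(run(delta, "01101"))   # prints 01110

The trace 01101 ↦ 01110 is a useful data point when arguing part (b).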

$25.00

[SOLVED] CS 331: Theory of Computing, Problem Set 4

Problem 1 (20 points). Let Σ = {a, b, c}. Define a context-free language (CFL) L = { aⁱbʲcⁱ | i, j ≥ 0 }.
(10 points) Find a context-free grammar (CFG) that describes L.
(10 points) Construct a PDA that recognizes L. Draw the state diagram like Figure 2.15 in the textbook.

Problem 2 (20 points).
(10 points) Prove that if L is a context-free language and L′ is a regular language, then L ∩ L′ is context-free too.
(10 points) Let Σ = {a, b, c} and L = { w ∈ Σ∗ | w contains equal numbers of a's, b's, and c's }. Use the first part to prove that L is not a context-free language.

Problem 3 (20 points). A right linear grammar (RLG) is a context-free grammar such that in each production rule, at most one variable can appear on the right-hand side, and such an occurrence can only take place at the right end. For example, the following is an RLG.
S → abS | abS | ε
Prove that RLGs recognize exactly the class of regular languages. Your proof should consist of two parts: (1) any regular language can be described by an RLG (10 points), and (2) any language described by an RLG is regular (10 points). Hint: correspondence between the variables in an RLG and the states of a DFA.

Problem 4 (20 points). Use the pumping lemma to show that the following language is not context-free:
{ aⁱbʲcᵏ | i, j, k ≥ 0 and i > j and j > k }.

Problem 5 (20 points). Let G = ⟨{S}, {a, b}, R, S⟩ be a context-free grammar where R consists of the following production rules:
S → aS | aSbS | ε
(5 points) Prove that G is ambiguous (one candidate witness is sketched below).
(15 points) Give an unambiguous grammar that generates the same language as G does. Hint: use precedence as in Example 2.4 in the textbook.
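For Problem 5(a), one candidate witness (my choice of string; any string with two parse trees works) is aab, which has two distinct leftmost derivations in G:

    S \Rightarrow aS \Rightarrow a\,(aSbS) \Rightarrow^{*} aab \qquad (S \to aS \text{ at the root})
    S \Rightarrow aSbS \Rightarrow (aS)\,bS \Rightarrow^{*} aab \qquad (S \to aSbS \text{ at the root})

Since the two derivations apply different productions at the root, they correspond to distinct parse trees, which is exactly what ambiguity requires.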

$25.00

[SOLVED] CS 331: Theory of Computing, Problem Set 3

Problem 1 (20 points). If L is any language, let L₁/₂ be the set of all first halves of strings in L, so that L₁/₂ = { w | for some w′, |w| = |w′| and ww′ ∈ L }. Prove that if L is regular, then so is L₁/₂ (a brute-force check of the definition is sketched below). Hint: Meet in the Middle. Let A, B be automata such that L(A) = L and L(B) = Lᴿ (Lᴿ is defined as in Problem 2 of HW2). Use A and B to construct an automaton C that recognizes L₁/₂.

Problem 2 (20 points). A universal finite automaton (UFA) A is a 5-tuple ⟨Q, Σ, Q0, δ, F⟩ (syntactically just like an NFA) that accepts a word w if every run of A over w ends in F. Note that an NFA accepts a word if there exists a run of A over w that ends in F. Prove that UFAs recognize the class of regular languages; that is, a language L is recognized by a UFA if and only if L is regular. Hint: given a UFA A, can you construct an NFA that recognizes the complement of L(A)?

Problem 3 (20 points). Let Σ = {0, 1}. A palindrome is a word that reads the same forward and backward. For example, "ABLE WAS I ERE I SAW ELBA" is palindromic. Prove that the following language is not regular: { w ∈ Σ∗ | w is not a palindrome }.

Problem 4 (20 points). Let k > 1 and Lk = {ε, a, aa, . . . , aᵏ⁻²}. Prove the following statements:
1. Lk can be recognized by a DFA with k states. (5 points)
2. Lk cannot be recognized by any DFA with k − 1 states. (15 points)
Hint: use the pigeonhole principle as in the proof of the pumping lemma to prove part 2.

Problem 5 (20 points). Let Ln = { w ∈ {0, 1}∗ | the n-th letter of w from the end is 1 } for n ≥ 1. Prove that:
an NFA with n + 1 states recognizes Ln (5 points);
any DFA that recognizes Ln needs at least 2ⁿ states (15 points).
Hint: let Sn be the set of all words of length n. Show that any DFA A that recognizes Ln must have a distinct state q_w for each w ∈ Sn, such that whenever A reaches q_w, w is the suffix of the word that has been read by A.
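The definition of L₁/₂ in Problem 1 is easy to sanity-check by brute force before attempting the automaton construction. A sketch in Python (the example language, an "ends in 11" test, is my own illustration):

    from itertools import product

    def first_halves(member, alphabet="01", max_len=3):
        # enumerate short w and test whether some w' of equal length puts ww' in L
        out = []
        for n in range(max_len + 1):
            for w in ("".join(p) for p in product(alphabet, repeat=n)):
                if any(member(w + "".join(q)) for q in product(alphabet, repeat=n)):
                    out.append(w)
        return out

    print(first_halves(lambda x: x.endswith("11")))

Comparing such brute-force output against the automaton C from the hint is a quick way to catch construction errors on small cases.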

$25.00

[SOLVED] CS 331: Theory of Computing, Problem Set 2

Problem 1 (20 points). Let Σ = {0, 1}. Construct a DFA A that recognizes the language consisting of all binary numbers that are divisible by 5. For example, A should accept 101, 1010, and 1111, and should reject 010, 111, and 1110. Justify your answer. Note: your DFA should have no more than 5 states (a sketch of the standard construction follows below).

Problem 2 (20 points). Let w = σ1 · · · σn be a word over an alphabet Σ. By wᴿ we mean the word σn · · · σ1. Define Lᴿ = { w ∈ Σ∗ | wᴿ ∈ L }. Prove that if L is regular, then so is Lᴿ.

Problem 3 (20 points). Let Σ = {0, 1}. We use ‖w‖_{w′} to denote the number of occurrences of the subword w′ in w. Construct a DFA that recognizes the following language: { w ∈ Σ∗ | ‖w‖₁₀ = ‖w‖₀₁ }.

Problem 4 (20 points). For languages L1 and L2, let the imperfect shuffle of L1 and L2 be the language { w ∈ Σ∗ | w = w1v1 · · · wkvk, w1 · · · wk ∈ L1, v1 · · · vk ∈ L2, wi, vi ∈ Σ∗ }. Prove that the class of regular languages is closed under imperfect shuffle.

Problem 5 (20 points). An NFA A = (Q, Σ, q0, δ, F) is called co-deterministic (a CDFA) if for any a ∈ Σ and any two distinct states q and q′, δ(q, a) ∩ δ(q′, a) = ∅. Prove or disprove the following statements:
Every CDFA is a DFA. (5 points)
Every DFA is a CDFA. (5 points)
Every NFA can be converted to an equivalent CDFA. (10 points)
Hint: a CDFA can be viewed as a DFA reading the word from the end to the beginning. Your solution to Problem 2 may help.
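The standard construction for Problem 1 keeps the value of the prefix modulo 5 in the state. A Python sketch of that 5-state DFA (note this version accepts the empty word; whether that is wanted depends on how "binary number" is read):

    def divisible_by_5(w):
        r = 0                         # state = value of the bits read so far, mod 5
        for b in w:
            r = (2 * r + int(b)) % 5  # appending a bit doubles the value and adds b
        return r == 0                 # accept exactly in state 0

    assert divisible_by_5("101") and divisible_by_5("1010") and divisible_by_5("1111")
    assert not any(map(divisible_by_5, ["010", "111", "1110"]))

The update rule is also the justification the problem asks for: the state after reading a prefix is exactly that prefix's value mod 5.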

$25.00

[SOLVED] CS 331: Theory of Computing, Problem Set 1

Problem 1 (15 points). Use proof by contradiction to show that for any k ≥ 2, the k-th root of 2 (ᵏ√2) is an irrational number.

Problem 2 (15 points). The depth (or height) of a tree is the number of edges along the path from the root node to the deepest leaf node. A full binary tree is a tree in which every node other than the leaves has two children. A full binary tree is called perfect if all leaves are at the same depth. Note that the single root is a perfect binary tree with depth 0. Use proof by induction to show that for every n ≥ 0, a depth-n perfect binary tree has 2ⁿ⁺¹ − 1 nodes.

Problem 3 (20 points). A truth assignment M is a function that maps propositional variables to {0, 1} (1 for true and 0 for false). We write M |= ϕ if ϕ is true under M. We define a partial order ≤ on truth assignments such that M ≤ M′ if M(p) ≤ M′(p) for every propositional variable p. A propositional formula is positive if it only contains the connectives ∧ (and) and ∨ (or) (i.e., no negation ¬ or implication →). Use proof by induction to show that for any truth assignments M and M′ such that M ≤ M′, and any positive propositional formula ϕ, if M |= ϕ, then M′ |= ϕ.

Problem 4 (10 points). Let Σ = {0, 1}. What language is defined by each of the following regular expressions? Describe it in one or two sentences.
1. (5 points) Σ∗ 0 Σ∗ 1 Σ∗.
2. (5 points) 0 0∗ 1∗.

Problem 5 (20 points). Let Σ = {0, 1, e, E, +, −}. Construct a deterministic finite automaton that recognizes words of the following form (a table-driven sketch follows below):
(0 ∪ 1)(0 ∪ 1)∗ (e ∪ E)(+ ∪ −)(0 ∪ 1)(0 ∪ 1)∗
Note that "(" and ")" are only for readability; they are not part of the regular expression.

Problem 6 (20 points). Let A = ⟨Q, Σ, δ, q0, F⟩ and B = ⟨Q, Σ, δ, q0, F′⟩ be two nondeterministic finite automata such that F′ = Q \ F. In other words, the only difference between A and B is the final state set; a state is final in B if and only if it is not final in A. Prove or disprove: the language L(B) is the complement of the language L(A), i.e., L(B) = Σ∗ \ L(A). Note that if the statement is true, you must provide a formal proof; if the statement is false, all you need to do is describe a counterexample.
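For Problem 5, one way to pin the DFA down before drawing it is to table-drive it. A Python sketch (the state names s0-s4 are mine; missing table entries play the role of a dead state):

    DELTA = {("s0", "bit"): "s1", ("s1", "bit"): "s1", ("s1", "exp"): "s2",
             ("s2", "sign"): "s3", ("s3", "bit"): "s4", ("s4", "bit"): "s4"}

    def kind(c):
        return "bit" if c in "01" else "exp" if c in "eE" else "sign" if c in "+-" else None

    def accepts(w):
        state = "s0"
        for c in w:
            if (state, kind(c)) not in DELTA:
                return False          # implicit dead state
            state = DELTA[(state, kind(c))]
        return state == "s4"          # only complete exponent forms accept

    assert accepts("10e+1") and accepts("1E-01")
    assert not accepts("e+1") and not accepts("10e1") and not accepts("10E+")

Each dictionary key corresponds to one arrow in the diagram the problem asks for.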

$25.00

[SOLVED] CSCI 6430: Parallel Processing, Project 4

Enhance your MPI library to support the following functions:

int MPI_Isend(void*, int, MPI_Datatype, int, int, MPI_Comm, MPI_Request *);
int MPI_Irecv(void*, int, MPI_Datatype, int, int, MPI_Comm, MPI_Request *);
int MPI_Test(MPI_Request *, int *, MPI_Status *);
int MPI_Wait(MPI_Request *, MPI_Status *);

As in p3, place your object code into a library named libpp.a which test programs can link against. Have a makefile that will build libpp.a and compile and link all test programs that we develop.
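The contract these four calls must satisfy can be modeled compactly before committing to the C implementation. A hedged Python sketch of the request-object semantics (a model only; the class name and the thread-based completion are mine, and a real libpp.a would more likely drive a progress engine than spawn threads):

    import threading

    class Request:
        # models MPI_Request: the posted operation completes in the background
        def __init__(self, op, *args):
            self._done = threading.Event()
            threading.Thread(target=lambda: (op(*args), self._done.set())).start()

        def test(self):
            return self._done.is_set()   # MPI_Test: poll without blocking

        def wait(self):
            self._done.wait()            # MPI_Wait: block until complete

The key property to preserve in C is that MPI_Isend/MPI_Irecv return immediately and only MPI_Test/MPI_Wait report when the buffer is safe to reuse.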

$25.00

[SOLVED] CSCI 6430: Parallel Processing, Project 5

Enhance your MPI library to support the following:

int MPI_Reduce(const void *sendbuf, void *recvbuf, int count, MPI_Datatype datatype, MPI_Op op, int root, MPI_Comm comm);
int MPI_Op_create(MPI_User_function *user_fn, int commute, MPI_Op *op);

MPI_Reduce only has to support MPI_INT as the data type. MPI_Reduce is only required to support MPI_SUM as its available built-in op. However, it should also provide the possibility for the user to provide a user-defined op by creating one with MPI_Op_create. The commute argument will not be relevant for our purposes.
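One common way to implement MPI_Reduce on top of the point-to-point layer is a binomial tree. A sketch of that pattern in Python (root fixed at 0 for brevity; send and recv are hypothetical stand-ins for your library's own point-to-point calls):

    def tree_reduce(rank, size, value, op, send, recv):
        # round k pairs ranks 2*step apart; the lower rank combines both values
        step = 1
        while step < size:
            if rank % (2 * step) == step:
                send(rank - step, value)   # hand my partial result down, then drop out
                return None
            if rank % (2 * step) == 0 and rank + step < size:
                value = op(value, recv(rank + step))
            step *= 2
        return value                       # only rank 0 reaches here with the total

A general root can be handled by relabeling ranks relative to the root, and MPI_Op_create then just stores the user function so MPI_Reduce can call it in place of the built-in sum.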

$25.00

[SOLVED] CSCI 6430: Parallel Processing, Project 3

Enhance your MPI library to support the following functions:

double MPI_Wtime(void);
int MPI_Comm_dup(MPI_Comm, MPI_Comm *);
int MPI_Bcast(void*, int, MPI_Datatype, int, MPI_Comm);
int MPI_Gather(void*, int, MPI_Datatype, void*, int, MPI_Datatype, int, MPI_Comm);

Place your object code into a library named libpp.a which our test programs can link against. Have a makefile that will build libpp.a and compile and link all test programs that we develop for this project.
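Of these, MPI_Bcast is the one with a real algorithm behind it. A binomial-tree sketch in Python (root fixed at 0 for brevity; send and recv are placeholders for your point-to-point layer):

    def tree_bcast(rank, size, value, send, recv):
        mask = 1
        while mask < size:
            if rank < mask and rank + mask < size:
                send(rank + mask, value)     # I already hold the value: forward it
            elif mask <= rank < 2 * mask:
                value = recv(rank - mask)    # my turn to receive it
            mask *= 2
        return value

Each round doubles the set of ranks holding the value, so the broadcast finishes in about log2(size) rounds; MPI_Gather can be built the same way, or simply as a loop of receives at the root.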

$25.00

[SOLVED] CSCI 6430: Parallel Processing, Project 2

Enhance p1 by supplying a port number to each of the ranks so they can use it in MPI_Init etc. It can be used to connect back to the process running mpiexec and retrieve information such as how to contact other ranks. Then, implement your own MPI that supports the following functions (working notes follow each one):

MPI_Comm_rank: read the rank from the environment.
MPI_Comm_size: read the size from the environment.
MPI_Finalize: the forking parent handles this.
MPI_Init: start here, with a call to socket(); send the rank's socket info back to the calling machine. Instead of sin.sin_port = htons(port), use htons(0): with port 0 the OS chooses a free port, and getsockname() then tells you which port you were assigned (see the sketch below). You are assigned ports 4401-4499, or you can use any ports in 0-3999 and 6001-65000. Read rank, size, and host from the environment; get an accept (listen) socket and set the port; connect to the host; return MPI_SUCCESS. ppexec accepts connections in a loop, saves the host and port info from each sender, and then sends the information about all ranks to all ranks. Hosts have to know the cwd: in mpiexec, getcwd() goes into an env var, which is sent to the host, which does cd $PWD. Fix ppexec's handling of child-process termination.
MPI_Send: just a wrapper for Ssend. Validate the args (including comm); get a connection to the dest (if you don't already have one); connect to the dest; send the msg (char or int). Progress engine: it could handle multiple things at once (e.g., receives). A send is done when the buffer is reusable by the user. Read the MPI standard's "advice to users/implementors" notes. One option is to send a header first, and then send the buffer inside the progress engine.
MPI_Ssend
MPI_Recv
MPI_Barrier: no one leaves until we're all here.
Progress engine: is there something to do? It could run in a separate thread (whatever, whenever), or, as a uni-process design, after every MPI call. Use the select system call: nfds is FD_SETSIZE; a NULL timeout blocks indefinitely, 0 polls, and a nonzero value is the hang time. An EOF shows something is there.

Reminders: ask Dr. Butler about the test files; demo library code is under 6430rmb/DEMOS/LIB; course page: https://www.cs.mtsu.edu/~rbutler/courses/pp6330/pp.html

You only need to support MPI_COMM_WORLD for this project. Place your object code into a library named libpp.a which test files can link against. Compile and link the program previously named p1 as ppexec. Have a makefile that will build libpp.a and compile and link ppexec as well as any test programs that we develop. I will cd to my dir containing your project files and type:

rm -f libpp.a
rm -f *.o
make

Use turnin as in p1.
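The htons(0)/getsockname() trick in the MPI_Init notes is worth seeing in isolation. A minimal Python sketch (the C version goes through the same system calls):

    import socket

    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.bind(("", 0))            # port 0: let the OS pick any free port
    s.listen(8)
    port = s.getsockname()[1]  # recover the port the OS actually assigned
    print("rank listening on port", port)

The rank can then report this port back to the ppexec process so other ranks know how to reach it.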

$25.00

[SOLVED] CSCI 6430: Parallel Processing, Project 1

Write a version of mpiexec, plus any supporting programs you need, to run MPI programs that use your own implementation of MPI. Your mpiexec should be able to run non-MPI programs as well. It should compile and link as p1.

p1 will find valid hostnames in a file named hostnames. This name may be overridden with the -f command-line arg. Also support -n to specify the number of ranks, which defaults to 1. If the file named hostnames is not in the local directory and the -f option is not present, all ranks are to be started on the local host. If the arg to -n specifies more ranks than there are in the hostnames file, then loop over that file. All ranks should have these 3 variables in their environments:

PP_MPI_RANK       # rank of the process
PP_MPI_SIZE       # number of ranks started
PP_MPI_HOST_PORT  # hostname:9999 ; hostname is the host where p1 is run

The value of the PORT portion of HOST_PORT will change in later projects. Make sure that stdout/stderr for the ranks are printed on the controlling tty for p1. Note that fork and exec of ssh will route output back to the controlling tty, and you have all the pids from the forks, so you can wait for all ranks to terminate at the end; this makes it an attractive approach, but you may write your own (proxy) server code if you prefer (a sketch of the launch loop follows below). Use turnin to submit a tar file of a directory with a makefile and all the source code needed to build your p1. After un-tarring your tar file, I will simply run:

rm -f p1
rm -f *.o
make

to build p1 before testing.
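The core launch loop is small. A hedged Python prototype of p1's behavior (illustration only; the project builds a compiled p1, and the helper below is hypothetical):

    import os, socket, subprocess

    def launch(prog, ranks, hosts):
        host_port = socket.gethostname() + ":9999"
        procs = []
        for r in range(ranks):
            env = (f"PP_MPI_RANK={r} PP_MPI_SIZE={ranks} "
                   f"PP_MPI_HOST_PORT={host_port}")
            # set the env vars inside the remote shell; cd so ranks share our cwd
            cmd = f"cd {os.getcwd()} && {env} {prog}"
            procs.append(subprocess.Popen(["ssh", hosts[r % len(hosts)], cmd]))
        for p in procs:      # ssh routes each rank's stdout/stderr to our tty,
            p.wait()         # and waiting here reaps every rank at the end

Indexing hosts with r % len(hosts) is the "loop in the hostnames file" behavior the spec asks for when -n exceeds the number of hosts.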

$25.00

[SOLVED] CMSC 25400: Homework 6

In this assignment you will code a long short-term memory (LSTM) recurrent network in PyTorch to predict words in English-language sentences taken from a story (some words in the story, such as names, have been modified). The network should have a single LSTM layer, and by default the hidden vectors h_t should be of dimension 200 (you may experiment with shorter or longer vectors). The cell vectors c_t must be of the same dimension. You can set the input and output vectors to be 200-dimensional as well; however, the size of the vocabulary, n_vocab, is of course much greater than 200. To bridge this gap, try two different strategies:
1. Use a random word embedding, i.e., just assign a uniformly distributed random vector v_i ∈ [0, 1]^200 to each word in the vocabulary.
2. Use a pre-computed embedding, as described on Martín Pellarolo's blog.
Please compare the accuracy of these two approaches.

Dataset. The data is provided in four files:
◦ bobsue.seq2seq.train.tsv is the training file, consisting of 6036 pairs of consecutive sentences from the story, separated by a tab.
◦ bobsue.seq2seq.dev.tsv is a validation set of similar paired sentences that you can use to set the parameters of the model and decide when to stop training (to avoid overfitting).
◦ bobsue.seq2seq.test.tsv is a test set of similar pairs of sentences.
◦ bobsue.voc.txt contains the vocabulary of the above files, with one word per line.
Each sentence in the training/validation/testing sets is enclosed between start and end tokens.

Task. Your task is to learn to predict the second sentence of each pair, word by word, from the first sentence. In training, define the loss on each word as the squared distance ‖v − v̂‖² between the predicted word vector and the vector corresponding to the correct word in the embedding space. At test time, just predict the word in the vocabulary whose vector is closest to the predicted word vector. You can ignore the start token, but please treat the end token as a word (or maybe an extra dimension of your output space), since predicting where the second sentence ends is highly significant.

Report the accuracy of your network on the test set as a function of various design parameters, and also plot how the loss on the training and validation sets changes as a function of training epochs. Report the 20 words that your network most often got right and the 20 words that it least often got right. Also include a sample of sentences produced by your network and the corresponding ground-truth sentences. Submit the predictions of your network in a text file similar to the training/validation/testing files.

In this assignment you are expected to use PyTorch's automatic differentiation functionality to compute all the gradients for you. You may also use one of PyTorch's built-in training algorithms (such as SGD) to update the weights. However, please do not use the pre-made LSTM class. An excellent tutorial on LSTMs can be found on Chris Olah's blog.
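For reference, a from-scratch LSTM cell of the kind the assignment calls for can be quite short. A sketch in PyTorch (the dimension defaults and class name are mine; this is the standard gate layout from Olah's formulation, not necessarily the only acceptable one):

    import torch
    import torch.nn as nn

    class ScratchLSTMCell(nn.Module):
        # hand-rolled cell, since nn.LSTM itself is off limits; one linear map
        # produces all four gate pre-activations at once
        def __init__(self, d_in=200, d_hid=200):
            super().__init__()
            self.gates = nn.Linear(d_in + d_hid, 4 * d_hid)

        def forward(self, x, h, c):
            i, f, g, o = self.gates(torch.cat([x, h], dim=-1)).chunk(4, dim=-1)
            c = torch.sigmoid(f) * c + torch.sigmoid(i) * torch.tanh(g)
            h = torch.sigmoid(o) * torch.tanh(c)
            return h, c

Unrolling this cell over a sentence, with autograd handling the gradients, is the single LSTM layer the network needs.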

$25.00

[SOLVED] CMSC 25400: Homework 3

1. In class we derived an explicit formula for least-squares regression when the hypothesis class is the class of linear functions h(x) = θ0 + θ1 x1 + θ2 x2 + · · · + θd xd. However, not all data can be well modeled by just a linear equation. An alternative, generally richer hypothesis class can be defined by fixing a set of n basis functions {ϕ1(x), . . . , ϕn(x)} and considering regressors of the form h(x) = ∑_{i=1}^n θi ϕi(x). Now each parameter θi is the coefficient of the corresponding ϕi in the linear combination of basis functions. The {ϕi} can really be any collection of functions, but as an illustration, in the d = 1 case you might consider something like a family of Gaussian bumps ϕi(x) = e^{−(x−i)²/(2σ²)}, i = 1, 2, . . . , n. In principle one could also consider an infinite sequence of basis functions, but for simplicity here we only consider a finite set. As before, let the loss function be the squared-error loss J(θ) = (1/2) ∑_{j=1}^m (h(xj) − yj)². Show that, similarly to the linear case, the optimal solution can be found in the form θ = (AᵀA)⁻¹Aᵀy, where y = (y1, . . . , ym)ᵀ, and derive the form of the matrix A. This problem illustrates that the least-squares technique (including its SGD version) has much broader applicability than just linear regression.

2. In class we derived that given data {(x1, y1), (x2, y2), . . . , (xm, ym)}, the log-likelihood for logistic regression is
ℓ(θ) = ∑_{i=1}^m [ ui log(h(xi)) + (1 − ui) log(1 − h(xi)) ],   (1)
where h(x) is the logistic function h(x) = 1/(1 + e^{−θ·x}) = g(θ·x) with g(z) = 1/(1 + e^{−z}), and the ui's are just the 0/1 analogs of the yi's, i.e., ui = (1 + yi)/2. There is no closed-form solution for the MLE of logistic regression.
(a) For simplicity, consider (1) for a single data point (x, u). Derive the form of the gradient ∇ℓ(θ). The formula g′(z) = g(z)(1 − g(z)) that we found in class might come in handy.
(b) Conclude that the SGD step based on a single datapoint (xi, ui) in the dataset is θ ← θ − α (h(xi) − ui) xi.

3. An online algorithm is said to be conservative if it changes its hypothesis only when it makes a mistake. Let C be a concept class and A be a (not necessarily conservative) online algorithm which has a finite mistake bound M on C. Prove that there is a conservative algorithm A′ for C which also has mistake bound M.

4. Recall that the k-class perceptron maintains k separate weight vectors w1, w2, . . . , wk, and predicts ŷ = argmax_{i∈{1,2,...,k}} (wi · x). If this prediction is incorrect, and the correct label should have been y, it updates the weights by setting w_y ← w_y + x/2 and w_ŷ ← w_ŷ − x/2. Let {(x1, y1), (x2, y2), . . .} be the training data. Assume that ‖xt‖ = 1 for all t, and that this dataset is separable with a margin δ, which in this case means that there exist unit vectors v1, v2, . . . , vk such that for each example (xt, yt),
v_{yt} · xt − v_y · xt ≥ 2δ   for all y ∈ {1, 2, . . . , k} \ {yt}.
(a) Show that in the k = 2 case this notion of margin is equivalent to the margin that we saw in class.
(b) In the k = 2 case we saw that the number of mistakes the perceptron can make is upper-bounded by 1/δ². Derive a similar bound for the k = 3 case. Hint: two quantities that you may wish to consider are a = v1 · w1 + v2 · w2 + v3 · w3 and b = ‖w1‖² + ‖w2‖² + ‖w3‖². Part of your derivation might involve showing that a ≤ √(3b).

5. The file train35.digits contains 2000 images of 3's and 5's from the famous MNIST database of handwritten digits, in text format. The size of each image is 28 × 28 pixels. Each row of the file is a representation of one image, with the 28 × 28 pixels flattened into a vector of size 784. A value of 1 for a pixel represents black, and a value of 0 represents white. The corresponding row of train35.labels is the class label: +1 for the digit 3, or −1 for the digit 5. The file test35.digits contains 200 testing images in the same format as train35.digits. Implement the perceptron algorithm and use it to label each test image in test35.digits. Submit the predicted labels in a file named test35.predictions. In the lectures, the perceptron was presented as an online algorithm. To use the perceptron as a batch algorithm, train it by simply feeding it the training set M times (a sketch follows below). The value of M can be expected to be less than 10, and should be set by cross-validation. Naturally, in this context, the "mistakes" made during training are not really errors. Nonetheless, it is instructive to see how the frequency of mistakes decreases as the hypothesis improves. Include in your write-up a plot of the cumulative number of "mistakes" as a function of the number of examples seen. Since the data is fairly large, for debugging purposes it might be helpful to run your code on just subsets of the 2000 training images. Also, it may be helpful to normalize each example to unit norm.
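A compact sketch of the batch-mode perceptron loop from Problem 5, in NumPy (the cross-validation that picks M is omitted):

    import numpy as np

    def perceptron(X, y, M):
        # X: (n, 784) 0/1 pixel rows; y: labels in {-1, +1}; M passes over the data
        w, mistakes = np.zeros(X.shape[1]), []
        for _ in range(M):
            for x, t in zip(X, y):
                miss = t * (w @ x) <= 0
                if miss:
                    w += t * x            # classic perceptron update on a mistake
                mistakes.append(miss)
        return w, np.cumsum(mistakes)     # cumulative-mistake curve for the plot

Test predictions are then just np.sign(X_test @ w).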

$25.00

[SOLVED] CMSC 25400: Homework 2

1. (10 points)
(a) Let v1, v2, . . . , vd be d mutually orthogonal unit vectors in R^d (i.e., a basis for R^d), let λ1, . . . , λd be real numbers, and let
A = ∑_{i=1}^d λi vi viᵀ.   (1)
Show that v1, . . . , vd are eigenvectors of A with corresponding eigenvalues λ1, . . . , λd.
(b) Conversely, show that if v1, . . . , vd is an orthonormal system of eigenvectors and λ1, . . . , λd are the corresponding eigenvalues of a symmetric matrix A ∈ R^{d×d}, then A is of the form (1).

2. (20 points) Let A ∈ R^{d×d} be a symmetric matrix, and for simplicity assume that all its eigenvalues are positive and distinct, 0 < λ1 < λ2 < . . . < λd. Let v1, v2, . . . , vd be the corresponding (normalized) eigenvectors.
(a) Prove that any two of the eigenvectors vi and vj (assuming i ≠ j) are orthogonal (you may wish to compare viᵀAvj and vjᵀAvi).
(b) Explain why this implies that v1, v2, . . . , vd form an orthonormal basis for R^d.
(c) The Rayleigh quotient of A is R(w) = (wᵀAw)/(wᵀw), w ∈ R^d. Prove that the maximum of R(w) is λd, and that the maximum is attained at w = vd.

3. (20 points) Recall that the empirical covariance matrix (sample covariance matrix) of a dataset {x1, x2, . . . , xn} with xi ∈ R^d (assuming that (1/n) ∑_{i=1}^n xi = 0) is Σ̂ = (1/n) ∑_{i=1}^n xi xiᵀ.
(a) Since Σ̂ is symmetric, it has an orthonormal basis of eigenvectors v1, v2, . . . , vd with λ1 ≤ λ2 ≤ . . . ≤ λd, and it can be expressed as Σ̂ = ∑_{i=1}^d λi vi viᵀ. Let Σ̂⁽¹⁾ be the reduced empirical covariance matrix
Σ̂⁽¹⁾ = (1/n) ∑_{i=1}^n (xi − (xi · vd)vd)(xi − (xi · vd)vd)ᵀ.
Show that Σ̂⁽¹⁾vd = 0, while Σ̂⁽¹⁾vi = λi vi for all i < d. What are the eigenvalues and eigenvectors of Σ̂⁽¹⁾, then? Use this to show that the second principal component is v_{d−1}.
(b) Use induction to show that the k-th principal component of the data is v_{d−k+1}.

4. (25 points) The file 3Ddata.txt is a dataset of 500 points in R³ sampled from a manifold, with some added noise. The last number in each line is just an index in {1, 2, 3, 4} related to the position of the point on the manifold, included to make the visualization prettier (for example, you can plot the points with index 1 in green, index 2 in yellow, 3 in blue and 4 in red). Apply PCA and Isomap to map this data to R² (a PCA sketch follows below). To construct the graph (mesh) for Isomap you can use k = 10 nearest neighbors. Plot the results and comment on the differences. For both methods you need to write your own code (it shouldn't be more than a few lines each) and submit it together with the write-up.

5. (25 points) The file train35.digits contains 2000 images of 3's and 5's from the famous MNIST database of handwritten digits, in text format. The size of each image is 28 × 28 pixels. Each row of the file is a representation of one image, with the 28 × 28 pixels flattened into a vector of size 784. A value of 1 for a pixel represents black, and a value of 0 represents white. The corresponding row of train35.labels is the class label: +1 for the digit 3, or −1 for the digit 5. The file test35.digits contains 200 testing images in the same format as train35.digits. Implement the perceptron algorithm and use it to label each test image in test35.digits. Submit the predicted labels in a file named test35.predictions. In the lectures, the perceptron was presented as an online algorithm. To use the perceptron as a batch algorithm, train it by simply feeding it the training set M times. The value of M can be expected to be less than 10, and should be set by cross-validation. Naturally, in this context, the "mistakes" made during training are not really errors. Nonetheless, it is instructive to see how the frequency of mistakes decreases as the hypothesis improves. Include in your write-up a plot of the cumulative number of "mistakes" as a function of the number of examples seen. You may find that it improves performance to normalize each example to unit norm.
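The PCA half of Problem 4 follows directly from Problems 1-3: center the data, eigendecompose the sample covariance, and project onto the top two eigenvectors. A sketch in NumPy:

    import numpy as np

    def pca_2d(X):
        Xc = X - X.mean(axis=0)        # center so the covariance formula applies
        cov = Xc.T @ Xc / len(Xc)      # sample covariance (1/n) sum of x x^T
        _, vecs = np.linalg.eigh(cov)  # eigh returns eigenvalues in ascending order
        return Xc @ vecs[:, :-3:-1]    # project onto v_d and v_{d-1}

Isomap replaces the straight-line geometry here with shortest-path distances over the k-nearest-neighbor graph, which is why the two plots differ on curved manifolds.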

$25.00

[SOLVED] CMSC 25400: Homework 1

1. (30 points) As we saw in class, k-means clustering minimizes the average squared-distance distortion
J_avg² = ∑_{j=1}^k ∑_{x∈Cj} d(x, mj)²,   (1)
where d(x, x′) = ‖x − x′‖ and Cj is the set of points belonging to cluster j. Another distortion function that we mentioned is the intra-cluster sum of squared distances,
J_IC = ∑_{j=1}^k (1/|Cj|) ∑_{x∈Cj} ∑_{x′∈Cj} d(x, x′)².
(a) Given that in k-means mj = (1/|Cj|) ∑_{x∈Cj} x, show that J_IC = 2 J_avg².
(b) Let γi ∈ {1, 2, . . . , k} be the cluster that the i-th datapoint is assigned to, and assume that there are n points in total, x1, x2, . . . , xn. Then (1) can be written as
J_avg²(γ1, . . . , γn, m1, . . . , mk) = ∑_{i=1}^n d(xi, m_{γi})².   (2)
Recall that k-means clustering alternates the following two steps:
1. Update the cluster assignments: γi ← argmin_{j∈{1,2,...,k}} d(xi, mj), for i = 1, 2, . . . , n.
2. Update the centroids: mj ← (1/|Cj|) ∑_{i : γi=j} xi, for j = 1, 2, . . . , k.
Show that the first of these steps minimizes (2) as a function of γ1, . . . , γn while holding m1, . . . , mk constant, while the second step minimizes it as a function of m1, . . . , mk while holding γ1, . . . , γn constant. The notation "i : γi = j" should be read as "all i for which γi = j".
(c) Prove that as k-means progresses, the distortion decreases monotonically, iteration by iteration.
(d) Give an upper bound on the maximum number of iterations required for full convergence of the algorithm, i.e., the point where neither the centroids nor the cluster assignments change anymore.

2. (30 points) Implement the k-means algorithm in a language of your choice (MATLAB, Python, or R are recommended), initializing the cluster centers randomly, as explained in the slides (a minimal sketch follows below). The algorithm should terminate when the cluster assignments (and hence the centroids) don't change anymore. Note: if you use a relatively high-level language like MATLAB or Python, your code can use linear algebra primitives, such as matrix/vector multiplication, eigendecomposition, etc., directly. However, you are expected to write your own k-means and k-means++ functions from scratch. Please don't submit code consisting of a single call to some pre-defined "kmeans" function.
(a) The toy dataset toydata.txt contains 500 points in R² coming from 3 well-separated clusters. Test your code on this data and plot the final result as a 2D plot in which each point's color or symbol indicates its cluster assignment. Note that because of the random initialization, different runs may produce different results, and in some cases the algorithm might not correctly identify the three clusters. Plot the value of the distortion function as a function of the iteration number for 20 separate runs of the algorithm on the same plot. Comment on the plot.
(b) Now implement the k-means++ algorithm discussed in class and repeat part (a) using its result as initialization (except for the 2D plot). Comment on the convergence behavior of k-means++ vs. the original algorithm.

3. (up to 20 points extra credit) You likely found that on the "toydata" dataset, most of the time even vanilla k-means clustering produces acceptable solutions. Create a dataset of your own for which (on average over many runs) k-means++ improves the performance of k-means by at least a factor of 10 in terms of the distortion-function value of the final clustering. Submit the code that you used to generate the dataset.
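A minimal version of the two alternating steps from Problem 1(b), in NumPy (a sketch only: empty clusters are not handled here, which a real submission would need to address):

    import numpy as np

    def kmeans(X, k, rng):
        m = X[rng.choice(len(X), size=k, replace=False)]   # random initialization
        assign = None
        while True:
            d2 = ((X[:, None, :] - m[None, :, :]) ** 2).sum(axis=2)
            new = d2.argmin(axis=1)                        # step 1: nearest centroid
            if assign is not None and np.array_equal(new, assign):
                return m, assign                           # assignments stable: done
            assign = new
            m = np.array([X[assign == j].mean(axis=0) for j in range(k)])  # step 2

k-means++ differs only in how the initial m is chosen: each new center is sampled with probability proportional to the squared distance from the centers picked so far.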

$25.00

[SOLVED] CMPT 317: Assignment 5, Bayesian Networks and Linear Learning Models

Overview

In this assignment we'll exercise a few ideas, but there is no significant programming. This assignment should take you about 2-4 hours. Questions 1 and 2 can be done on paper (then scanned and submitted electronically) or using a word processor. Whatever you decide, make it legible, keep it simple, and prefer PDF and JPG file formats. Questions 3 and 4 are very short exercises exploring linear classifiers. Most of the work is done for you, but it has to be completed using Jupyter Notebook. This is a way of working with scripts in various languages, combining documentation, code, and output. A version of Jupyter Notebook is available on Linux machines in the lab. Furthermore, the department has a cloud service set up for Jupyter Lab, which you can use:
• https://trux.usask.ca
The only trick will be getting the files you need onto the lab filesystem. Jupyter Notebook is easy to install on your own computer, especially if you use the Anaconda 3 installation tools for Python 3. If you installed Python using Anaconda 3, it may already be installed for you. If not, Google for help. This might take an extra 15 or 20 minutes.

Question 1 (12 points)

Purpose: to reinforce the concepts of Bayesian networks. Consider the Bayesian network given below.

[Network diagram over X1, X2, X3, X4, X5; the edge structure did not survive the conversion to text. A worked illustration for a hypothetical network appears after Question 4.]

(a) (5 points) What are the Conditional Probability Distributions implied by the network diagram above? List them using the notation P(. . .). You do not need to indicate any probabilities; just provide the notation.
(b) (2 points) Assume that each variable Xi has 10 domain values. How many entries are in each Conditional Probability table that you listed? In other words, how many numbers would be required if you were to fill in each table (which you thankfully don't have to do)? What's the total number of entries when you add up all the entries for all the CPDs?
(c) (1 point) Express the Joint Probability Distribution in terms of the Conditional Probability Distributions you outlined above.
(d) (4 points) Derive a formula for the query P(X1 | X2, X3, X4).

What to Hand In: a document containing the answers to the above questions. You may submit a text document, or you can complete the work on paper and submit an image of it (scanned, or captured by your phone). Before you submit, make sure it's legible. Be sure to include your name, NSID, student number, and course number at the top of all documents.

Evaluation:
• 4 marks: you described the Conditional Probability tables using the P() notation, indicating the correct dependencies as implied in the network structure.
• 2 marks: you calculated the correct number of entries for each CPD, and in total.
• 1 mark: your JPD was correctly defined in terms of the CPDs.
• 4 marks: your formula was derived correctly.
If the submission is not legible, the marker may deduct up to 100% of the grade.

Question 2 (12 points)

Purpose: to reinforce the concepts of Bayesian networks. Consider the Bayesian network given below.

[A second network diagram over X1, X2, X3, X4, X5; the edge structure did not survive the conversion to text.]

Parts (a)-(d), the hand-in instructions, and the evaluation scheme are identical to Question 1, applied to this second network.

Question 3 (5 points)

Purpose: to experiment with a simple implementation of a linear classifier. On the Moodle assignment page, you'll find the files linclass.py and A5Q3.ipynb. Download them both. Using Jupyter Notebook, open A5Q3.ipynb and read through the document. Complete the TO DO items in the notebook:
(a) (2 points) TO DO: choose a different line
(b) (1 point) TO DO: add a point to the line
(c) (2 points) TO DO: increase the number of learning steps

What to Hand In: the Jupyter Notebook A5Q3.ipynb with the TO DO items complete. Be sure to include your name, NSID, student number, and course number at the top of all documents.

Evaluation:
1. (2 marks) Your new line separates the original data.
2. (1 mark) The point you added was misclassified by the line from the previous task.
3. (2 marks) You increased the number of learning steps, and the fitted line separates the two classes.

Question 4 (4 points)

Purpose: to experiment with a simple implementation of a linear classifier. On the Moodle assignment page, you'll find the files linclass.py and A5Q4.ipynb. Download them both. Using Jupyter Notebook, open A5Q4.ipynb and read through the document. Complete the TO DO items in the notebook:
(a) (2 points) TO DO: increase the number of learning steps (linear classifier)
(b) (2 points) TO DO: increase the number of learning steps (logistic classifier)

What to Hand In: the Jupyter Notebook A5Q4.ipynb with the TO DO items complete. Be sure to include your name, NSID, student number, and course number at the top of all documents.

Evaluation:
(a) (2 marks) You increased the number of learning steps and changed the alpha parameter. Your linear classifying line does not perfectly classify the data.
(b) (2 marks) You increased the number of learning steps and changed the alpha parameter. The logistic classifier gets all but one data point correctly classified.
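Returning to Questions 1 and 2: since the diagrams are missing here, the following is a worked illustration only, for a hypothetical chain X1 → X2 → X3 → X4 → X5 (your answers must follow the edges in the assignment's actual diagrams):

    % (a) the CPDs, one per node:
    P(X_1),\ P(X_2 \mid X_1),\ P(X_3 \mid X_2),\ P(X_4 \mid X_3),\ P(X_5 \mid X_4)
    % (b) with 10 domain values per variable: 10 + 4 \times 100 = 410 entries in total
    % (c) the JPD factors along the chain:
    P(X_1,\dots,X_5) = P(X_1)\,P(X_2 \mid X_1)\,P(X_3 \mid X_2)\,P(X_4 \mid X_3)\,P(X_5 \mid X_4)
    % (d) in the chain, X_1 is independent of X_3 and X_4 given X_2, so:
    P(X_1 \mid X_2, X_3, X_4) = \frac{P(X_1)\,P(X_2 \mid X_1)}{\sum_{x_1} P(x_1)\,P(X_2 \mid x_1)}

A different edge structure changes every one of these answers, which is exactly what the marking scheme checks.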

$25.00