In this section, we will walk through Natural Policy Gradient (NPG) and implement it for the CartPole simulator.

1.1 Background and Setting

We consider the default CartPole simulator in OpenAI Gym, which has two discrete actions $A = \{0, 1\}$. Here 0 and 1 are the indices of the actions: physically, 0 applies a left push to the cart and 1 applies a right push. You only need to compute a stochastic policy that samples 0 or 1 and feed the sampled action to the step function, which does the rest for you.

Before defining the policy parameterization, let us first featurize our state-action pairs. Specifically, we will use random Fourier features (RFF) $\phi(s,a) \in \mathbb{R}^d$, where $d$ is the feature dimension. RFF is a randomized algorithm that takes the concatenation of $(s,a)$ as input and outputs a vector $\phi(s,a) \in \mathbb{R}^d$ that approximates the RBF kernel: for any pair $(s,a), (s',a')$,
$$\lim_{d\to\infty} \langle \phi(s,a), \phi(s',a') \rangle = k([s,a],[s',a']),$$
where $k$ is the RBF kernel on the concatenated state-action vector (we write $[s,a]$ for $[s^\top, a]^\top$). In summary, RFF approximates the RBF kernel but lets us operate in the primal space rather than the dual space, where we would need to compute and invert the Gram matrix (recall the kernel trick and kernel methods from the introductory ML course) — an operation that is computationally expensive and does not scale to large datasets.

We parameterize our policy as
$$\pi_\theta(a|s) = \frac{\exp(\theta^\top \phi(s,a))}{\sum_{a'} \exp(\theta^\top \phi(s,a'))},$$
where $\theta \in \mathbb{R}^d$. Our goal is, of course, to find the best $\theta$ such that the resulting $\pi_\theta$ achieves large expected total reward. (See https://pytorch.org/get-started/locally/ for PyTorch installation on a local machine; torch 1.11.0 is recommended.)

1.2 Gradient of the Policy's Log-likelihood

TODO Given $(s,a)$, first derive the expression for $\nabla_\theta \ln \pi_\theta(a|s)$; this was part of the previous written homework. Then go to utils.py and implement this computation in compute_log_softmax_grad. You may want to implement compute_softmax and compute_action_distribution first and use them to calculate the gradient. You should pass the tests in test.py for these functions before moving on.

Remark The file test.py comprises tests for some of the functions. If your implementation is correct, the printed errors of all tests should be quite small, usually no larger than 1e-6.

1.3 Fisher Information Matrix and Policy Gradient

We consider the finite-horizon MDP. (In class we considered the infinite-horizon version, but the two are quite similar.) Let us now compute the Fisher information matrix of a policy $\pi_\theta$. Recall its definition:
$$F_\theta = \mathbb{E}_{s,a \sim d^{\pi_\theta}_{\mu_0}}\left[\nabla_\theta \ln \pi_\theta(a|s)\, (\nabla_\theta \ln \pi_\theta(a|s))^\top\right] \in \mathbb{R}^{d\times d}.$$
We approximate $F_\theta$ using trajectories sampled from $\pi_\theta$. We first sample $N$ trajectories $\tau^1, \ldots, \tau^N$ from $\pi_\theta$, where $\tau^i = \{s^i_h, a^i_h, r^i_h\}_{h=0}^{H-1}$ with $s^i_0 \sim \mu_0$. We then approximate $F_\theta$ using all $(s,a)$ pairs:
$$\hat F_\theta = \frac{1}{N}\sum_{i=1}^{N}\left[\frac{1}{H}\sum_{h=0}^{H-1} \nabla_\theta \ln \pi_\theta(a^i_h|s^i_h)\, \nabla_\theta \ln \pi_\theta(a^i_h|s^i_h)^\top\right] + \lambda I,$$
where $\lambda \in \mathbb{R}^+$ is a regularizer that forces positive definiteness.

Remark Note the way we estimate the Fisher information. Instead of the roll-in procedure discussed in class to draw $s,a \sim d^{\pi_\theta}_{\mu_0}$ (the correct way to ensure i.i.d. samples), we simply sample $N$ trajectories and average over all state-action pairs $(s_h, a_h)$ from all $N$ trajectories. This way we lose the i.i.d. property (these state-action pairs are dependent), but we gain sample efficiency by using all the data.
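For concreteness, here is a minimal NumPy sketch of the estimator $\hat F_\theta$ above. It follows the remark below about early termination by averaging each trajectory over its own length; the input layout is an assumption about the starter code, not its actual interface.

```python
import numpy as np

def compute_fisher_matrix(grads, lamb=1e-3):
    """Estimate F_theta from per-trajectory lists of log-softmax gradients.

    grads: list over trajectories; grads[i] is a list of d-dimensional
           gradients, one per step of trajectory i (lengths may differ
           due to early termination).
    lamb:  regularizer forcing positive definiteness.
    """
    d = grads[0][0].shape[0]
    fisher = np.zeros((d, d))
    for traj in grads:
        # Average the outer products over this trajectory's own length.
        outer = np.zeros((d, d))
        for g in traj:
            outer += np.outer(g, g)
        fisher += outer / len(traj)
    return fisher / len(grads) + lamb * np.eye(d)
```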
TODO First, go to train.py and finish the sample function, which samples trajectories using the current policy. This function should roll out $N$ trajectories and keep track of the gradients and rewards collected during each trajectory. Then go to utils.py and implement $\hat F_\theta$ in compute_fisher_matrix. Note that OpenAI Gym CartPole has a termination criterion that fires when the pole or the cart moves too far from the goal position (during execution, if the termination criterion is met or the last time step $H$ is reached, the simulator returns done = True from the step function; hence a trajectory may terminate early, before reaching time step $H$). Consequently, the trajectories we collect may have different lengths, and when estimating $F_\theta$ we need to average properly over each trajectory's length. You should pass the tests for compute_fisher_matrix; the test solutions were computed using the default value $\lambda =$ 1e-3. We do not have tests for sample, so make sure it is correct before moving on.

TODO Denote by $V^\theta$ the objective function
$$V^\theta = \mathbb{E}\left[\sum_{h=0}^{H-1} r(s_h, a_h) \,\Big|\, s_0 \sim \mu_0,\ a_h \sim \pi_\theta(\cdot|s_h)\right].$$
Again, be mindful that trajectories may have different lengths due to early termination. Let us implement the policy gradient estimator
$$\hat\nabla V^\theta = \frac{1}{N}\sum_{i=1}^{N}\left[\frac{1}{H}\sum_{h=0}^{H-1} \nabla_\theta \ln \pi_\theta(a^i_h|s^i_h)\left(\Big(\sum_{t=h}^{H-1} r^i_t\Big) - b\right)\right],$$
where $b$ is a constant baseline, $b = \sum_{i=1}^N R(\tau^i)/N$, i.e., the average total reward over a trajectory. Go to utils.py and implement this PG estimator in compute_value_gradient. There are tests in test.py for this function.

1.4 Implementing the Step Size

With $\hat F_\theta$ and $\hat\nabla V^\theta$ in hand, recall that NPG performs the update
$$\theta' := \theta + \eta \hat F_\theta^{-1} \hat\nabla V^\theta.$$
We need to specify the step size $\eta$. Recall the trust-region interpretation of NPG: we perform incremental updates such that the KL divergence between the trajectory distributions of two successive policies is not too big. The KL divergence $KL(\rho^{\pi_\theta} \,\|\, \rho^{\pi_{\theta'}})$ can be approximated using the Fisher information matrix as follows (ignoring constant factors):
$$KL(\rho^{\pi_\theta} \,\|\, \rho^{\pi_{\theta'}}) \approx (\theta - \theta')^\top F_\theta (\theta - \theta').$$
As explained in lecture, instead of treating the learning rate as the hyperparameter, we set the trust-region size (which has a more transparent interpretation) as the hyperparameter: we choose $\delta$ such that
$$KL(\rho^{\pi_\theta} \,\|\, \rho^{\pi_{\theta'}}) \approx (\theta - \theta')^\top \hat F_\theta (\theta - \theta') \le \delta.$$
Since $\theta' - \theta = \eta \hat F_\theta^{-1} \hat\nabla V^\theta$, we have $\eta^2 (\hat\nabla V^\theta)^\top \hat F_\theta^{-1} \hat\nabla V^\theta \le \delta$. Solving for $\eta$ gives
$$\eta \le \sqrt{\frac{\delta}{(\hat\nabla V^\theta)^\top \hat F_\theta^{-1} \hat\nabla V^\theta}}.$$
We will simply set $\eta$ equal to this upper bound, i.e., be aggressive with the learning rate while respecting the trust-region constraint. To ensure numerical stability when the denominator is close to zero, we add $\epsilon =$ 1e-6 to the denominator, so the expression we will use is
$$\eta = \sqrt{\frac{\delta}{(\hat\nabla V^\theta)^\top \hat F_\theta^{-1} \hat\nabla V^\theta + \epsilon}}. \qquad (1)$$

TODO Now go to utils.py and implement this step-size computation in compute_eta. Check your implementation against the tests in test.py.
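A minimal sketch of Eq. (1), assuming the Fisher estimate and value gradient are passed in as NumPy arrays (the starter code's exact signature may differ). Solving the linear system instead of forming $\hat F_\theta^{-1}$ explicitly is cheaper and more numerically stable.

```python
import numpy as np

def compute_eta(delta, fisher, v_grad, eps=1e-6):
    """Step size from Eq. (1): eta = sqrt(delta / (g^T F^{-1} g + eps))."""
    x = np.linalg.solve(fisher, v_grad)   # x = F^{-1} g without an explicit inverse
    return np.sqrt(delta / (v_grad @ x + eps))
```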
1.5 Putting Everything Together

Now we can put all the pieces together. Go to train.py and implement the main framework in train. Each iteration of NPG performs the following four steps:
• Collect samples by rolling out N trajectories with the current policy using the sample function.
• Compute the Fisher matrix using the gradients from the previous step. Use the default value of λ.
• Compute the step size for this NPG step.
• Update the model parameters θ by taking an NPG step.

In addition to these four steps, keep track of the average episode reward in each iteration of the algorithm. The output of train should be the final model parameters and a list containing the average episode reward for each step of NPG. For this section, run the algorithm with the parameters T = 20, δ = 0.01, λ = 0.001, and N = 100. Plot the performance curve of each $\pi_{\theta_t}$ for t = 0 to 19. You should be able to do this by simply running train.py from the terminal. The trained θ will be saved in the folder learned_policy/NPG/ and will be needed in the subsequent DAgger task.

Hint Your algorithm should achieve an average reward of over 190 in about 15 steps if implemented correctly.

Note DAgger has not yet been covered in class, but it will be covered in a future lecture. In this section, we will implement one classical imitation learning method — DAgger — and get you started on implementing it in PyTorch. We will use the NPG policy that you trained in the previous sections as the expert policy π*. We again focus on the CartPole environment.

2.1 Background and Setting

The goal of imitation learning is, given a set of N expert datapoints $D^* = \{s^*_i, a^*_i\}_{i=1}^N$ from some unknown, black-box expert policy $\pi^*$, to learn a policy π that is as good as the expert. DAgger is an interactive imitation learning method similar to behavioral cloning, except that we can query the expert on the states visited when rolling out our own policy. In this way, each rollout of our policy essentially yields an expert-labeled dataset $R^* = \{s_i, \pi^*(s_i)\}_{i=1}^H$, which we "aggregate" into our dataset so that we learn from all the experience collected so far.

Partial observation of the learning agent. To demonstrate the power of imitation learning, we will constrain the learner's observation to a subset of the expert's and show that the learner can still attain comparable performance. Recall that the CartPole state comprises four components: Cart Position, Cart Velocity, Pole Angle, and Pole Angular Velocity. Our goal is to show that, although the expert observes the full state, the policy learned by DAgger works well even when the learning agent observes only a subset of the state.

2.2 Implementation

You will implement functions in dagger.py, which defines our DAgger learning agent and the training script. Specifically, please implement the following functions (a sketch of the sampling step follows the list):
• DAgger/sample_from_logits: This method takes action logits (of size $(B, D_a)$) and samples an action according to the distribution defined by those logits, namely $\Pr(a) = \exp(\iota_a) / \sum_{a'} \exp(\iota_{a'})$, where ι denotes the logits.
• DAgger/rollout: This method takes the environment and the number of steps to roll out as inputs, and returns a set of data points $(s, \pi^*(s))$ collected by rolling out in the environment. This can be done in two steps: (1) roll out by taking actions according to the current policy (see the self.policy definition in the class), and (2) query the expert's action for each visited state (see the self.expert_policy definition in the class).
• DAgger/learn: This method takes a batch of states and actions and performs one step of gradient descent over the batch. Specifically, you need to compute the action logits for the given states using the current policy, compute the cross-entropy loss (see self.loss) between the logits and the expert actions, and perform one step of gradient descent.
• experiment: For a number of epochs, collect data while also training the agent on the new data. Specifically, first add the newly rolled-out data to the dataset (see dataset.py for more information), then create the dataloader to process the data. The dataloader has to be re-created before each round of learning because the size of the dataset changes. Then perform supervised learning, i.e., fetch batches from the dataloader and run gradient descent using learner.learn.
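A minimal PyTorch sketch of the logits-to-action sampling described in the first bullet above; the tensor shapes follow the spec, but the surrounding class structure of dagger.py is an assumption.

```python
import torch

def sample_from_logits(logits):
    """Sample one action per row from a (B, Da) tensor of action logits.

    softmax turns logits into Pr(a) = exp(logit_a) / sum_a' exp(logit_a');
    torch.multinomial then draws one index per row from that distribution.
    """
    probs = torch.softmax(logits, dim=-1)              # (B, Da) distribution
    actions = torch.multinomial(probs, num_samples=1)  # (B, 1) sampled indices
    return actions.squeeze(-1)                         # (B,) actions
```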
2.3 Testing

You can train the agent by running

python dagger.py --dagger_epochs 20 --num_rollout_steps 200 --dagger_supervision_steps 20 --state_to_remove x

where x is an integer in {0, 1, 2, 3} indicating which state component to remove (as noted earlier, we want to constrain the learner's observation). Once training is done, test your agent by running test_dagger.py. You may also need to specify the state_to_remove argument for testing; it must be consistent between training and testing. In other words, always remove the same state component during both training and testing. The other testing arguments are the same ones specified by the get_args function in utils.py.

Please try removing each of the four state components and report the average reward in the writeup. In other words, run training and testing four times with --state_to_remove set to 0, 1, 2, and 3, respectively, and record the results. Please also save the four trained models as x-CartPole-v0.pt (where x = 0, 1, 2, or 3 is the removed state component) under the folder learned_policies/dagger/. State in the writeup what you discovered from these four trials: which state component can be removed without compromising performance, and which one hurts performance the most when removed?

Hint You may want to verify your DAgger implementation by training an agent with full state information:

python dagger.py --dagger_epochs 20 --num_rollout_steps 200 --dagger_supervision_steps 20

If your implementation is good enough, the agent will achieve an average reward of at least 190.

Section 3: Submission

YOUR_NET_ID/
  learned_policies/
    dagger/
      0-CartPole-v0.pt
      1-CartPole-v0.pt
      2-CartPole-v0.pt
      3-CartPole-v0.pt
    NPG/
      expert_theta.npy
  answers.pdf
  common.py
  dagger.py
  dataset.py
  README.md
  requirements.txt
  test_dagger.py
  test_info.pkl
  test.py
  train.py
  utils.py

where answers.pdf should contain your plot for Section 1.5 and your discussion for Section 2.3.
For this assignment, we will ask you to implement a Deep Q Network (DQN) to solve a cartpole environment (where you have to balance a pole on a cart) and a lunar lander environment (where you try to land a spaceship between two flags on the moon). We will use CartPole-v0 and LunarLander-v2. In case you need a refresher on how Gym environments work in general, we invite you to check out the tutorial by Wen-Ding, a graduate TA of CS 4789 in Spring 2021.

Note: It is not advised to iterate or write code on Colab first: if a Google Colab session ends, progress is lost. You should write code locally (i.e., debug there) and then run the training/testing on Google Colab. This assignment is worth 60 points in total.

[Figure 1: The CartPole environment.]
[Figure 2: The LunarLander environment.]

There are several files in this project.
• network.py: Implements the Q-network agent. This Q-network is our "policy" for taking actions in the environment.
• utils.py: General utilities needed for our experiments.
• buffer.py: Implements the replay buffer used throughout training. The replay buffer stores the transitions that we collect online.
• train.py: The standard training loop. We train and save agents by running this file.
• test.py: Test file; run it with python test.py. This file contains a test method where you can see how your trained agent does in the environment. You can change the arguments when running the test file to test in different environments.

There are 4 files that you will modify in this programming assignment:
• network.py: Modify the get_max_q, get_action, and get_targets methods.
• utils.py: Modify the update_target method.
• buffer.py: Implement the replay buffer used in training.
• train.py: Modify the training script to train your DQN.
The TODOs specify the method expectations.

In class, we touched on Q-learning, an off-policy reinforcement learning algorithm that aims to find the optimal Q-value function $Q^\star$ by minimizing the difference between the Q-values $Q(s,a)$ and the one-step Bellman optimal Q-values $r(s,a) + \gamma \mathbb{E}_{s' \sim P(s,a)}[\max_{a'} Q(s',a')]$. This is because the optimal Q-function $Q^\star$ satisfies
$$Q^\star(s,a) = r(s,a) + \gamma \mathbb{E}_{s' \sim P(s,a)}\left[\max_{a'} Q^\star(s',a')\right]$$
for all $(s,a)$, so minimizing this difference will hopefully bring us closer to the optimal Q-function. At a terminal state, by the Bellman equations in the finite-horizon setting, the value function — and thus the Q-function — is equal to 0 for all states s and actions a. CartPole terminates either when the pole has been balanced for 200 timesteps or when the pole's angle from the vertical exceeds some threshold. In LunarLander, the terminal states occur when the lander flies off the screen, crashes into the moon's surface, or lands safely.

In particular, tabular Q-learning keeps a table/matrix of Q-values $T_Q \in \mathbb{R}^{|S| \times |A|}$, and at every iteration of training, given a transition $(s, a, r, s')$, performs the update
$$T_Q(s,a) \leftarrow T_Q(s,a) + \alpha\left(r + \gamma \max_{a'} T_Q(s',a') - T_Q(s,a)\right).$$
Here, α is a learning-rate parameter that regulates how much we change our Q-value at each iteration.
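A minimal sketch of the tabular update above, assuming integer-indexed states and actions; the function name and defaults are illustrative, not part of the assignment code.

```python
import numpy as np

def q_update(TQ, s, a, r, s_next, alpha=0.1, gamma=0.99):
    """One tabular Q-learning step on a table TQ of shape (|S|, |A|)."""
    td_target = r + gamma * np.max(TQ[s_next])   # one-step Bellman optimal value
    TQ[s, a] += alpha * (td_target - TQ[s, a])   # move Q(s, a) toward the target
    return TQ
```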
The main bottleneck of tabular Q-learning is that when the number of states |S| and the number of actions |A| are very large, the table becomes gigantic — in complex environments, even infeasible to keep in memory. This is particularly the case in Atari games, where the number of states (and the state size) is so large that any tabular Q-learning method would take an insanely long time (longer than the average human life expectancy, even assuming infinite memory) to converge and spit out an optimal Q-function.

How do we fix this issue? Researchers at DeepMind noted that we can do away with the Q-table entirely and replace it with a deep neural network. In particular, this neural network specifies a function $f: \mathbb{R}^{D_s} \to \mathbb{R}^{|A|}$ that takes in a state s and returns the Q-values of all actions $a \in A$. This led to the introduction of the deep Q network (DQN), which showed amazing results on the Atari game suite. In this assignment, we will work in simpler environments with discrete action spaces, so A is finite in size.

3.1 DQN update scheme

We can treat Q-learning as an optimization problem where, once again, we aim to minimize the difference between the Q-function at a given timestep and the one-step Bellman optimal values at the next step. In particular, if our Q-function is specified by a set of parameters θ (given by our neural network), we can write the one-step Bellman optimal targets as
$$y(s,a) = r(s,a) + \gamma \mathbb{E}_{s' \sim P(s,a)}\left[\max_{a'} Q_\theta(s',a')\right],$$
and write our optimization problem as
$$\theta^\star = \arg\min_\theta \mathbb{E}_{s,a}\left[(Q_\theta(s,a) - y(s,a))^2\right].$$
We can perform this optimization via gradient descent, treating the objective as a loss function with respect to θ. In particular, letting $L(\theta) = \mathbb{E}_{s,a}[(Q_\theta(s,a) - y(s,a))^2]$ be our loss function, the gradient descent update at timestep t is $\theta_{t+1} = \theta_t - \alpha \nabla_\theta L(\theta_t)$.

3.2 Ingredients for the update scheme

There are several terms we need to compute in order to perform this optimization. The major one is the one-step Bellman optimal target $r(s,a) + \gamma \mathbb{E}_{s' \sim P(s,a)}[\max_{a'} Q_\theta(s',a')]$.

TODO: Modify the get_max_q function in network.py to return the max Q-value of each next state s'. This method takes a PyTorch tensor of shape (B, Ds) as input, where B is the batch size and Ds is the state dimension, and returns a PyTorch tensor of size B, where each entry is the maximum Q-value of the corresponding state in the batch.

Once we have the max Q-values, we can compute our targets. TODO: Modify the get_targets function in network.py, which, given rewards r, next states s', and terminal signals d, computes the targets for our Q-function. See the forward method for additional information. Note that if s' is a terminal state (indicated by the done flag), the Q-function should return 0 by definition. All inputs are PyTorch tensors: the rewards have shape B, the next states have shape (B, Ds), and the terminal signals have shape B, where B is the batch size. Your function should return a tensor of size B.
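A minimal sketch of the target computation just specified, assuming the network exposes a get_max_q(next_states) method as described; the exact class layout of network.py is an assumption.

```python
import torch

def get_targets(network, rewards, next_states, dones, gamma=0.99):
    """y = r + gamma * max_a' Q(s', a'), zeroed where s' is terminal."""
    max_q = network.get_max_q(next_states)      # shape (B,)
    # (1 - dones) removes the bootstrap term at terminal states, per the spec.
    return rewards + gamma * (1.0 - dones.float()) * max_q
```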
Computing the action at every timestep So far, we have assumed that we have (s, a, r, s', d) tuples ready for our update. However, since reinforcement learning is fundamentally interactive, we have to actually collect these transitions by rolling out our policy while training. One of the main challenges in reinforcement learning is the trade-off between exploration and exploitation: at every timestep, the agent chooses whether to "explore" the environment by taking a random action or to "exploit" it by taking the action that is optimal with respect to its current Q-function. The standard approach is an ε-greedy policy: with probability ε we take a random action (explore); otherwise we take the current optimal action (exploit). Naturally, during training we decrease ε to lean more and more on our learned Q-function.

TODO: Modify the get_action method, which, given a state s and exploration parameter ε, returns the action a that the policy prescribes. Return the action as a NumPy array or an integer.

3.3 Storing our transitions

During training, we want to store our experience in a replay buffer and sample from it every time we need to perform a Q-update. In particular, we store (s, a, r, s', d) transitions in the replay buffer. TODO: Modify the methods in buffer.py to store and sample transitions. Keep in mind that we store and sample PyTorch tensors from the buffer. Check out what we imported at the top of the file in order to best implement these parts.

3.4 Stability of training

Deep Q networks, while comparable to the neural networks used in supervised learning, have a slight hitch that makes them more difficult to train: the targets they predict are based on the current Q-function estimates. This makes training harder (think of a dog trying to chase its own tail). In this vein, to get closer to the supervised-learning setting, we initialize a target network $Q_{\bar\theta}$ with parameters $\bar\theta$, which gets updated every few episodes of training via a soft update:
$$\bar\theta \leftarrow (1-\tau)\bar\theta + \tau\theta,$$
where τ is a temperature hyperparameter that controls how closely $\bar\theta$ tracks the online Q-network parameters θ.

TODO: Modify the update_target method in utils.py to update the target network. The method takes the target network (of type torch.nn.Module), the current network (also of type torch.nn.Module), and the temperature parameter τ (a float).
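A minimal sketch of the soft update above; it assumes the two networks share an identical architecture, which is how target networks are set up.

```python
import torch

@torch.no_grad()
def update_target(target_net, online_net, tau):
    """theta_bar <- (1 - tau) * theta_bar + tau * theta, parameter-wise."""
    for t_param, o_param in zip(target_net.parameters(), online_net.parameters()):
        t_param.mul_(1.0 - tau).add_(tau * o_param)
```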
3.5 Training

We are finally ready to train our Q-network. TODO: Fill out the TODO sections in train.py. To learn about optimizers and training a neural network in PyTorch, see the tutorials at this link.

When you are ready to test, your numbers will likely show some variation. However, the average episode length and reward should both be 200.0 for the CartPole environment. For the LunarLander environment, the expected episode length of a working policy is around 400.0, and the reward should be around 50.0. We encourage you to see whether you can achieve even better rewards.

3.6 Report

Once the training loop is finished, run the following commands:
• python train.py --target --hidden_sizes 128 for CartPole-v0
• python train.py --env LunarLander-v2 --target --hidden_sizes 64 64 --num_episodes 1500 --tau 1e-2 --batch_size 64 --gamma 0.99 --eps 0.995 --learning_freq 4 --target_update_freq 4 --max_size 100000 for LunarLander-v2

For LunarLander-v2, training can take a long time (around 10 minutes on a CPU). If needed, definitely port your code to Google Colab.

Report Deliverable 1. Report your mean rewards and mean episode lengths for both environments, using the given choices of hyperparameters. This can be done by running python test.py with the correct arguments in the main method. See the commented code in the main method for how best to run your agents in the correct environments.

Report Deliverable 2. Please also report your methods and findings in the following areas (if you have limited compute, prioritize CartPole for these questions, but for full credit please provide answers for both environments):
• How important is training with the target network? Can you get similar performance by simply removing the target network? If you get similar (or even better) performance, what do you think is the reason? Note that to disable training with the target network, you can remove the --target argument from the training command.
• Does updating the target network more than once per episode (i.e., moving the call to update_targets inside the episode for-loop) improve performance?
• Can you find a hyperparameter setting whose performance exceeds the settings provided in the training commands above?

3.7 Submission

Please submit a zip file containing the following files (make sure they are at the top level of your submission, i.e., not inside any inner folders):

submission.zip
  Answers.pdf
  network.py
  utils.py
  buffer.py
  train.py
  test.py
  trained_agent_cartpole-v0.pt
  trained_agent_lunarlander-v2.pt

where Answers.pdf contains the Report Deliverables from Section 3.6, and trained_agent_cartpole-v0.pt and trained_agent_lunarlander-v2.pt are the models obtained from training with the default hyperparameters.
For this assignment, we will ask you to implement LQR (and DDP) in order to solve a cartpole environment. While we are using a custom cartpole environment, it would be useful to go through the OpenAI Gym introduction. It may also be useful to look up how Gym environments work in general; Wen-Ding, a graduate TA of CS 4789 from SP21, created a tutorial that may be useful. OpenAI Gym contains many environments that can be used for testing RL algorithms, and it allows users to create their own custom environments. All these environments share a standard API, making it easy to test different algorithms on different environments. As mentioned earlier, we are using a custom version of the cartpole environment. The goal is to keep the pole on the cartpole upright by applying a variable force either left or right on the cart.

There are several files in this project.
• lqr.py: Contains the code to find the optimal control policy for LQR given the dynamics and cost matrices.
• finite_difference_method.py: Used to compute finite-difference approximations. These help compute the cost and transition matrices from the environment, which we then pass to LQR.
• test.py: Test case file; run it with python test.py. It contains tests for different parts of the code.
• ddp.py: Contains code implementing DDP.
• local_controller.py: Contains cartpole-environment-specific elements and compute_local_expansion.
• cartpole.py: Contains the standard loop for interacting with the environment. Use --env DDP to run DDP or --env LQR to run local LQR. You can use the -h flag for more information about usage.

In the standard cartpole environments, the only possible actions are to apply a constant force left or right. In our environment, we allow a variable force to be applied.

There are 3 parts to this programming assignment.
• Section 3 goes through the derivation of the generalized LQR formulation. We break the proof into intermediate steps and show the final results. You should try the proofs and verify the results yourself; you are responsible for understanding the proof for exams. There is no coding in this section.
• Section 4 describes a local linearization approach to stabilizing the cartpole. Section 4.3's task is to implement local linearization control so the cartpole stays vertically upright. You will need to complete the functions in finite_difference_method.py, local_controller.py, and lqr.py.
• Section 5 goes through the derivation of DDP. After understanding the procedure, you are asked to complete the functions in ddp.py. Once the programming tasks are done, you will test and visualize the cartpole and turn in a report containing the test output and images. Complete instructions can be found at the end of Section 6.

In class, we introduced the most basic formulation of LQR. In this section, we make the model slightly more general. We are interested in solving the following problem:
$$\min_{\pi_0,\ldots,\pi_{T-1}} \mathbb{E}\left[\sum_{t=0}^{T-1} \left(s_t^\top Q s_t + a_t^\top R a_t + s_t^\top M a_t + q^\top s_t + r^\top a_t + b\right)\right] \qquad (1)$$
$$\text{subject to } s_{t+1} = A s_t + B a_t + m, \quad a_t = \pi_t(s_t), \quad s_0 \sim \mu_0. \qquad (2)$$
Here $s \in \mathbb{R}^{n_s}$, $a \in \mathbb{R}^{n_a}$, $Q \in \mathbb{R}^{n_s \times n_s}$ (Q is positive definite), $M \in \mathbb{R}^{n_s \times n_a}$, $q \in \mathbb{R}^{n_s}$, $R \in \mathbb{R}^{n_a \times n_a}$ (R is positive definite), $r \in \mathbb{R}^{n_a}$, $b \in \mathbb{R}$, $A \in \mathbb{R}^{n_s \times n_s}$, $B \in \mathbb{R}^{n_s \times n_a}$, and $m \in \mathbb{R}^{n_s}$.
We also always assume the following matrix is positive definite:
$$\begin{pmatrix} Q & M/2 \\ M^\top/2 & R \end{pmatrix}. \qquad (3)$$
The difference between this formulation and the one from class is that the cost function contains an additional second-order term $s_t^\top M a_t$, first-order terms $q^\top s_t$ and $r^\top a_t$, and a zeroth-order term $b$, and the dynamics contain a zeroth-order term $m$. The transitions here are deterministic. In this problem, we will derive the expressions for $Q^\star_t$, $\pi^\star_t$, and $V^\star_t$ for the base and inductive cases, as in class. The difference this time is that $V^\star_t$ and $\pi^\star_t$ are complete quadratic and complete affine functions:
$$V^\star_t(s) = s^\top P_t s + y_t^\top s + p_t, \qquad \pi^\star_t(s) = K^\star_t s + k^\star_t,$$
where $P_t \in \mathbb{R}^{n_s \times n_s}$ ($P_t$ is PSD), $y_t \in \mathbb{R}^{n_s}$, $p_t \in \mathbb{R}$, $K^\star_t \in \mathbb{R}^{n_a \times n_s}$, and $k^\star_t \in \mathbb{R}^{n_a}$. The following sections outline the derivation but do not go through all the steps; these are left as an exercise.

3.1 Base Case

3.1.1 $Q^\star_{T-1}$

$$Q^\star_{T-1}(s,a) = s^\top Q s + a^\top R a + s^\top M a + q^\top s + r^\top a + b.$$

3.1.2 Extracting the policy

We can derive the expression for $\pi^\star_{T-1}$ as done in class, and then write out the expressions for $K^\star_{T-1}$ and $k^\star_{T-1}$. If you've done things correctly, you should get
$$\pi^\star_{T-1}(s) = -\tfrac{1}{2} R^{-1} M^\top s - \tfrac{1}{2} R^{-1} r, \qquad \therefore K^\star_{T-1} = -\tfrac{1}{2} R^{-1} M^\top, \quad k^\star_{T-1} = -\tfrac{1}{2} R^{-1} r.$$

3.1.3 $V^\star_{T-1}$

Recall from class that $V^\star_t(s) = Q^\star_t(s, \pi^\star_t(s))$. Using $Q^\star_{T-1}$ and $\pi^\star_{T-1}$, we can derive the expression for $V^\star_{T-1}(s)$. We keep $K^\star_{T-1}, k^\star_{T-1}$ in the expression, and then write $P_{T-1}, y_{T-1}, p_{T-1}$. If you've done everything correctly, you should get
$$P_{T-1} = Q + {K^\star_{T-1}}^\top R K^\star_{T-1} + M K^\star_{T-1},$$
$$y_{T-1}^\top = q^\top + 2(k^\star_{T-1})^\top R K^\star_{T-1} + (k^\star_{T-1})^\top M^\top + r^\top K^\star_{T-1},$$
$$p_{T-1} = (k^\star_{T-1})^\top R\, k^\star_{T-1} + r^\top k^\star_{T-1} + b,$$
$$V^\star_{T-1}(s) = s^\top P_{T-1} s + y_{T-1}^\top s + p_{T-1}.$$
These expressions can be simplified as shown below, but the forms above will be useful when deriving the inductive case:
$$P_{T-1} = Q - \tfrac{1}{4} M R^{-1} M^\top, \qquad y_{T-1}^\top = q^\top - \tfrac{1}{2} r^\top R^{-1} M^\top, \qquad p_{T-1} = b - \tfrac{1}{4} r^\top R^{-1} r.$$
Note that we technically need to show that $P_{T-1}$ is PSD before moving on; you can try to show this.

3.2 Inductive Step

For the inductive step, assume $V^\star_{t+1}(s) = s^\top P_{t+1} s + y_{t+1}^\top s + p_{t+1}$, where $P_{t+1}$ is PSD.

3.2.1 $Q^\star_t$

We can derive the expression for $Q^\star_t(s,a)$ following the steps done in class. If done correctly, you should get
$$Q^\star_t(s,a) = s^\top C s + a^\top D a + s^\top E a + f^\top s + g^\top a + h,$$
where
$$C = Q + A^\top P_{t+1} A, \qquad D = R + B^\top P_{t+1} B, \qquad E = M + 2A^\top P_{t+1} B,$$
$$f^\top = q^\top + 2m^\top P_{t+1} A + y_{t+1}^\top A, \qquad g^\top = r^\top + 2m^\top P_{t+1} B + y_{t+1}^\top B,$$
$$h = b + m^\top P_{t+1} m + y_{t+1}^\top m + p_{t+1}.$$
You should notice that $Q^\star_t(s,a)$ has the same form as $Q^\star_{T-1}(s,a)$. Although we are not asking you to prove it, you should verify for yourself that C and D are positive definite matrices. Thus, we can use exactly the same steps as in 3.1.2 and 3.1.3 to find $\pi^\star_t$ and $V^\star_t$.

3.2.2 Extracting the policy

Following the same steps as in 3.1.2, we get
$$\pi^\star_t(s) = -\tfrac{1}{2} D^{-1} E^\top s - \tfrac{1}{2} D^{-1} g, \qquad K^\star_t = -\tfrac{1}{2} D^{-1} E^\top, \qquad k^\star_t = -\tfrac{1}{2} D^{-1} g.$$
Please verify these for yourself.

3.2.3 $V^\star_t$

Using the same steps as in 3.1.3, we can derive $V^\star_t(s)$.
Since we know
$$Q^\star_t(s,a) = s^\top C s + a^\top D a + s^\top E a + f^\top s + g^\top a + h,$$
we have
$$P_t = C + {K^\star_t}^\top D K^\star_t + E K^\star_t, \qquad y_t^\top = f^\top + 2(k^\star_t)^\top D K^\star_t + (k^\star_t)^\top E^\top + g^\top K^\star_t,$$
$$p_t = (k^\star_t)^\top D\, k^\star_t + g^\top k^\star_t + h, \qquad V^\star_t(s) = s^\top P_t s + y_t^\top s + p_t.$$

4 Section 4: Programming: Local Linearization Approach for Controlling CartPole

4.1 Setup of the Simulated CartPole

The simulated CartPole has nonlinear deterministic dynamics $s_{t+1} = f(s_t, a_t)$ and a potentially non-quadratic cost function $c(s,a)$ that penalizes the deviation of the state from the balance point $(s^\star, a^\star)$, where $a^\star = 0$ and $s^\star$ represents the state of CartPole in which the pole is straight and the cart is at a pre-specified position. A state $s_t$ is a 4-dimensional vector
$$s_t = \begin{pmatrix} x_t & v_t & \theta_t & \omega_t \end{pmatrix}^\top,$$
consisting of the position of the cart, the velocity of the cart, the angle of the pole in radians, and the angular velocity of the pole. The action $a_t$ is a 1-dimensional vector corresponding to the force applied to the cart. Throughout this section, we assume black-box access to c and f: we can feed any (s,a) to f and c and get back the next state $f(s,a)$ and the scalar cost $c(s,a)$, respectively, but we do not know their analytical form (imagine, e.g., controlling some complicated simulated humanoid robot, where the simulator is the black-box f).

In this assignment, we will use our customized OpenAI Gym CartPole environment provided in the repository, under the env directory. The goal is to finish the implementation of lqr.py, which contains a class that computes the locally linearized optimal policy for our customized CartPole environment. We also provide other files to help you get started.

4.1.1 TODO: Please review the files for the programming assignment.

4.2 Finite Differences for Taylor Expansion

Since we do not know the analytical form of f and c, we cannot directly compute analytical formulas for their Jacobians, Hessians, and gradients. However, given black-box access to f and c, we can use finite differences to approximately compute these quantities. Below we first explain the finite-difference approach to approximating derivatives; you will later use finite-difference methods to compute A, B, Q, R, M, q, r. To illustrate, suppose we are given a function $g: \mathbb{R} \to \mathbb{R}$. Given any $\alpha_0 \in \mathbb{R}$, to approximate $g'(\alpha_0)$ we can perform the following:

Finite difference for the derivative: $\hat g'(\alpha_0) := \dfrac{g(\alpha_0 + \delta) - g(\alpha_0 - \delta)}{2\delta}$, for some $\delta \in \mathbb{R}^+$.

By the definition of the derivative, as $\delta \to 0^+$ the approximation approaches $g'(\alpha_0)$. In practice, δ is a tuning parameter: we do not want to set it too close to 0 due to potential numerical issues, nor too large, as that gives a bad approximation of $g'(\alpha_0)$. Treating $\hat g'(\alpha)$ as a black-box function, we can compute the second derivative by finite differencing on top of it:

Finite difference for the second derivative: $\hat g''(\alpha_0) := \dfrac{\hat g'(\alpha_0 + \delta) - \hat g'(\alpha_0 - \delta)}{2\delta}$.

Note that to implement the second-derivative approximator $\hat g''$, we first implement the function $\hat g'$ and treat it as a black-box inside the implementation of $\hat g''$. Observe that we query the black-box g twice to compute $\hat g'(\alpha_0)$ and four times to compute $\hat g''(\alpha_0)$. Similar ideas can be used to approximate gradients, Jacobians, and Hessians.
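For example, here is a minimal sketch of a central-difference gradient for a scalar-valued function of a vector; the signature mirrors the idea above, not necessarily the starter code's API.

```python
import numpy as np

def gradient(g, x, delta=1e-5):
    """Approximate the gradient of scalar-valued g at x, coordinate by coordinate."""
    x = np.asarray(x, dtype=float)
    grad = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = delta
        grad[i] = (g(x + e) - g(x - e)) / (2.0 * delta)  # central difference
    return grad
```

A Jacobian follows by applying this per output entry of f, and a Hessian by finite-differencing the gradient, just as $\hat g''$ is built on top of $\hat g'$.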
To this end, we can use the provided CartPole simulator, with its black-box access to f and c and the goal balance point $(s^\star, a^\star)$, to compute the Taylor expansion around the balance point.

4.2.1 TODO: We provide minimal barebones functions and some tests for finite differences in the file finite_difference_method.py. Complete the implementation of the gradient, Jacobian, and Hessian functions so that we can use them in the next section.

4.3 Local Linearization Approach for Nonlinear Control

Formally, consider our objective:
$$\min_{\pi_0,\ldots,\pi_{T-1}} \sum_{t=0}^{T-1} c(s_t, a_t) \qquad (4)$$
$$\text{subject to: } s_{t+1} = f(s_t, a_t), \quad a_t = \pi_t(s_t), \quad s_0 \sim \mu_0, \qquad (5)$$
where $f: \mathbb{R}^{n_s} \times \mathbb{R}^{n_a} \to \mathbb{R}^{n_s}$. In general the cost c(s,a) could be anything. Here we focus on a special instance where we try to keep the system stable around some stable point $(s^\star, a^\star)$, i.e., our cost penalizes deviation from $(s^\star, a^\star)$: $c(s,a) = \rho(s - s^\star) + \rho(a - a^\star)$, where ρ could be some distance metric such as the ℓ1 or ℓ2 distance.

To deal with nonlinear f and non-quadratic c, we use a linearization approach. Since the goal is to control the robot to stay at the pre-specified stable point $(s^\star, a^\star)$, it is reasonable to assume the system is approximately linear around $(s^\star, a^\star)$ and the cost approximately quadratic around $(s^\star, a^\star)$. Namely, we perform a first-order Taylor expansion of f and a second-order Taylor expansion of c around $(s^\star, a^\star)$:
$$f(s,a) \approx A(s - s^\star) + B(a - a^\star) + f(s^\star, a^\star),$$
$$c(s,a) \approx \frac{1}{2}\begin{pmatrix} s - s^\star \\ a - a^\star \end{pmatrix}^\top \begin{pmatrix} Q & M \\ M^\top & R \end{pmatrix} \begin{pmatrix} s - s^\star \\ a - a^\star \end{pmatrix} + \begin{pmatrix} q \\ r \end{pmatrix}^\top \begin{pmatrix} s - s^\star \\ a - a^\star \end{pmatrix} + c(s^\star, a^\star).$$
Here A and B are Jacobians:
$$A \in \mathbb{R}^{n_s \times n_s}: A[i,j] = \frac{\partial f[i]}{\partial s[j]}(s^\star, a^\star), \qquad B \in \mathbb{R}^{n_s \times n_a}: B[i,j] = \frac{\partial f[i]}{\partial a[j]}(s^\star, a^\star),$$
where $f[i](s,a)$ stands for the i-th entry of f(s,a), and $s[i]$ for the i-th entry of the vector s. Similarly, for the cost function we have Hessians and gradients:
$$Q \in \mathbb{R}^{n_s \times n_s}: Q[i,j] = \frac{\partial^2 c}{\partial s[i]\,\partial s[j]}(s^\star, a^\star), \qquad R \in \mathbb{R}^{n_a \times n_a}: R[i,j] = \frac{\partial^2 c}{\partial a[i]\,\partial a[j]}(s^\star, a^\star),$$
$$M \in \mathbb{R}^{n_s \times n_a}: M[i,j] = \frac{\partial^2 c}{\partial s[i]\,\partial a[j]}(s^\star, a^\star), \qquad q \in \mathbb{R}^{n_s}: q[i] = \frac{\partial c}{\partial s[i]}(s^\star, a^\star), \qquad r \in \mathbb{R}^{n_a}: r[i] = \frac{\partial c}{\partial a[i]}(s^\star, a^\star).$$
We are almost ready to compute a control policy using A, B, Q, R, M, q, r together with the optimal control derived for the system in Eq. (1). One potential issue is that the original cost c(s,a) may not even be convex, so the Hessian matrix
$$H := \begin{pmatrix} Q & M \\ M^\top & R \end{pmatrix}$$
may not be positive definite. We apply a further approximation to make it positive definite. Denote the eigendecomposition of H as $H = \sum_{i=1}^{n_s+n_a} \sigma_i v_i v_i^\top$, where $\sigma_i$ are eigenvalues and $v_i$ the corresponding eigenvectors. We force H to be positive definite as follows:
$$H \leftarrow \sum_{i=1}^{n_s+n_a} \mathbb{1}\{\sigma_i > 0\}\, \sigma_i v_i v_i^\top + \lambda I, \qquad (6)$$
where $\lambda \in \mathbb{R}^+$ is some small positive regularizer ensuring that, after the approximation, H is positive definite with minimum eigenvalue lower-bounded by λ. Note that this Hessian matrix is slightly different from the matrix in (3); you should not view these matrices as the same ones in Eq. (1) yet — we still need to reformulate the problem into that form, as shown in Section 4.4.
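A minimal sketch of the projection in Eq. (6), assuming H is symmetric (it is, being a Hessian), so np.linalg.eigh applies; the function name is illustrative.

```python
import numpy as np

def make_positive_definite(H, lamb=1e-5):
    """Drop non-positive eigendirections of H and add lamb * I, per Eq. (6)."""
    sigma, V = np.linalg.eigh(H)             # H = V diag(sigma) V^T
    sigma = np.where(sigma > 0, sigma, 0.0)  # keep only positive eigenvalues
    return (V * sigma) @ V.T + lamb * np.eye(H.shape[0])
```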
4.3.1 TODOs: Review the concepts of gradient, Jacobian, Hessian, and positive definite matrices. With your implementation of the finite-difference methods, complete the function compute_local_expansion in local_controller.py to calculate A, B, Q, R, M, q, r as defined above.

4.4 Computing the Locally Optimal Control

With A, B, Q, R, M, q, r computed, let us first check whether $H = \begin{pmatrix} Q & M \\ M^\top & R \end{pmatrix}$ is positive definite (if it is not PD, the LQR formulation may run into cases where the matrix inverse does not exist, and numerically you will observe NaNs as well). If not, you must use the trick in Eq. (6) to modify H. We are now ready to solve the following linear quadratic system:
$$\min_{\pi_0,\ldots,\pi_{T-1}} \sum_{t=0}^{T-1} \frac{1}{2}\begin{pmatrix} s_t - s^\star \\ a_t - a^\star \end{pmatrix}^\top H \begin{pmatrix} s_t - s^\star \\ a_t - a^\star \end{pmatrix} + \begin{pmatrix} q \\ r \end{pmatrix}^\top \begin{pmatrix} s_t - s^\star \\ a_t - a^\star \end{pmatrix} + c(s^\star, a^\star), \qquad (7)$$
$$\text{subject to } s_{t+1} = A s_t + B a_t + m, \quad a_t = \pi_t(s_t), \quad s_0 \sim \mu_0. \qquad (8)$$
With some rearranging of terms, we can rewrite this program in the form of Eq. (1), as shown below. Expanding the cost function,
$$\frac{1}{2}\begin{pmatrix} s_t - s^\star \\ a_t - a^\star \end{pmatrix}^\top \begin{pmatrix} Q & M \\ M^\top & R \end{pmatrix} \begin{pmatrix} s_t - s^\star \\ a_t - a^\star \end{pmatrix} + \begin{pmatrix} q \\ r \end{pmatrix}^\top \begin{pmatrix} s_t - s^\star \\ a_t - a^\star \end{pmatrix} + c(s^\star, a^\star)$$
$$= \frac{1}{2}(s_t - s^\star)^\top Q (s_t - s^\star) + (s_t - s^\star)^\top M (a_t - a^\star) + \frac{1}{2}(a_t - a^\star)^\top R (a_t - a^\star) + q^\top(s_t - s^\star) + r^\top(a_t - a^\star) + c(s^\star, a^\star)$$
$$= s_t^\top \Big(\tfrac{Q}{2}\Big) s_t + a_t^\top \Big(\tfrac{R}{2}\Big) a_t + s_t^\top M a_t + \big(q^\top - s^{\star\top} Q - a^{\star\top} M^\top\big) s_t + \big(r^\top - a^{\star\top} R - s^{\star\top} M\big) a_t$$
$$\quad + \Big(c(s^\star, a^\star) + \tfrac{1}{2} s^{\star\top} Q s^\star + \tfrac{1}{2} a^{\star\top} R a^\star + s^{\star\top} M a^\star - q^\top s^\star - r^\top a^\star\Big).$$
Next, let us expand the transition function f. As in Section 4.3, we perform a first-order Taylor expansion of f around $(s^\star, a^\star)$, so the transition is
$$s_{t+1} = A(s_t - s^\star) + B(a_t - a^\star) + f(s^\star, a^\star) = A s_t + B a_t + \big(f(s^\star, a^\star) - A s^\star - B a^\star\big).$$
Thus, let us define the following variables:
$$Q_2 = \frac{Q}{2}, \qquad R_2 = \frac{R}{2}, \qquad q_2^\top = q^\top - s^{\star\top} Q - a^{\star\top} M^\top, \qquad r_2^\top = r^\top - a^{\star\top} R - s^{\star\top} M,$$
$$b = c(s^\star, a^\star) + \frac{1}{2} s^{\star\top} Q s^\star + \frac{1}{2} a^{\star\top} R a^\star + s^{\star\top} M a^\star - q^\top s^\star - r^\top a^\star, \qquad m = f(s^\star, a^\star) - A s^\star - B a^\star.$$
We can then rewrite our formulation as
$$\min_{\pi_0,\ldots,\pi_{T-1}} \sum_{t=0}^{T-1} s_t^\top Q_2 s_t + a_t^\top R_2 a_t + s_t^\top M a_t + q_2^\top s_t + r_2^\top a_t + b$$
$$\text{subject to } s_{t+1} = A s_t + B a_t + m, \quad a_t = \pi_t(s_t), \quad s_0 \sim \mu_0.$$
This is exactly the same formulation as in Section 3; thus, we use the results there to derive the optimal policies.

4.4.1 TODO: Please complete the function lqr in lqr.py using the formulation derived above. You will need to compute the optimal policies for time steps T−1, ..., 0. At timestep t, we find $\pi^\star_t$ using $Q^\star_t$.
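To make the rearrangement concrete, here is a hedged NumPy sketch that assembles the reformulated quantities from the local expansion; the function name and argument order are illustrative, not the starter code's.

```python
import numpy as np

def reformulate(A, B, Q, R, M, q, r, c_star, s_star, a_star, f_star):
    """Rewrite the expansion around (s*, a*) in the form of Eq. (1).

    f_star = f(s*, a*) and c_star = c(s*, a*) come from one call each
    to the black-box dynamics and cost.
    """
    Q2 = Q / 2.0
    R2 = R / 2.0
    q2 = q - Q @ s_star - M @ a_star      # (q^T - s*^T Q - a*^T M^T)^T, Q symmetric
    r2 = r - R @ a_star - M.T @ s_star    # (r^T - a*^T R - s*^T M)^T, R symmetric
    b = (c_star + 0.5 * s_star @ Q @ s_star + 0.5 * a_star @ R @ a_star
         + s_star @ M @ a_star - q @ s_star - r @ a_star)
    m = f_star - A @ s_star - B @ a_star
    return Q2, R2, q2, r2, b, m
```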
Section 5: Differential Dynamic Programming (DDP)

In the LQR setting of Eq. (1), where the cost is written with respect to fixed matrices and vectors (i.e., Q, R, M, q, r, b), we are able to derive closed-form solutions for the policy. So far, we have used this insight to approximate nonlinear control as LQR. However, recall that we originally derived the LQR policy using dynamic programming — what if we apply the idea of approximation to the dynamic programming steps directly? For our policy at time step i, we care about the cost-to-go, the accumulated sum of costs over the future steps. The optimal value at time i is
$$V^\star_i(s) \equiv \min_{\pi_i,\ldots,\pi_{T-1}} \sum_{t=i}^{T-1} c(s_t, a_t) \quad \text{s.t. } s_i = s,\ s_{t+1} = f(s_t, a_t),\ a_t = \pi_t(s_t).$$
Using the finite-horizon Bellman optimality equation (i.e., the principle behind dynamic programming), we can reduce this minimization over the entire sequence of controls to one action:
$$V^\star_i(s) = \min_a \left[c(s,a) + V^\star_{i+1}(f(s,a))\right] = \min_a Q^\star_i(s,a).$$
Let us try a second-order approximation of $Q^\star_i(s,a)$ directly. While you are responsible for understanding the motivation and basic ideas behind DDP (quadratic approximation, dynamic programming), you are not required to understand every technical detail of the following derivation. We approximate around a point $(s_0, a_0)$. Let $s = s_0 + \delta s$ and $a = a_0 + \delta a$ be small perturbations. Then
$$Q^\star_i(s,a) - Q^\star_i(s_0,a_0) = \big(c(s_0+\delta s, a_0+\delta a) - c(s_0,a_0)\big) + \big(V^\star_{i+1}(f(s_0+\delta s, a_0+\delta a)) - V^\star_{i+1}(f(s_0,a_0))\big).$$
Approximating around $\delta s = 0, \delta a = 0$ to second order results in
$$Q^\star_i(s,a) - Q^\star_i(s_0,a_0) \approx \frac{1}{2}\begin{pmatrix} 1 \\ \delta s \\ \delta a \end{pmatrix}^\top \begin{pmatrix} 0 & Q_s^\top & Q_a^\top \\ Q_s & Q_{ss} & Q_{sa} \\ Q_a & Q_{as} & Q_{aa} \end{pmatrix}\begin{pmatrix} 1 \\ \delta s \\ \delta a \end{pmatrix}. \qquad (9)$$
The expansion coefficients are
$$Q_s = c_s + f_s^\top V^{i+1}_s \in \mathbb{R}^{n_s}, \qquad Q_a = c_a + f_a^\top V^{i+1}_s \in \mathbb{R}^{n_a},$$
$$Q_{ss} = c_{ss} + f_s^\top V^{i+1}_{ss} f_s + V^{i+1}_s \cdot f_{ss} \in \mathbb{R}^{n_s \times n_s},$$
$$Q_{aa} = c_{aa} + f_a^\top V^{i+1}_{ss} f_a + V^{i+1}_s \cdot f_{aa} \in \mathbb{R}^{n_a \times n_a},$$
$$Q_{as} = c_{as} + f_a^\top V^{i+1}_{ss} f_s + V^{i+1}_s \cdot f_{as} \in \mathbb{R}^{n_a \times n_s},$$
where $c_s$ and $c_a$ denote the gradients of the cost function; $c_{ss}, c_{aa}, c_{as}$ its Hessians; $f_s$ and $f_a$ the Jacobians of the transition function; $f_{ss}, f_{aa}, f_{as}$ the Jacobians of the Jacobians of the transition function (these are 3D tensors!); and $V^{i+1}_s, V^{i+1}_{ss}$ the coefficients of the expanded value function at time step i+1, defined below.

Our policy minimizes the quadratic approximation in Eq. (9) over δa. A simple computation gives
$$\delta a^\star = -Q_{aa}^{-1}(Q_a + Q_{as}\,\delta s).$$
We define $k_i = -Q_{aa}^{-1} Q_a$ and $K_i = -Q_{aa}^{-1} Q_{as}$. Plugging this policy into Eq. (9), we can derive (iteratively, as in dynamic programming) the quadratic approximation of the value function at timestep i:
$$V^\star_i(s) - V^\star_i(s_0) \approx -\frac{1}{2} Q_a^\top Q_{aa}^{-1} Q_a, \qquad V^i_s = Q_s - Q_{sa}^\top Q_{aa}^{-1} Q_a, \qquad V^i_{ss} = Q_{ss} - Q_{sa}^\top Q_{aa}^{-1} Q_{as}.$$
This procedure computes the control actions and values from T−1 down to 0 in a "backward pass." Once the backward pass is complete, we use the policy to roll out a new trajectory, which gives us new states and actions $(s_0, a_0)$ to linearize around. Formally, the forward pass calculates
$$\hat s_0 = s_0, \qquad \hat a_i = a_i + k_i + K_i(\hat s_i - s_i), \qquad \hat s_{i+1} = f(\hat s_i, \hat a_i).$$
The algorithm iterates until the forward pass and the backward pass converge.

5.1 DDP Implementation

We will implement DDP to complete a more challenging task: making the cartpole swing up.

5.1.1 TODO: There are two main sections that need to be completed in ddp.py: forward(), which is the forward dynamics, and the main driver code in train(). We have provided a small starter value for the number of timesteps it took our solution to swing up the cartpole; if yours takes fewer, feel free to change it.
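For reference, the backward-pass gains defined above can be computed with a linear solve; this is a hedged sketch whose names mirror the derivation rather than the variables in ddp.py.

```python
import numpy as np

def compute_gains(Q_a, Q_aa, Q_as):
    """k_i = -Q_aa^{-1} Q_a and K_i = -Q_aa^{-1} Q_as, without an explicit inverse."""
    k = -np.linalg.solve(Q_aa, Q_a)    # feedforward term of the policy update
    K = -np.linalg.solve(Q_aa, Q_as)   # feedback gain on the state deviation
    return k, K
```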
6.1 LQR Stress Tests

Given the policies $\pi_0, \ldots, \pi_{T-1}$ computed in the previous sections, we will evaluate their performance by executing them on the real system f with the real cost c. To generate a T-step trajectory, we first sample $s_0 \sim \mu_0$, then take $a_t = \pi_t(s_t)$ and call the black-box f to compute $s_{t+1} = f(s_t, a_t)$ until t = T−1. This gives us a trajectory $\tau = \{s_0, a_0, s_1, a_1, \ldots, s_{T-1}, a_{T-1}\}$ with total cost $C(\tau) := \sum_{t=0}^{T-1} c(s_t, a_t)$.

In the main file cartpole.py, we provide several different initialization states, ordered by the distance of their means from the goal $(s^\star, a^\star)$. Intuitively, we should expect our controller to perform worse when the initial state is far from the Taylor-expansion point $(s^\star, a^\star)$, as our linear and quadratic approximations become less accurate. Test your computed policies on these different initializations and report the performance for each.

6.1.1 TODO

Run python cartpole.py --env LQR and take a screenshot of the output costs printed by the LQR version. The costs can vary depending on the versions of the packages used (even with the environment seeded); however, your costs should fall in the following ranges:
• Case 0 < 10
• Case 1 < 200
• Case 2 < 1000
• Case 3 < 2000
• Case 4 < 5000
• Case 5 = ∞
• Case 6 = ∞
• Case 7 = ∞
Please note that these are loose upper bounds; the actual costs of a correct implementation may be much lower (apart from Cases 5-7). Please briefly explain why you think the costs differ across cases.

6.2 DDP Tests

Likewise, we also provide different initial states for the swing-up task, ordered by increasing distance from the goal state. This time, since DDP no longer depends on the local linearization assumption, we should expect our controller to complete the swing-up task even far from the goal state.

6.2.1 TODO

Run python cartpole.py --env DDP --init_s far and describe the behavior of the trained cartpole. Since the training process for DDP takes longer, we ask you to test the initial states manually, one by one; simply change --init_s among {far, close, exact}. Observe the generated videos for each test case you run: how does DDP perform with the different initial states? Compare the LQR policy and the DDP policy: how do they perform on different initial states, and what is the reason behind their differences?

6.3 Submission

Please submit a zip file containing your implementation as follows:

YOUR_NET_ID/
  Answers.pdf
  README.md
  __init__.py
  cartpole.py
  local_controller.py
  finite_difference_method.py
  lqr.py
  ddp.py
  test.py
  env/
    __init__.py
    cartpole_lqr_env.py
    cartpole_ilqr_env.py

where Answers.pdf contains the screenshot of the costs from the cartpole simulations, the description of what happens for the three initial states in the cartpole swing-up task, and your explanations for the three research questions highlighted above.
For this assignment, we will ask you to work in a 4×4 gridworld. This project consists of the following files:
• setup.py: Contains the environment requirements that need to be installed prior to starting the project.
• visualize.py: Can be used to help visualize the V-function and policy. There are also functions for stepping through value and policy iteration. You can see how to use this file by calling python visualize.py -h. For example, to use dynamic programming with an MDP of horizon 10, run python visualize.py --alg DP --horizon 10.
• MDP.py: Contains the MDP class used to represent our gridworld MDP: the states, actions, rewards, transition probabilities, and the discount factor.
• test.py: Test case file. Run it with python test.py.
• DP.py: Student code goes here. This file will be used for implementing the dynamic programming approach to extract the optimal policy and value functions for finite-horizon MDPs.
• IH.py: Student code goes here. This file will be used for implementing value iteration, exact policy iteration, and approximate policy iteration.
Please implement all the functions with a TODO comment in DP.py and IH.py.

For submission to Gradescope, please do the following:
• Run the visualization for value iteration to iteration 20 with γ = 0.9. Save the figure showing the Q-values. Additionally, save the figure showing the max policy and its corresponding value function for each γ ∈ {0.5, 0.9, 0.999}.
• Run the visualization with γ = 0.9 for exact and iterative policy iteration for 5 iterations. Save both figures.
• Run the visualization for dynamic programming with H ∈ {1000, 10, 2}. Save the figures showing the policy and its corresponding value function at timestep 0; moreover, for H = 10, step through until timestep 8 and save the corresponding figures.
• Create a PDF writeup that includes the figures above and the answers to the questions in the discussion section (Section 8).
• Zip the PDF writeup along with the Python files listed above into submit.zip and submit on Gradescope.

2.1 States and rewards

There are 17 states (0-16), where each box corresponds to a state as shown below.

[Figure: the 4×4 grid of states, with state 15 shown in green.]

State 15 (the green box) is the "goal" state and has a reward of 200. All the white boxes have reward −1, while the orange boxes are "bad" states with reward −80. Although our gridworld is 4×4, we need an extra state (state 16) to be the terminal state. This state has reward 0 and is absorbing: once we enter the terminal state, we cannot leave, as any action takes us back to the terminal state. Additionally, once we hit the goal state, we automatically transition to the terminal state. This terminal state allows us to model tasks that terminate at a finite time using the infinite-horizon MDP formulation. Note that the rewards here are outside the range [0,1]: although many theoretical results assume the reward lies within [0,1], in practice the reward can be any scalar value.

Since we are using a simple state representation, we only need to store the number of states in the MDP class. The rewards R form an |A| × |S| NumPy array (note that the first index is the action, unlike in class); that is, r(s,a) = R[a,s]. The rewards in this MDP are deterministic.

2.2 Actions

There are 4 actions that can be taken in each state.
These actions are up (0), down (1), left (2), and right (3).

2.3 Transition probabilities

The transition probabilities are easy to define for a gridworld MDP, which makes it a good setting for implementing value iteration, policy evaluation, and policy iteration. The transition matrix P, in MDP.py, contains the transition probabilities. Note that the transitions are stochastic (i.e., moving right from state 0 isn't guaranteed to move you to state 1; you may "slip" and move to state 4 or stay in the same spot). In particular, when transitioning from a white or orange box, the agent ends up in the correct spot with probability 0.7 and slips laterally with probability 0.15 to either of the adjacent spots. P is an |A| × |S| × |S| NumPy array; the transition probability P(s'|s,a) can be found by indexing P[a, s, s'].

2.4 Policies

We will work with deterministic policies and will represent a policy as a NumPy array of size |S|. The s-th entry of such a vector is the (deterministic) action taken by the policy at state s.

In this section, we consider finite-horizon MDPs with horizon H. Here we are able to use dynamic programming to directly extract the optimal policy at each timestep and calculate the exact value functions.

3.1 TODOs

For dynamic programming, there are two functions (TODOs) to complete:
• computeQfromV
• dynamicProgramming

3.2 Time-varying policies and values

We consider time-varying policies $\pi = (\pi_0, \ldots, \pi_{H-1})$. The value of a state also depends on time:
$$V^\pi_t(s) = \mathbb{E}\left[\sum_{k=t}^{H-1} r(s_k, a_k) \,\Big|\, s_t = s,\ s_{k+1} \sim P(s_k, a_k),\ a_k \sim \pi_k(s_k)\right].$$
The Bellman consistency equation for the finite horizon is
$$V^\pi_t(s) = \mathbb{E}_{a \sim \pi_t(s)}\left[r(s,a) + \mathbb{E}_{s' \sim P(s,a)}[V^\pi_{t+1}(s')]\right].$$
Given $V_{t+1}$, you can implement computeQfromV using the equation
$$Q^\pi_t(s,a) = r(s,a) + \mathbb{E}_{s' \sim P(s,a)}[V^\pi_{t+1}(s')].$$

3.3 Iterative Bellman Optimality

We can solve the Bellman optimality equation directly with backwards iteration:
• Initialize $V^*_H = 0$.
• For t = H−1, H−2, ..., 0:
  – $Q^*_t(s,a) = r(s,a) + \mathbb{E}_{s' \sim P(s,a)}[V^*_{t+1}(s')]$
  – $\pi^*_t(s) = \arg\max_a Q^*_t(s,a)$
  – $V^*_t(s) = Q^*_t(s, \pi^*_t(s))$
Use this pseudocode as a guide to implement dynamicProgramming.

We now change gears and consider infinite-horizon MDPs with discount factor γ. We are no longer able to directly calculate value functions or compute the optimal policy. Instead, we will implement the iterative algorithms discussed in lecture: value iteration, policy evaluation, and policy iteration.

4.1 TODOs

For value iteration, there are 6 functions (TODOs) to complete:
• computeVfromQ
• computeQfromV
• extractMaxPifromQ
• extractMaxPifromV
• valueIterationStep
• valueIteration

4.2 Switching between Q- and V-functions

Given $V^\pi$ and $Q^\pi$, please implement computeVfromQ and computeQfromV using the equations
$$Q^\pi(s,a) = r(s,a) + \gamma \mathbb{E}_{s' \sim P(\cdot|s,a)}[V^\pi(s')], \qquad V^\pi(s) = Q^\pi(s, \pi(s)).$$

4.3 Computing the max policy

From class, we know that if we have the optimal Q-function $Q^*$, we can find the optimal policy
$$\pi^\star(s) = \arg\max_a Q^\star(s,a).$$
We also use this operation at the end of value iteration on $Q^t$ (our final Q-values) and during the policy improvement stage of policy iteration. Please implement extractMaxPifromQ and extractMaxPifromV.

4.4 Value Iteration

First, implement valueIterationStep, which implements the following equation:
$$\forall s,a: \quad Q^{t+1}(s,a) = r(s,a) + \gamma \mathbb{E}_{s' \sim P(\cdot|s,a)}\left[\max_{a'} Q^t(s',a')\right].$$
$Q^t$ is passed into this function, while the rewards and transition probabilities can be accessed from the attributes. While not strictly necessary, try implementing this function without for-loops, as in the sketch below.
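A minimal vectorized sketch of the Bellman-optimality backup above, assuming rewards R of shape (|A|, |S|) and transitions P of shape (|A|, |S|, |S|) as described in Section 2; the convention that Q has shape (|S|, |A|) is an assumption about the starter code.

```python
import numpy as np

def value_iteration_step(Q, R, P, gamma):
    """Q'(s,a) = r(s,a) + gamma * sum_s' P(s'|s,a) * max_a' Q(s',a')."""
    V = Q.max(axis=1)                 # (|S|,) greedy values max_a' Q(s', a')
    # P @ V contracts over s', giving shape (|A|, |S|); transpose to (|S|, |A|).
    return (R + gamma * (P @ V)).T
```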
Next, implement valueIteration following the specification. This should consist of running valueIterationStep in a loop until the threshold is met, then extracting the policy from the final Q-function and computing the corresponding V-function. Return the policy, the V-function, the number of iterations required until convergence, and the final epsilon.

In this section, we ask you to implement both exact and iterative policy evaluation.

5.1 TODOs

For policy evaluation, there are 2 functions (TODOs) to complete:
• exactPolicyEvaluation
• approxPolicyEvaluation

5.2 Exact Policy Evaluation

The Bellman equation for a deterministic policy π, from class, is
$$V^\pi(s) = r(s, \pi(s)) + \gamma \mathbb{E}_{s' \sim P(\cdot|s,\pi(s))}[V^\pi(s')] = r(s, \pi(s)) + \gamma \sum_{s'} P(s'|s,\pi(s))\, V^\pi(s').$$
Using vector notation, this can be written as the linear equation
$$V = R^\pi + \gamma P^\pi V.$$
Rearranging gives the linear system $(I - \gamma P^\pi) V^\pi = R^\pi$, which we can solve directly to compute $V^\pi$. Hint: use extractRpi and extractPpi to easily extract $r(s,\pi(s))$ and $P(s'|s,\pi(s))$ in vector form. Do not use matrix inversion for this; instead use np.linalg.solve.
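A minimal sketch of exact policy evaluation as just described, assuming R_pi has shape (|S|,) and P_pi has shape (|S|, |S|) as extracted by the helpers.

```python
import numpy as np

def exact_policy_evaluation(R_pi, P_pi, gamma):
    """Solve (I - gamma P^pi) V = R^pi for V^pi without inverting the matrix."""
    n = R_pi.shape[0]
    return np.linalg.solve(np.eye(n) - gamma * P_pi, R_pi)
```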
The grids should update as you step through value iteration. Normally, value iteration runs until convergence and then the final policy is extracted. However, we have added extra visualization to allow you to see how the policy changes. For any iteration t, press the enter key to compute π(s) = arg max_a Q^t(s, a) and $V^\pi$. A new window will open showing the V-function and policy. To continue stepping through value iteration, exit this window and continue pressing the right arrow key on the main figure window.

7.3 Policy Iteration

To visualize policy iteration, run python visualize.py --alg PI exact or python visualize.py --alg PI approx. Again, use the right arrow key to step through policy iteration. To change $\pi^0$, change line 171 or 174 of visualize.py.

Section 8: Discussion

Please answer the following questions in a few sentences and include your answers in the writeup.
• For the finite-horizon problem, compare the value function and optimal policy at timestep 0 for different values of H. How do they change? Why?
• For the finite-horizon problem, compare the value function and optimal policy at timestep 8 for H = 10 and at timestep 0 for H = 2. What is their relation?
• Now let's consider the infinite-horizon problem. Compare the value function and optimal policy for different values of γ. How do they change? Why?
• Compare the value functions and optimal policies for the finite-horizon MDP versus the infinite-horizon MDP. More specifically, H = 2 corresponds to γ = 0.5, H = 10 corresponds to γ = 0.9, and H = 1000 corresponds to γ = 0.999. How similar are they? What differences do you notice? Why do you think these differences exist?
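As a companion to Sections 5.2 and 5.3 above, here is a minimal NumPy sketch of both forms of policy evaluation. The function names and exact shapes are illustrative assumptions, not the assignment's required signatures; R_pi and P_pi stand for the vector and matrix that extractRpi and extractPpi are meant to provide.

```python
import numpy as np

def exact_policy_evaluation(R_pi, P_pi, gamma):
    """Section 5.2 sketch: solve (I - gamma * P_pi) V = R_pi with np.linalg.solve.

    R_pi: (|S|,) vector of r(s, pi(s));  P_pi: (|S|, |S|) matrix of P(s'|s, pi(s)).
    """
    n = len(R_pi)
    return np.linalg.solve(np.eye(n) - gamma * P_pi, R_pi)

def iterative_policy_evaluation(R_pi, P_pi, gamma, tol=1e-8):
    """Section 5.3 sketch: repeat V <- R_pi + gamma * P_pi V until the change is below tol."""
    V = np.zeros(len(R_pi))
    while True:
        V_next = R_pi + gamma * (P_pi @ V)
        if np.max(np.abs(V_next - V)) < tol:
            return V_next
        V = V_next
```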
Suppose there are 2 ad slots, each with its own clickthrough rate:

Slot | Clickthrough Rate
a    | 10
b    | 7

And suppose there are 3 advertisers:

Advertiser | Valuation Per Click
x          | 10
y          | 9
z          | 2

We also suppose that the search engine assigns ad slots using a Generalized Second-Price (GSP) auction, and answer the following questions:

1. In this problem, is truthful bidding a Nash equilibrium? To receive full credit, you must either prove that truthful bidding is a Nash equilibrium or show which advertiser(s) could profit by changing their bid away from truthful bidding.

Answers: 1. ANSWER 1

2. Find a Nash equilibrium for this problem (without truthful bidding) in which the total advertiser valuation is maximized (i.e., advertiser x is assigned slot a and advertiser y is assigned slot b). You may use any technique you want to find this Nash equilibrium, but you must convince me that it is an actual Nash equilibrium.

Answers: 2. ANSWER 2

3. Create a new sponsored search problem with at least 2 slots and at least 3 bidders in which truthful bidding is a Nash equilibrium.
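These questions can be sanity-checked numerically. Below is a small sketch (a hypothetical helper, not part of the assignment) that computes each advertiser's utility under GSP for a given bid profile: slots are awarded in decreasing bid order, and each winner pays, per click, the bid of the next advertiser below them.

```python
def gsp_utilities(bids, values, ctrs):
    """Each advertiser's utility under GSP for one bid profile (hypothetical helper).

    bids, values: dicts advertiser -> bid / value per click
    ctrs: slot clickthrough rates, best slot first
    """
    order = sorted(bids, key=bids.get, reverse=True)   # ties broken arbitrarily
    util = {adv: 0.0 for adv in bids}
    for slot, ctr in enumerate(ctrs):
        if slot >= len(order):
            break
        winner = order[slot]
        price = bids[order[slot + 1]] if slot + 1 < len(order) else 0.0
        util[winner] = ctr * (values[winner] - price)
    return util

# Truthful bidding in the problem above:
# gsp_utilities({"x": 10, "y": 9, "z": 2}, {"x": 10, "y": 9, "z": 2}, [10, 7])
# Checking a unilateral deviation means changing one bid and comparing that
# advertiser's utility before and after.
```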
Chapter 9

1. In this question we will consider an auction in which there is one seller who wants to sell one unit of a good and a group of bidders who are each interested in purchasing the good. The seller will run a sealed-bid, second-price auction. Your firm will bid in the auction, but it does not know for sure how many other bidders will participate: there will be either two or three other bidders in addition to your firm. All bidders have independent, private values for the good. Your firm's value for the good is c. What bid should your firm submit, and how does it depend on the number of other bidders who show up? Give a brief (1-3 sentence) explanation for your answer.

2. A seller announces that he will sell a case of rare wine using a sealed-bid, second-price auction. A group of I individuals plan to bid on this case of wine. Each bidder is interested in the wine for his or her personal consumption; the bidders' consumption values for the wine may differ, but they don't plan to resell it. So we will view their values for the wine as independent, private values (as in Chapter 9). You are one of these bidders; in particular, you are bidder number i and your value for the wine is V_i. How should you bid in each of the following situations? In each case, provide an explanation for your answer; a formal proof is not necessary.

Chapter 10

1. Suppose we have a set of 3 sellers labeled a, b, and c, and a set of 3 buyers labeled x, y, and z. Each seller is offering a distinct house for sale, and the valuations of the buyers for the houses are as follows. Suppose that sellers a and b each charge 2, and seller c charges 1. Is this set of prices market-clearing? Give a brief explanation.

2. Suppose we have a set of 3 sellers labeled a, b, and c, and a set of 3 buyers labeled x, y, and z. Each seller is offering a distinct house for sale, and the valuations of the buyers for the houses are as follows. Suppose that sellers a and c each charge 1, and seller b charges 3. Is this set of prices market-clearing? Give a brief explanation.

3. Suppose we have a set of 3 sellers labeled a, b, and c, and a set of 3 buyers labeled x, y, and z. Each seller is offering a distinct house for sale, and the valuations of the buyers for the houses are as follows. Suppose that a charges a price of 3 for his house, b charges a price of 1, and c charges a price of 0. Is this set of prices market-clearing? If so, explain which buyer you would expect to get which house; if not, say which seller or sellers should raise their price(s) in the next round of the bipartite-graph auction procedure from Chapter 10.
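For the Chapter 10 questions, the market-clearing check can also be automated once you have the valuation table from the book. Below is a brute-force sketch (a hypothetical helper; the valuations dict must be filled in from the table, which is not reproduced here): it builds each buyer's preferred-seller set and tests whether those sets admit a perfect matching, which is the Chapter 10 characterization of market-clearing prices.

```python
from itertools import permutations

def is_market_clearing(valuations, prices):
    """Test market-clearing prices via the preferred-seller graph (sketch).

    valuations: dict buyer -> dict seller -> value (from the book's table)
    prices: dict seller -> price
    Each buyer prefers the seller(s) maximizing payoff value - price; prices are
    market-clearing iff a perfect matching exists inside those preferred sets.
    Brute force over assignments is fine at 3x3.
    """
    buyers, sellers = list(valuations), list(prices)
    preferred = {}
    for b in buyers:
        payoff = {s: valuations[b][s] - prices[s] for s in sellers}
        best = max(payoff.values())
        preferred[b] = {s for s in sellers if payoff[s] == best}
    return any(all(s in preferred[b] for b, s in zip(buyers, perm))
               for perm in permutations(sellers))
```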
1. Say whether the following claim is true or false, and provide a brief (1-3 sentence) explanation for your answer. Claim: If player A in a two-person game has a dominant strategy s_A, then there is a pure strategy Nash equilibrium in which player A plays s_A and player B plays a best response to s_A.

3. Find all pure strategy Nash equilibria in the game below. In the payoff matrix below, the rows correspond to player A's strategies and the columns correspond to player B's strategies. The first entry in each box is player A's payoff and the second entry is player B's payoff.

4. Consider the two-player game with players, strategies, and payoffs described in the following game matrix. (a) Does either player have a dominant strategy? Explain briefly (1-3 sentences). (b) Find all pure strategy Nash equilibria for this game.

5. Consider the following two-player game in which each player has three strategies.
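Once you have transcribed the payoff matrices, your pure-strategy Nash equilibria can be double-checked by brute force. A minimal sketch (a hypothetical helper; the payoffs are supplied by you):

```python
def pure_nash_equilibria(A, B):
    """All pure-strategy Nash equilibria of a bimatrix game (sketch).

    A[i][j] / B[i][j]: row and column players' payoffs when row plays i, column plays j.
    (i, j) is an equilibrium iff i is a best response to j and j is a best response to i.
    """
    rows, cols = len(A), len(A[0])
    equilibria = []
    for i in range(rows):
        for j in range(cols):
            row_best = all(A[i][j] >= A[k][j] for k in range(rows))
            col_best = all(B[i][j] >= B[i][k] for k in range(cols))
            if row_best and col_best:
                equilibria.append((i, j))
    return equilibria
```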
Figure 1.1

1. Let every node's population be 1 (i.e., N_i = 1 for each i). Let β = 0.8 and γ = 0.5. Let all initial infections be 0 except for node 1; node 1's initial infection should be 0.01 (so I_1(0) = 0.01 and S_1(0) = 0.99, but all other nodes have S_i(0) = 1, I_i(0) = 0, and R_i(0) = 0). What are the S, I, and R quantities after 1 time step?

2. How many time steps would it take before someone is infected in every node?

3. Suppose that the connection between node 2 and node 1 is removed. If nothing else changes in the network, would this change the spread of the epidemic? Can you predict how many people in node 2 would eventually get sick?

4. Repeat that same thought experiment, but this time let the initial infection start on node 2 (so I_2(0) = 0.01 and S_2(0) = 0.99, but all other nodes have S_i(0) = 1, I_i(0) = 0, and R_i(0) = 0). Now can you predict how many people in node 2 would eventually get sick?

5. Instead, suppose you remove the connection from node 2 to node 1, and replace it with a self-loop on node 2 with weight 1. If nothing else changes in the network (and the infection starts at node 1), would this change the spread of the epidemic? Can you predict how many people in node 2 would eventually get sick?
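A small simulation makes question 1 mechanical. Below is a sketch of one discrete-time networked SIR step; the contact-weight convention here (node i's exposure is the W-weighted sum of infection levels) is an assumption, so match it to the exact update rule given in lecture, and transcribe W from Figure 1.1.

```python
import numpy as np

def sir_step(S, I, R, W, beta=0.8, gamma=0.5):
    """One discrete-time networked SIR update (sketch; verify conventions against lecture).

    S, I, R: per-node fractions (NumPy arrays); W: weighted adjacency matrix,
    where row i lists node i's connection weights (self-loops included if present).
    """
    exposure = W @ I                 # weighted infection level seen by each node
    new_inf = beta * S * exposure    # susceptible -> infected
    new_rec = gamma * I              # infected -> recovered
    return S - new_inf, I + new_inf - new_rec, R + new_rec
```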
1. Consider the graphs in Figure 1.1. (a) Are these graphs strongly connected? Explain why or why not. (b) Are these graphs aperiodic? If a graph is periodic, compute its period.

2. Consider the weighted adjacency matrix in Figure 1.2. (a) Is this matrix row-stochastic? (b) Draw the graph corresponding to this matrix. (c) Is the graph strongly connected? (d) Is the graph aperiodic?

3. Consider the graph in Figure 1.3. (a) Write the weighted adjacency matrix corresponding to this graph. Verify that the matrix you wrote is row-stochastic. (b) By hand, using matrix multiplication, perform 1 step of the DeGroot opinion dynamics model on this graph with initial opinions of (1, 1, 0.5, 0, 0). (c) In the DeGroot opinion dynamics model, which node's initial opinion gets the most weight in society's final opinion? Which node gets the least weight? (It is highly recommended that you use computer code to answer this question.)

4. Consider again the graph from problem 2 (Figure 1.2). (a) By hand, using matrix multiplication, perform 1 step of the Friedkin-Johnsen opinion dynamics model on this graph with initial opinions of (1, 1, 0.5, 0, 0) and lambda values of (0.9, 0.1, 0.8, 1, 0.5). (b) As we discussed in class, in the Friedkin-Johnsen model, nodes usually have differing opinions even after a long time. Keeping the lambda values from part (a), experiment with this graph and try to find an assignment of initial opinions that results in the most different limiting opinions. That is, what equilibrium on this graph can have the most disagreement out of all equilibria? You will likely want to use computer code to solve this problem. Full credit is possible if you show understanding and effort; I will award 1 bonus point if you can show that your answer is the most disagreement possible.
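For questions 3 and 4, a couple of one-line updates are enough to check your hand calculations. A sketch, assuming the standard DeGroot update and one common Friedkin-Johnsen convention (verify the lambda placement against your class notes):

```python
import numpy as np

def degroot_step(W, x):
    """One DeGroot update: each node adopts the weighted average of its neighbors,
    x <- W x, where W is the row-stochastic weighted adjacency matrix."""
    return W @ x

def friedkin_johnsen_step(W, x, x0, lam):
    """One Friedkin-Johnsen update (one common convention; check against lecture):
    x <- lam * (W x) + (1 - lam) * x0, with lam[i] node i's openness to influence
    and x0[i] its fixed internal opinion."""
    return lam * (W @ x) + (1.0 - lam) * x0

# For question 4(a), with W transcribed from Figure 1.2:
# x0 = np.array([1.0, 1.0, 0.5, 0.0, 0.0])
# lam = np.array([0.9, 0.1, 0.8, 1.0, 0.5])
# x1 = friedkin_johnsen_step(W, x0, x0, lam)
```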
2. Consider the network shown in Figure 5.18: there is an edge between each pair of nodes, with five of the edges corresponding to positive relationships and the other five corresponding to negative relationships. Each edge in this network participates in three triangles: one formed by each of the additional nodes who is not already an endpoint of the edge. (For example, the A-B edge participates in a triangle on A, B, and C, a triangle on A, B, and D, and a triangle on A, B, and E. We can list triangles for the other edges in a similar way.) For each edge, how many of the triangles it participates in are balanced, and how many are unbalanced? (Notice that because of the symmetry of the network, the answer will be the same for each positive edge, and also for each negative edge; so it is enough to consider this for one of the positive edges and one of the negative edges.)

3. When we think about structural balance, we can ask what happens when a new node tries to join a network in which there is existing friendship and hostility. In Figures 5.19-5.22, each pair of nodes is either friendly or hostile, as indicated by the + or - label on each edge.

First, consider the 3-node social network in Figure 5.19, in which all pairs of nodes know each other, and all pairs of nodes are friendly toward each other. Now, a fourth node D wants to join this network, and establish either positive or negative relations with each existing node A, B, and C. It wants to do this in such a way that it doesn't become involved in any unbalanced triangles (i.e., so that after adding D and the labeled edges from D, there are no unbalanced triangles that contain D). Is this possible?

In fact, in this example, there are two ways for D to accomplish this, as indicated in Figure 5.20. First, D can become friends with all existing nodes; in this way, all the triangles containing it have three positive edges, and so are balanced. Alternately, it can become enemies with all existing nodes; in this way, each triangle containing it has exactly one positive edge, and again these triangles would be balanced.

So for this network, it was possible for D to join without becoming involved in any unbalanced triangles. However, the same is not necessarily possible for other networks. We now consider this kind of question for some other networks.

(a) Consider the 3-node social network in Figure 5.21, in which all pairs of nodes know each other, and each pair is either friendly or hostile as indicated by the + or - label on each edge. A fourth node D wants to join this network, and establish either positive or negative relations with each existing node A, B, and C. Can node D do this in such a way that it doesn't become involved in any unbalanced triangles?
* If there is a way for D to do this, say how many different such ways there are, and give an explanation. (That is, how many different possible labelings of the edges out of D have the property that all triangles containing D are balanced?)
* If there is no such way for D to do this, give an explanation why not.
(In this and the subsequent questions, it is possible to work out an answer by reasoning about the new node's options without having to check all possibilities.)

(b) Same question, but for a different network. Consider the 3-node social network in Figure 5.22, in which all pairs of nodes know each other, and each pair is either friendly or hostile as indicated by the + or - label on each edge.
A fourth node D wants to join this network, and establish either positive or negative relations with each existing node A, B, and C. Can node D do this in such a way that it doesn't become involved in any unbalanced triangles?
* If there is a way for D to do this, say how many different such ways there are, and give an explanation. (That is, how many different possible labelings of the edges out of D have the property that all triangles containing D are balanced?)
* If there is no such way for D to do this, give an explanation why not.

(c) Using what you've worked out in Questions 2 and 3, consider the following question. Take any labeled complete graph (on any number of nodes) that is not balanced; i.e., it contains at least one unbalanced triangle. (Recall that a labeled complete graph is a graph in which there is an edge between each pair of nodes, and each edge is labeled with either + or -.) A new node X wants to join this network, by attaching to each node using a positive or negative edge. When, if ever, is it possible for X to do this in such a way that it does not become involved in any unbalanced triangles? Give an explanation for your answer. (Hint: Think about any unbalanced triangle in the network, and how X must attach to the nodes in it.)

Canvas-1. Consider the following graph: Does this graph show evidence of homophily? Justify your answer using concepts we discussed in class.

Canvas-2. Consider the Schelling segregation model we discussed in class. In class, we showed that if the contentedness threshold is 50% (i.e., t = 0.5, or "I am content as long as at least half my neighbors are my type"), it's possible to create a long "row of houses" where everybody in society is content, and in which almost everybody has their maximum number of other-type friends (i.e., almost everybody has 50% of their friends of a different type than themself).

Is it possible to extend this concept to other contentedness thresholds, and construct a society in which everybody is content and almost everybody has a (1−t) fraction of their friends of a different type than themself? For full credit, answer this for the specific values t = 0.25 and t = 0.75. For example, if t = 0.25, can you construct a society in which everybody is content, and in which everybody (or almost everybody) has 75% of their friends of a different type than themself?
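For question 2 (and for experimenting with question 3), counting balanced and unbalanced triangles is easy to automate. A sketch, with a hypothetical sign encoding:

```python
from itertools import combinations

def triangle_balance(nodes, sign):
    """Count balanced vs. unbalanced triangles in a signed complete graph (sketch).

    sign: dict mapping frozenset({u, v}) -> +1 (friends) or -1 (enemies).
    A triangle is balanced iff the product of its three edge signs is +1.
    """
    balanced = unbalanced = 0
    for a, b, c in combinations(nodes, 3):
        prod = (sign[frozenset((a, b))] *
                sign[frozenset((b, c))] *
                sign[frozenset((a, c))])
        if prod > 0:
            balanced += 1
        else:
            unbalanced += 1
    return balanced, unbalanced
```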
1. Consider the graph in Figure 3.21, in which each edge (except the edge connecting b and c) is labeled as a strong tie (S) or a weak tie (W). According to the theory of strong and weak ties, with the Strong Triadic Closure assumption, how would you expect the edge connecting b and c to be labeled? Give a brief (1-3 sentence) explanation for your answer.

2. In the social network depicted in Figure 3.22, with each edge labeled as either a strong or weak tie, which nodes satisfy the Strong Triadic Closure Property from Chapter 3, and which do not? Provide an explanation for your answer.

3. In the social network depicted in Figure 3.23, with each edge labeled as either a strong or weak tie, which two nodes violate the Strong Triadic Closure Property? Provide an explanation for your answer.

4. In the social network depicted in Figure 3.24, with each edge labeled as either a strong or weak tie, which nodes satisfy the Strong Triadic Closure Property from Chapter 3, and which do not? Provide an explanation for your answer.
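Once you transcribe a figure, these questions are easy to verify mechanically. This sketch flags nodes that violate the Strong Triadic Closure Property, using a hypothetical edge encoding:

```python
from itertools import combinations

def violates_stc(strong, edges):
    """Nodes violating the Strong Triadic Closure Property (sketch).

    strong: set of frozenset({u, v}) strong-tie edges
    edges: set of frozenset({u, v}) for all edges (strong and weak)
    Node x violates STC if it has strong ties to two nodes with no edge between them.
    """
    nodes = {v for e in edges for v in e}
    violators = set()
    for x in nodes:
        strong_nbrs = [v for e in strong if x in e for v in e if v != x]
        for u, w in combinations(strong_nbrs, 2):
            if frozenset((u, w)) not in edges:
                violators.add(x)
    return violators
```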
1. One reason for graph theory's power as a modeling tool is the fluidity with which one can formalize properties of large systems using the language of graphs, and then systematically explore their consequences. In this first set of questions, we will work through an example of this process using the concept of a pivotal node. First, recall from Chapter 2 that a shortest path between two nodes is a path of the minimum possible length. We say that a node X is pivotal for a pair of distinct nodes Y and Z if X lies on every shortest path between Y and Z (and X is not equal to either Y or Z).

For example, in the graph in Figure 2.13, node B is pivotal for two pairs: the pair consisting of A and C, and the pair consisting of A and D. (Notice that B is not pivotal for the pair consisting of D and E, since there are two different shortest paths connecting D and E, one of which (using C and F) doesn't pass through B. So B is not on every shortest path between D and E.) On the other hand, node D is not pivotal for any pairs.

(a) Give an example of a graph in which every node is pivotal for at least one pair of nodes. Explain your answer.
(b) Give an example of a graph in which every node is pivotal for at least two different pairs of nodes. Explain your answer.
(c) Give an example of a graph having at least four nodes in which there is a single node X that is pivotal for every pair of nodes (not counting pairs that include X). Explain your answer.

2. In the next set of questions, we consider a related cluster of definitions, which seek to formalize the idea that certain nodes can play a "gatekeeping" role in a network. The first definition is the following: we say that a node X is a gatekeeper if for some other two nodes Y and Z, every path from Y to Z passes through X. For example, in the graph in Figure 2.14, node A is a gatekeeper, since it lies, for example, on every path from B to E. (It also lies on every path between other pairs of nodes, for example the pair D and E, as well as other pairs.)

This definition has a certain "global" flavor, since it requires that we think about paths in the full graph in order to decide whether a particular node is a gatekeeper. A more "local" version of this definition might involve only looking at the neighbors of a node. Here's a way to make this precise: we say that a node X is a local gatekeeper if there are two neighbors of X, say Y and Z, that are not connected by an edge. (That is, for X to be a local gatekeeper, there should be two nodes Y and Z so that Y and Z each have edges to X, but not to each other.) So, for example, in Figure 2.14, node A is a local gatekeeper as well as being a gatekeeper; node D, on the other hand, is a local gatekeeper but not a gatekeeper. (Node D has neighbors B and C that are not connected by an edge; however, every pair of nodes, including B and C, can be connected by a path that does not go through D.)

So we have two new definitions: gatekeeper, and local gatekeeper. When faced with new mathematical definitions, a strategy that is often useful is to explore them first through examples, and then to assess them at a more general level and try to relate them to other ideas and definitions. Let's try this in the next few questions.

(a) Give an example (together with an explanation) of a graph in which more than half of all nodes are gatekeepers.
(b) Give an example (together with an explanation) of a graph in which there are no gatekeepers, but in which every node is a local gatekeeper.
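Both definitions are easy to test by computer once a graph is transcribed, which is handy for checking candidate examples. A sketch (hypothetical helpers): in a connected graph, a node is a gatekeeper exactly when deleting it disconnects some pair, so BFS-style reachability suffices.

```python
from itertools import combinations

def reachable(adj, start, removed=None):
    """All nodes reachable from start, optionally pretending one node was removed."""
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for v in adj[u]:
            if v != removed and v not in seen:
                seen.add(v)
                stack.append(v)
    return seen

def gatekeepers(adj):
    """Nodes x such that deleting x disconnects some pair y, z.
    adj: dict node -> set of neighbors. Assumes the original graph is connected."""
    result = set()
    for x in adj:
        rest = [v for v in adj if v != x]
        if rest and reachable(adj, rest[0], removed=x) != set(rest):
            result.add(x)
    return result

def local_gatekeepers(adj):
    """Nodes with two neighbors that have no edge between them."""
    return {x for x in adj
            if any(w not in adj[v] for v, w in combinations(adj[x], 2))}
```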
3. When we think about a single aggregate measure to summarize the distances between the nodes in a given graph, there are two natural quantities that come to mind. One is the diameter, which we define to be the maximum distance between any pair of nodes in the graph. Another is the average distance, which, as the term suggests, is the average distance over all pairs of nodes in the graph. In many graphs, these two quantities are close to each other in value. But there are graphs where they can be very different.

(a) Describe an example of a graph where the diameter is more than three times as large as the average distance.

(b) Describe how you could extend your construction to produce graphs in which the diameter exceeds the average distance by as large a factor as you'd like. (That is, for every number c, can you produce a graph in which the diameter is more than c times as large as the average distance?)
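When checking a candidate construction, you can compute both quantities directly with breadth-first search. A sketch, assuming a connected graph:

```python
from collections import deque

def bfs_dists(adj, src):
    """Hop distances from src to every reachable node (BFS)."""
    dist = {src: 0}
    q = deque([src])
    while q:
        u = q.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                q.append(v)
    return dist

def diameter_and_average(adj):
    """Diameter (max pairwise distance) and average distance over unordered pairs.
    adj: dict node -> set of neighbors; assumes the graph is connected."""
    dists = []
    nodes = list(adj)
    for i, u in enumerate(nodes):
        d = bfs_dists(adj, u)
        dists.extend(d[v] for v in nodes[i + 1:])
    return max(dists), sum(dists) / len(dists)
```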
The goal of this project is to learn how to use transformations and camera position to create an animated scene. This project is an extension of Project A2a, and you will incorporate your object from Project A2a into this assignment. Like Project A2a, creativity and effort are part of this project.

Now that you have created an object for Project A2a, you should have some ideas of how you wish to incorporate it into a fully animated scene. The most interesting animated scenes, no matter how short, seek to tell a story. Introduce your character, have your character carry out an action, and then resolve the scene.

You will most likely want to create more objects to populate your scene, but these new objects can be simpler than Project A2a's object. As with A2a, you should create the objects you will need in `initializeScene()`. Your main goal for this assignment will be to make one or more of the objects in the scene move, and to also move the virtual camera through the scene. You will want to use the "time" variable passed into `updateScene(time)` to help control the motion of your objects (this value measures time in milliseconds, from some arbitrary starting point). (Note: in common.ts, we get the time of the first render callback and set `this.startTime` to it, so that you can use `time - this.startTime` in `updateScene(time)` as the time since the program started.)

Below is a checklist of elements that you must include in your scene:

### Camera Motion

We have removed the camera `OrbitControl` from the sample code. You should move the camera smoothly through the scene, rather than keeping it in one place. The scene is set up with a PerspectiveCamera stored in `this.camera`. Change the camera's position by varying the position and rotation of the PerspectiveCamera object. Just rotating or moving the entire scene while leaving the camera stationary does not count as moving the camera.

Please note that having the user press keys or move the mouse to control the camera does not count towards automatic motion of the camera. If you want to include user controls, have an automatic motion of the camera in the first part of the scene, before handing over controls to the user.

### Include Project 2A Object

You must incorporate Project 2A's object somewhere in your scene.

### Object Animation

Include at least two object motions in the scene (distinct from the camera motion). One of these motions should include translation, and another, different motion should include rotation. If you wish, these two different motions can be for two different parts of the same object, or they can be motions of different objects. Make sure it is clear that these objects are moving, and not just changing their apparent positions due to camera motion.

### Parallel Animation

Some of your animations, including the camera and object animation, should happen in parallel for at least 2 seconds of the animation. This means at least two things (the camera and one other object, or two objects) should be moving at the same time.

### Lighting and Shading

We have removed the lights from A2a from `common.ts`. You must include at least two light sources in your scene such that the objects in the scene are illuminated by some light source. Do not use only ambient light. Do not add more than 8 lights total.

### Duration

Your animation should create more than 10 seconds of animation.
Please create an animation that finishes in a reasonable amount of time.

If you find that your program runs slowly, it is most likely due to the use of many polygons in the scene. By far the most common reason for having many polygons in this assignment is using lots of subdivisions on objects like spheres, cones, and cylinders. If you want many of these objects, consider passing lower values for the subdivision parameters to the constructor for each geometry; the subdivision parameters specify how many polygons an object uses.

We will be looking for each of the above required items in your animation. Omitting any of the above elements will cause a deduction in your grade for this assignment.

## Resources (from A2a)

In this assignment, and others later in the semester, you will use the [three.js](https://threejs.org) graphics library, a very popular open-source graphics library for the web. It is actively being developed and widely used.

In addition to the resources on the website (including documentation and many examples), the website [threejsfundamentals.org](https://threejsfundamentals.org/) is a great resource for learning the library. I'd recommend you start there, and only refer to the threejs.org site for reference, or for specific three.js features. For this assignment, I highly recommend working through some of the Basics sections ([Fundamentals](https://threejsfundamentals.org/threejs/lessons/threejs-fundamentals.html) is a great introduction, and the other sections can be skimmed but aren't necessary for these projects), as well as the [Primitives](https://threejsfundamentals.org/threejs/lessons/threejs-primitives.html), [Scenegraph](https://threejsfundamentals.org/threejs/lessons/threejs-scenegraph.html), and some of the [Materials](https://threejsfundamentals.org/threejs/lessons/threejs-materials.html) sections in "Fundamentals".

Beyond these resources, there are many open source projects based on three.js. While you should not be using code from the web in this and future assignments, together these resources give you a vast set of learning resources.

Of course, feel free to reach out to the instructor, TA, and your classmates for help and pointers.

## Effort is Part of the Grade

This assignment will be graded partially based on our assessment of the amount of care, effort, and creativity that you put into the scene and animations. If you choose a simple scene and throw it together, you will not get a high score on the "effort" component of this project. 1 point of the project will be based on our assessment of reasonable effort, and up to 1 additional point will be awarded for exceptional creativity.

## Optional Elements

If you wish, you may add any three.js Materials to your scene, which includes textures and a variety of other effects. It is not necessary to add more complex Materials in order to have a successful animated scene.

Unlike part A2a, you are allowed to load external models as part of your scene. These might be objects that you have created using modeling programs such as Blender or Maya. Note, however, that in this project we are looking for how you animate your scene, and not for your skill in modeling in these other systems. We will not count the creation of these models towards our evaluation of your effort. Your emphasis should be on scene creation and animation with three.js.

## Authorship Rules

The code that you turn in should be entirely your own.
You are allowed to talk to other members of the class and to the instructor and the TAs about general implementation of the assignment. It is also fine to seek the help of others for general three.js/Typescript programming questions. You may not, however, use code that anyone other than yourself has written. The only exception is that you should use the example source code that we provide for this project. Code that is explicitly not allowed includes code taken from the Web, GitHub, from books, from the [three.js](https://threejs.org) web site, from other assignments, from other students, or from any source other than yourself. You should not show your code to other students. Feel free to seek the help of the instructor and the TAs for suggestions about debugging your code.

# Submission

You will check out the project from GitHub Classroom, and submit it there. All of your code should be in the file `app.ts`. You may add extra files if you wish to use them to structure your code.

**Do Not Change the names** of the existing files (e.g., index.html, app.ts, etc.). The TAs need to be able to test your program as follows:

1. cd into the directory and run `npm install`
2. run with `npm run dev`
3. visit `http://localhost:3000/index.html`

Please test that your submission meets these requirements. For example, after you check in your final version of the assignment to GitHub, check it out again to a new directory and make sure everything builds and runs correctly.

## Development Environment

The development environment is the same as used in previous assignments.
The goal of this project is for you to practice using a scene graph to assemble a 3D scene out of 3D objects. A secondary learning goal is to learn the basics of the [three.js](https://threejs.org) graphics library, which you will be using for this and other assignments in the class.

For this assignment, you will create a single complex object by combining multiple simple objects into a hierarchy (see below). This is part one of a two-part project. During the second part (A2b), you will incorporate this object into a 3D animated scene that tells a simple story, and this scene will also be of your own choosing.

Decide on an object that you want to include in an animated scene in A2b. You should select the main character, or "protagonist" object, from your planned animation to use in this assignment. Keep in mind that a main character does not need to be a human or animal character, but can be almost anything that you can build a story around (e.g., a musical instrument, telephone, house, car, or vacuum cleaner). In A2b of this project, you will concentrate on making things move in your final short animation.

Do not use a snowman or the green android character (the mascot for the mobile OS) as your main character; if you do, this will be an automatic 2 point deduction. They are too simple and "obvious" examples; we would like to see you be more creative.

The key aspect of this assignment is that the 3D object that you create should be made up of multiple pieces. You will use the scene graph hierarchy and transformations (rotate, scale, translate) on the graph nodes to put together the parts to form the final object.

Here are the required elements for your main character:

- Your object should be something recognizable, and not some abstract or random shape.
- Use at least twelve different sub-parts to make your object (e.g., six spheres and six cylinders).
- Include at least two different kinds of parts in your object (e.g., sphere and block).
- Most or all of the sub-parts of your object should be touching or overlapping each other.
- Use at least two different colors (three.js Materials) in your object.
- The "root" of your object graph should have its origin at an intuitive position on your object. This will make it easier to animate in A2b. For example, if you were creating a human, the origin might be the center of the body or the waist, but a hand, foot, or eye would not be an intuitive choice. (While not required, you should also consider which parts of the object you might want to animate in A2b, and arrange for their local origin to be in an intuitive place. For example, it will be easier to rotate a character's head if the local origin is at the place it rotates on the neck, rather than at the center point of the head.)
- You must use at least 3 levels of hierarchy: the root node is one level, the parts are the final level, and there should be at least one level of nodes (with transforms) between them. These intermediate nodes should have transformations on them: you should not just position and rotate the parts directly below the root node; there should be another level of transformations between each part and the root.
- Make use of all three basic transformations to assemble your object (rotate, scale, translate).

The three.js sample code includes a simple camera controller so that you can rotate the scene with the mouse; you should size your object so that it fits in the area viewed by this camera.
If you want to create an object that is bigger or smaller than the area viewed around the origin of this sample scene, simply add a node above your object's root node, and apply a scale to make your object a different size.

## Resources

In this assignment, and others later in the semester, you will use the [three.js](https://threejs.org) graphics library, a very popular open-source graphics library for the web. It is actively being developed and widely used.

In addition to the resources on the website (including documentation and many examples), the website [threejsfundamentals.org](https://threejsfundamentals.org/) is a great resource for learning the library. I'd recommend you start there, and only refer to the threejs.org site for reference, or for specific three.js features. For this assignment, I highly recommend working through some of the Basics sections ([Fundamentals](https://threejsfundamentals.org/threejs/lessons/threejs-fundamentals.html) is a great introduction, and the other sections can be skimmed but aren't necessary for these projects), as well as the [Primitives](https://threejsfundamentals.org/threejs/lessons/threejs-primitives.html), [Scenegraph](https://threejsfundamentals.org/threejs/lessons/threejs-scenegraph.html), and some of the [Materials](https://threejsfundamentals.org/threejs/lessons/threejs-materials.html) sections in "Fundamentals".

Beyond these resources, there are many open source projects based on three.js. While you should not be using code from the web in this and future assignments, together these resources give you a vast set of learning resources.

Of course, feel free to reach out to the instructor, TA, and your classmates for help and pointers.

## Effort is Part of the Grade

This assignment will be graded partially based on our assessment of the amount of care, effort, and creativity that you put into creating your object. If you choose a simple object and throw it together, you will not get a high score on the "effort" component of this project. 18% of the project will be based on our assessment of your effort.

## Optional Elements

If you wish, you may add any three.js Material properties to your object, such as textures or other built-in effects. It is not necessary to add complex Materials in order to have a successful object for this project. Simple materials such as those used in the sample code are fine, as are more complex ones.

You should use the built-in three.js classes for creating cubes, cylinders, cones, tori, and spheres for your object's parts. If you feel they are not sufficient for making your object, you should feel free to create more complex geometry meshes, although this is not required. If you create a sub-part with polygon code of your own, this can count as one of the kinds of parts in your object.

Feel free to make use of any programming structures you would like to assemble your object. If some of the sub-parts of an object are regularly spaced, using a loop to create and position these parts is entirely appropriate. You may find that your object's parts really ought to be made of even smaller parts. If so, great! It is fine for the sub-parts to have sub-sub-parts, and a deep and complex scene graph.

## Turn In a Screen Shot

Please include a JPG or PNG image of your object as part of the files that you turn in. Try to keep your image square in shape, and give the file your name (e.g., jane_doe.jpg). 0.2 out of 10 points for this project will be for including this screen shot of your object.
Here is an example of such an image (although yours should be of your character).

## Not Allowed

The object that you turn in for this project should not be an object that was created using a modeling or animation package such as Blender or Maya. For this assignment, you are not allowed to read an object from a file, even to use as a subpart.

## Authorship Rules

The code that you turn in should be entirely your own. You are allowed to talk to other members of the class and to the instructor and the TAs about general implementation of the assignment. It is also fine to seek the help of others for general three.js/Typescript programming questions. You may not, however, use code that anyone other than yourself has written. The only exception is that you should use the example source code that we provide for this project. Code that is explicitly not allowed includes code taken from the Web, GitHub, from books, from the [three.js](https://threejs.org) web site, from other assignments, from other students, or from any source other than yourself. You should not show your code to other students. Feel free to seek the help of the instructor and the TAs for suggestions about debugging your code.

# Submission

You will check out the project from GitHub Classroom, and submit it there. All of your code should be in the file `app.ts`. You may add extra files if you wish to use them to structure your code.

**Do Not Change the names** of the existing files (e.g., index.html, app.ts, etc.). The TAs need to be able to test your program as follows:

1. cd into the directory and run `npm install`
2. run with `npm run dev`
3. visit `http://localhost:3000/index.html`

Please test that your submission meets these requirements. For example, after you check in your final version of the assignment to GitHub, check it out again to a new directory and make sure everything builds and runs correctly.

## Development Environment

The development environment is the same as used in previous assignments.
For this project, you will be programming the GPU using *vertex* and *fragment* hardware shaders to accomplish various visual effects. As we discussed in class:

* Fragment shaders are used to render individual "fragments" on the screen (basically, they render individual pixels). They are typically used to modify the coloring and lighting of the polygons being rendered.
* Vertex shaders are used to set interpolated inputs to the fragment shaders and possibly modify the vertex positions `(x,y,z)` of the polygons being rendered. Vertex shaders cannot introduce additional vertices, or delete existing ones; they are limited by the meshes they are given. (Unlike geometry shaders, which are outside the scope of this class.)

Vertex and fragment shaders are used together in pairs, as a *shader program*; that is, you use one vertex shader and one fragment shader to render something in a particular style, where the outputs of the vertex shader and the inputs of the fragment shader are coordinated.

Reminder: WebGL 1 and 2 use different versions of GLSL (although WebGL 2 is backwards compatible with WebGL 1), and both use older versions of GLSL than the most recent versions of OpenGL. You can find plenty of documentation, explanation, and examples on [webglfundamentals.org](https://webglfundamentals.org) and [webgl2fundamentals.org](https://webgl2fundamentals.org), along with the [WebGL1](https://www.khronos.org/files/webgl/webgl-reference-card-1_0.pdf) and [WebGL2](https://www.khronos.org/files/webgl/webgl-reference-card-2_0.pdf) quick reference cards. Everything you need to do for this assignment can be done with the WebGL1 version of GLSL.

## Due: Thursday Nov 25th, 11:59pm

This assignment is graded out of 15.

## Rubric

For this assignment, you will implement three shaders. The three required shaders will be worth 4.5 points each. You may implement a fourth shader for 3 bonus points. You will receive 1.5 points for updating the index.html page as described below (descriptions of which shader you are doing, and how you chose to implement it).

For each of the three required shaders, the following criteria will be used:

1. Select a reasonable set of uniforms and attributes for the shader. Modify index.html as described below (button name, description text). (0.5)
2. Modify app.ts to initialize the attributes and uniforms when this shader is used, including updating from the input slider as appropriate. (1)
3. Reasonable code for the shader modifications (even if it doesn't work fully). (1)
4. Shader works as intended. (2)

**For this project, you will be implementing three or four shader programs.** Two or three of them will have most of the work in the fragment shader, and one of them will have most of the work in the vertex shader. **You have several options for each of the four shaders**, so you can choose whichever seems most interesting to you. Some of the options in the required shaders can earn you extra credit. Do not be overwhelmed by the number of options.

**Include a description in your index.html web page describing what shaders you chose to implement, change the button labels to correspond to the shaders you created, and add any details about how you chose to implement them to the page below the canvas!**

If you do not do the fourth shader, leave the shader for the fourth button as the default provided.

1. **Texture Generation (fragment shader)** Modify the fragment shader to generate a texture on the fly (i.e., do not pass in a texture, or do not use the texture if you pass one in).
**Choose one** of the following options:

1. A checkerboard. A checkerboard pattern of squares, with alternating black and white squares. The number of squares within the [u,v] range of [0..1,0..1] (i.e., the plane surface) should vary from 5 to 20, depending on the value of the input slider on the web page.

2. Mandelbrot fractal (1 bonus point). Draw the fractal known as the Mandelbrot set. Display a white or black Mandelbrot set on some colored background. The colors (and possibly color bands) for the background are for you to decide. Let `z(n+1) = z(n)^2 + c`, where `z` and `c` are both complex numbers. The Mandelbrot set is essentially a map of what happens when using different values of `c` (which correspond to different locations in the plane). Let `z(0) = (0,0)`, and look at the values `z(1)=z(0)^2+c, z(2)=z(1)^2+c`, and so on. Repeatedly plugging the result of a function back into itself is called iteration. If these iterated values stay near zero (never leave a circle of radius `2`), then draw a white or black dot at the location `c`. If the values do leave the circle, color them something else. Do this for all values of `c` such that `cx` is in `[-2..2]` and `cy` is in `[-2..2]`. The result is the Mandelbrot set. Use at least 20 iterations of the function to create your Mandelbrot set. To get more interesting color bands, you can color the fragment differently depending on how many iterations it takes for the values to leave the radius. The provided "pal.png" texture contains the colors in the examples, where the number of iterations (relative to the maximum) was used to do the texture lookup. You can vary the range of `cx` and `cy` used (zooming in on part of their `[-2..2]` range) to see some parts in more detail. (A short Python sketch of this escape-time loop appears at the end of this handout.)

3. Julia set fractal `z(n+1) = z(n)^2 + c, where c = (0, sin(time))` (1 bonus point). This fractal of the Julia set is in some sense the inverse of the Mandelbrot set. To make a Julia set fractal, use the same equations and iterations as above, but start `z` at the current texture coordinate instead of `(0,0)`, and set `c = (0, f(sin(time)))`. To get the time, you can pass it in as a uniform variable. As a result you will get a whole bunch of variations of the Julia fractal, animating over time.

4. Julia set fractal `z(n+1) = z(n)^6 + c, where c = (0, sin(time)/2.0 + 0.5)` (1 bonus point). Same as (3), but with a different formula to update `z` and `c`. Note that `z^6` is just computed as the complex multiplication `z * z * z * z * z * z`, which is easy to write with a for-loop.

2. **Transparency (fragment shader)** Modify the fragment shader to do something interesting with transparency. **Choose one** from the following options:

1. Turn the plane into *swiss cheese* by adding transparent holes all over it. (Hint: you can use the *discard* GLSL function to "throw away" a particular pixel in a fragment shader, so that it won't be rendered at all.) The holes should be circular, but you have some discretion with their radius and spacing. The input slider should control the size of the holes.

2. Simulate an "x-ray light": make the plane transparent where the diffuse light contribution is brightest.
Make the size of the transparent hole "pulse" by changing its size as a smooth function of time (i.e., make the brightness threshold a function of time). The input slider should control the brightness threshold.

3. **Vertex Shader** Modify the vertex shader to deform the objects. **Choose one** from the following options:

1. Warble. Modify the vertices of the quad so that it "warbles" by moving the *y* or *z* component of each vertex with a `sin` function of some combination of the *x* component and time (or mouse position). Modify the normal vectors to make the waves darker where they bend.

2. Ripple. Like (1), but make the vertices ripple "outward" radially from the center of the plane, like a pebble dropped into a pond. A new wave should start when you press the space key, and last for a few seconds, getting smaller as it moves toward the edge of the quad. Modify the normal vectors to make the waves darker where they bend.

3. Orbit. Make the shapes smaller, and move them in a circle on the `x,y` plane (translating, not rotating) as a function of time or mouse position.

4. Light phobia. Make the shapes "afraid" of the light by moving points away from the light based on the proximity of the specular highlight to the vertex (this will make a larger area where the light is bright).

4. **Image Manipulation (fragment shader)** (**OPTIONAL**, up to 3 bonus points if you do one of these) Modify the fragment shader to accept two image textures and do some form of image manipulation to blend the two image textures, picking from one of the following options:

1. Green Screen Removal. Blend the images using green screen removal. One should be of a "green screen" scene (e.g., like this [picture of Ewan McGregor](http://www.justjared.com/photo-gallery/2777540/ewan-mcgregor-green-screen-fun-with-jimmy-fallon-12/fullsize/)). The other should be the image you want to blend the green screen image on top of. Green screen removal works by replacing all "predominantly green" pixels in the first image with the pixels from the second image. You should pay attention to two details. First, you cannot assume all green pixels are a specific pixel value; rather, you should check the pixels to see how "green" they are and come up with a (simple) metric for when to replace a pixel. You should use the input slider value to control this threshold, so you can see how changing it affects the removal. Second (for 1 additional bonus point), you should try to do some blending at the border between green and non-green (hint: look at the webglfundamentals.org discussion of image manipulation and convolution kernels). You should sample the pixels around the target pixel in your green-screened image to see if it is at the boundary of green and non-green values, and if so, set a transparency value to blend the pixels from the two images. In other words, in an area of all green, you should display the background image, and in an area of no green, you should display the green-screen image; in the boundary, you should blend between the two. The blend should be heavily biased toward the background to avoid having a green halo around non-green parts of the image.

2. Brightness-based blending. Blend the images using the "lightness" of one image to determine the blend. The example fragment shader computes a grey-scale "lightness" value of a pixel as a weighted average of its `r,g,b` components. The three components are *not* weighted equally, because the human eye is better at seeing green than red or blue.
You should use the input slider to control the blend (use the 0..1 value as a threshold). You should have a small window around the threshold value through which blending occurs (e.g., if the threshold is .5 and the window is .1 wide, then values below .45 are from one image, values above .55 are from the other, and values from .45 to .55 smoothly blend from one image to the other).

## Sample Code

The sample code is similar to the code from the shader example in class, but instead of operating on a sphere, it displays a single quadrilateral (quad) for your shaders. There are four buttons on the web page that let you pick one of your shaders. You should change the text on each button to reflect what the shader effect you chose is, and add text below the canvas describing what each effect is.

The starting shader code is in the shaders directory. We've provided four copies of the same simple shader, which computes diffuse lighting from a single light and combines it with a texture. There are comments at the top reminding you what uniforms and attributes three.js sets for you.

## Reference material/help on Mandelbrot and Julia sets

Resources that might be helpful include:

* https://en.wikipedia.org/wiki/Mandelbrot_set
* https://en.wikipedia.org/wiki/Julia_set
* http://www.wikihow.com/Plot-the-Mandelbrot-Set-By-Hand

# Authorship Rules

The code that you turn in should be entirely your own. You are allowed to talk to other members of the class and to the instructor and the TAs about general implementation of the assignment. It is also fine to seek the help of others for general Typescript and Web programming questions. You may not, however, use code that anyone other than yourself has written. The only exceptions are that you should use your code from Project 1A and the source code that we provide for this project. Code that is explicitly not allowed includes code taken from the Web, GitHub, from books, from other students, or from any source other than yourself. You should not show your code to other students. Feel free to seek the help of the instructor and the TAs for suggestions about debugging your code.

# Submission

You will check out the project from GitHub Classroom, and submit it there.

**Do Not Change the names** of the existing files (e.g., index.html, app.ts, etc.). The TAs need to be able to test your program as follows:

1. cd into the directory and run `npm install`
2. run with `npm run dev`
3. visit `http://localhost:3000/index.html`

Please test that your submission meets these requirements. For example, after you check in your final version of the assignment to GitHub, check it out again to a new directory and make sure everything builds and runs correctly.

# Development Environment

The development environment is the same as used in previous assignments.
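As a language-agnostic reference for the Mandelbrot option in shader 1, here is the escape-time iteration written out in Python. In the assignment itself this loop belongs in your GLSL fragment shader; the function name and the mapping details in the comment are illustrative assumptions.

```python
def escape_count(cx, cy, max_iter=20):
    """Escape-time count for z <- z^2 + c starting from z = 0 (illustrative sketch).

    Returns max_iter if the orbit never leaves the radius-2 circle (c is in the set),
    otherwise the number of iterations it took to escape.
    """
    zx = zy = 0.0
    for n in range(max_iter):
        zx, zy = zx * zx - zy * zy + cx, 2.0 * zx * zy + cy   # z = z^2 + c
        if zx * zx + zy * zy > 4.0:                            # |z| > 2: escaped
            return n
    return max_iter

# In the shader, map the fragment's (u, v) in [0,1]^2 to c in [-2,2]^2; color the
# fragment white or black when escape_count == max_iter, otherwise use n / max_iter
# to index a palette such as pal.png.
```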
The goal of this project is to write a ray tracing renderer. You will write a collection of Javascript functions that, when called, will create a 3D scene and produce 2D images of the scene. One of the functions will initialize the scene, others will create objects, lights, and a virtual camera, and one additional function will determine the color of a ray cast into the scene, which will be used to render the scene into a 2D image. You will be provided with various scenes that will call your functions to test your ray tracing code.

This is the first half of a two-part project. For this first part, you will cast eye rays into the scene for each pixel, test these rays for intersection with sphere objects, and then use the standard shading equation (ambient + diffuse + specular) to find the color for each pixel. In the second half of the project, you will expand your ray tracer to detect intersections between rays and disks. You will also expand your shading function to cast shadows and reflected rays, support area lights, and implement distribution ray tracing. Keep this in mind when deciding on implementation details.

## Due: Tuesday November 2nd, 11:59pm

## Rubric

Graded out of 10.

1. (2 points) Enough of `eyeRay(i,j)` that scene 0 (empty scene) works. Should use the values defined in `set_fov()` and `set_eye()` (the TA may look at code to see if it's plausibly correct).
2. (1 point) `reset_scene()` empties the scene, including resetting the background and ambient light. `ambient_light()` and `set_background()` set values to be used by the ray color calculation.
3. (1 point) Create the scene data structure: `new_light()`, `new_sphere()`.
4. (2 points) `traceRay()` finds the closest point on one of the spheres and calls some method to compute the color for it.
5. (1 point) The color for a ray includes the correct ambient contribution.
6. (3 points) The color for a ray includes the correct diffuse (1 point) and specular (1 point) contribution per light (1 point for all lights).

You have four primary goals for the first part of this project:

1. Initialize the scene
2. Cast eye rays for each pixel
3. Implement detection of ray intersection with spheres
4. Implement the shading equation

You can accomplish these goals however you see fit. A good approach would be to use object-oriented programming practices and create objects for each of the major scene components (light, ray, sphere, and later disks). Global lists of scene objects can be stored to allow for easy access. You can then create a Ray object for each pixel and write a method which will take a Ray as input and test it against the scene objects for intersections, returning the closest hit. This Hit object, containing all necessary information, can then be passed to a shading function which would implement the shading equation (for multiple light sources) and return the pixel color. Such an approach would be easy to test and to extend in the next stage of the project.

However you implement the ray tracer, your results should appear exactly like the examples shown below in Results.

# Provided Code

The provided source code in `testRayTracer.ts` is structured similarly to the `app.ts` A1 sample code, creating a canvas in its constructor, setting up a `RayTracer` object defined in `rayTracer.ts`, and having routines to create various scenes using the functions you will write.
Each number key 1-4 is assigned to a single scene function, and pressing that key should reset the scene and create an image of the new one.

The code in `rayTracer.ts` contains some code to get you started, along with empty functions used for scene setup. These are described below; you will need to implement them. Feel free to define any classes, objects, data structures, and global variables that you want to accomplish this project. (We also recommend breaking complex equations down into simple equations assigned to extra variables; aside from making the code clearer, having these values in separate variables makes debugging easier because you can look at them in the debugger.)

You should modify the source code in any way you see fit, and comment your code (including placing your name in the header). The source code is written in Typescript. You are NOT allowed to use any graphics commands from libraries such as Three.js or native web libraries; all code must be your own. We are not using rasterization for this project, so you should not need any of these libraries.

We have provided sample Vector and Color data types, which you can use as is, or modify as desired. You may also use the standard Javascript/Typescript math functions. When in doubt about what library functions are okay to use, please ask the instructor or a TA.

We have provided a `draw_scene()` function that loops through the pixels and calls the `traceRay(ray)` and `eyeRay(i,j)` functions you will write. `draw_scene()` draws the pixels into a canvas for you.

# Scene Description Language

Each scene is described by calling several methods on the RayTracer object that set up and render the scene. Below are the methods that you will need to implement for this assignment:

#### `reset_scene ()`

Initialize all the data structures and variables so you can start with an empty scene.

#### `set_background (r, g, b)`

Sets the background color. If a ray misses all the objects in the scene, the pixel should be given this color.

#### `set_fov (angle)`

Specifies the field of view (in degrees) for perspective projection. The default value is 90. You will need to convert this to the focal length d. You will then use this together with the eye position and the u, v, and w vectors of the eye's rotation matrix to create the eye rays.

#### `set_eye (cx, cy, cz, lx, ly, lz, ux, uy, uz)`

Specifies the eye position (cx, cy, cz) in 3D coordinates along with a lookat point (lx, ly, lz) that define the position and direction the camera is looking. An up vector (ux, uy, uz) allows you to set the full orientation of the camera. The camera is at the origin, looking down the -z axis, by default.

#### `new_light (r, g, b, x, y, z)`

Creates a point light source at position (x, y, z) with color (r, g, b). Your code should allow at least 10 light sources. For the second part of this assignment, you will cause these lights to cast shadows.

#### `ambient_light (r, g, b)`

Creates an "ambient" light with color (r, g, b), in order to approximate indirect illumination. There is only one ambient light; multiple calls will just replace the ambient light.

#### `new_sphere (x, y, z, radius, dr, dg, db, k_ambient, k_specular, specular_power)`

Specifies the creation of a sphere with its center at (x, y, z) and with a given radius. The diffuse color of the sphere is given by (dr, dg, db). The coefficient k_ambient specifies how much of the ambient light combines with the diffuse color of the surface.
#### `new_light(r, g, b, x, y, z)`

Creates a point light source at position (x, y, z) with color (r, g, b). Your code should allow at least 10 light sources. For the second part of this assignment, you will cause these lights to cast shadows.

#### `ambient_light(r, g, b)`

Creates an "ambient" light with color (r, g, b), in order to approximate indirect illumination. There is only one ambient light; multiple calls will simply replace it.

#### `new_sphere(x, y, z, radius, dr, dg, db, k_ambient, k_specular, specular_power)`

Creates a sphere with its center at (x, y, z) and the given radius. The diffuse color of the sphere is given by (dr, dg, db). The coefficient k_ambient specifies how much of the ambient light combines with the diffuse color of the surface. For this project, we will assume that all specular highlights are white; the brightness of the highlight is given by k_specular, and its tightness is controlled by specular_power.

#### `draw_scene()`

Ray-traces the scene and displays the image in the canvas region in your browser. We have provided this method, but you will need to implement two internal methods that it calls, `traceRay(ray)` and `eyeRay(i,j)`.

Note on color specification: each of the red, green, and blue components for the above commands is a floating point value in the range 0.0 to 1.0. A sketch of ray-sphere intersection and the shading computation appears below.
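As a hedged sketch of the geometric and shading core, not the required implementation: a ray p(t) = o + t·d hits a sphere of center c and radius r where |o + t·d - c|² = r², a quadratic in t, and each light contributes the diffuse and specular terms described above. This reuses the illustrative classes sketched earlier and assumes Vector/Color helpers (`dot`, `sub`, `add`, `scale`, `normalize`, `addColor`, `mulColor`, `scaleColor`) that are not part of the provided code.

```typescript
// A sketch of ray-sphere intersection and per-light shading, assuming the
// illustrative Ray/Sphere/Hit/Light classes above and assumed Vector/Color
// helpers. One possible structure, not the required one.
function intersectSphere(ray: Ray, s: Sphere): Hit | null {
    // Solve |o + t*d - c|^2 = r^2, a quadratic a*t^2 + b*t + c0 = 0.
    const oc = sub(ray.origin, s.center);
    const a = dot(ray.dir, ray.dir);
    const b = 2 * dot(oc, ray.dir);
    const c0 = dot(oc, oc) - s.radius * s.radius;
    const disc = b * b - 4 * a * c0;
    if (disc < 0) return null;                   // ray misses the sphere
    const t = (-b - Math.sqrt(disc)) / (2 * a);  // nearer root
    if (t <= 0) return null;  // behind the eye (ignores eye-inside-sphere)
    const point = add(ray.origin, scale(ray.dir, t));
    const normal = normalize(sub(point, s.center));
    return new Hit(t, point, normal, s);
}

function shade(hit: Hit, ray: Ray, lights: Light[], ambient: Color): Color {
    // Ambient term: k_ambient * ambient light * diffuse color.
    let color = mulColor(scaleColor(ambient, hit.obj.kAmbient), hit.obj.diffuse);
    for (const light of lights) {
        const l = normalize(sub(light.pos, hit.point));  // direction to light
        const nDotL = Math.max(0, dot(hit.normal, l));
        // Diffuse: light color * surface color * max(0, n . l).
        color = addColor(color,
            scaleColor(mulColor(light.color, hit.obj.diffuse), nDotL));
        // Specular: white highlight via the reflected light direction.
        const r = sub(scale(hit.normal, 2 * dot(hit.normal, l)), l);
        const toEye = normalize(scale(ray.dir, -1));
        const spec = Math.pow(Math.max(0, dot(r, toEye)), hit.obj.specularPower);
        color = addColor(color, scaleColor(light.color, hit.obj.kSpecular * spec));
    }
    return color;
}
```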
# Results

Below are the images that your program should generate for the sample scenes when you press the keys 1-4. Key 0 will draw an empty scene with just the background color. No scene is generated when the program starts; you will just see a light yellow canvas. Additional scene descriptions will be provided for Part B of this project.

The canvas is a fixed size (the number of pixels in the canvas), specified in the RayTracer object constructor. The constructor also allows you to set the number of virtual pixels that will be ray traced; this should be smaller than the number of pixels in the canvas (ideally a simple fraction of it). The renderer will draw larger "pixels" in the canvas so it is filled. Using a smaller number of virtual pixels will make debugging faster.

The sample project is set up with a 500×500 pixel canvas, but only renders 100×100 virtual pixels, resulting in 5×5 canvas pixels for each virtual ray-traced pixel. The sample scenes will look like this with these settings.

# Authorship Rules

The code that you turn in should be entirely your own. You are allowed to talk to other members of the class and to the instructor and the TAs about the general implementation of the assignment. It is also fine to seek the help of others for general TypeScript and web programming questions. You may not, however, use code that anyone other than yourself has written. The only exception is that you should use the source code that we provide for this project. Code that is explicitly not allowed includes code taken from the web, GitHub, books, other students, or any source other than yourself. You should not show your code to other students. Feel free to seek the help of the instructor and the TAs for suggestions about debugging your code.

# Submission

You will check out the project from GitHub Classroom, and submit it there.

**Do Not Change the names** of the existing files (e.g., index.html, rayTracer.ts, etc.). The TAs need to be able to test your program as follows:

1. cd into the directory and run `npm install`
2. run with `npm run dev`
3. visit `http://localhost:3000/index.html`

Please test that your submission meets these requirements. For example, after you check in your final version of the assignment to GitHub, check it out again to a new directory and make sure everything builds and runs correctly.

## Development Environment

The development environment is the same as used in previous assignments.

---

The goal of this project is to learn the underlying techniques used in a graphics library that is similar in design to early versions of OpenGL. In particular, you will implement transformation, projection, and mapping to the screen of user-provided lines.

- You implemented transformation matrix creation in 1A; here you will use those matrices along with projection and drawing commands.
- You will test your code with the help of the provided routines for drawing a few simple images.
- You will also create a routine that draws your initials on the screen in perspective.

## Due: Wednesday September 22nd, 11:59pm

## Rubric

Graded out of 10.

1. (4 points) 2 points each for implementing `ortho()` and `perspective()` correctly: 1 point for the projection component and 1 point for viewport mapping to the full screen.
2. (1 point) Points created by `vertex()` are correctly transformed by the transformation and projection matrices.
3. (1 point) Implement `vertex()` by matrix multiplication correctly.
4. (1 point) Call the `Drawing` `line()` method with the correct transformed values.
5. (1 point) Implement `beginShape()` and `endShape()` correctly.
6. (2 points) Routine to draw your initials on the screen (1 point for drawing the initials, 1 point for satisfying the perspective requirements below).

As stated in the project objective, you will be implementing a collection of basic graphics library commands that are similar in style to early versions of the popular OpenGL library.

The graphics library commands are implemented as methods on the `myDrawing` class in `app.ts`, a subclass of the `Drawing` class defined in `common.ts`. (We've moved most of the boilerplate code into a subclass for clarity; you do not have to modify `common.ts`.)

Some of these methods are ones that you wrote for Project 1A. The following routines should act as they did in that earlier project:

- `init()`, now called `initMatrix()`
- `translate(x, y, z)`
- `scale(sx, sy, sz)`
- `rotateX(angle)`, `rotateY(angle)`, `rotateZ(angle)`
- `print()`, now called `printMatrix()`

You will also implement several new routines that allow a user to manipulate and display 3D lines. The provided source code contains empty methods for each of the following routines.

### `ortho(left, right, bottom, top, near, far)`

Specifies that an orthographic projection will be performed on subsequent vertices. The direction of projection is assumed to be along the z-axis. The six values passed to this routine describe a box to which all lines will be clipped. The `left` and `right` values specify the minimum and maximum x values that will be mapped to the left and right edges of the framebuffer. The `bottom` and `top` values specify the y values that map to the bottom and top edges of the framebuffer. The `near` and `far` values are used to compute the projection matrix, but do not actually clip the content. The eye point is assumed to be facing the negative z-axis. A sketch of the corresponding matrix appears below.
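As a hedged sketch, here is the standard orthographic matrix in the glOrtho convention. It assumes `near` and `far` are positive distances in front of the eye (the planes sit at z = -near and z = -far), and it stores the matrix as a row-major flat 16-element array; both conventions are assumptions, and your own representation may differ.

```typescript
// A sketch of an orthographic projection matrix (glOrtho convention),
// mapping the given box to the canonical view volume [-1,1]^3.
// Row-major 4x4 as a flat array; the storage layout is an assumption.
function orthoMatrixSketch(left: number, right: number, bottom: number,
                           top: number, near: number, far: number): number[] {
    return [
        2 / (right - left), 0, 0, -(right + left) / (right - left),
        0, 2 / (top - bottom), 0, -(top + bottom) / (top - bottom),
        0, 0, -2 / (far - near),  -(far + near) / (far - near),
        0, 0, 0, 1,
    ];
}
```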
### `perspective(fov, near, far)`

Specifies that a perspective projection will be performed on subsequent vertices. The center of projection is assumed to be the origin, and the viewing direction is along the negative z-axis. The value `fov` is an angle (in degrees) that describes the field of view in the `y` (vertical) direction. (Given the vertical `fov` and the width and height of the window, the horizontal field of view can be computed such that the content retains the correct aspect ratio.) The `near` and `far` values are used to compute the projection matrix, but do not actually clip the content. The eye point is assumed to be facing the negative z-axis.

In OpenGL, the projection and transformation matrices are maintained separately, so you can specify projections at any time before you draw lines and polygons. We will do the same for our assignment. Whichever projection you specify (`ortho` or `perspective`) will be used until you call a different projection command, and it will be the last operation applied to the line endpoints, regardless of where those procedure calls appear with respect to the other transformations.

### `beginShape()`, `endShape()`, `vertex(x, y, z)`

The `beginShape()` and `endShape()` commands signal the start and end of a list of endpoints for line segments that are to be drawn. Each call to the routine `vertex()` between these two commands specifies a 3D vertex that is a line endpoint. Black lines are drawn between successive odd/even pairs of these vertices. If, for example, the four vertices v1, v2, v3, v4 are given in four sequential `vertex` commands, then two line segments will be drawn, one between v1 and v2 and another between v3 and v4.

The vertices of the lines are first modified by the current transformation matrix, and then by whichever projection was most recently described (`ortho` or `perspective`). Only one of `ortho` or `perspective` is in effect at any one time. These projections do not affect the current transformation matrix, nor are they affected by the `initMatrix` command, and should be maintained as a separate matrix! Your `beginShape`, `vertex`, and `endShape` commands must be able to draw any number of lines. You can draw a line as soon as both of its endpoints have been given to you (using `vertex`), or you can store all of the vertices and draw all of the lines when `endShape` is called. Which way you use is up to you. To draw the lines, use the `line()` method on the `Drawing` object. A sketch of the perspective matrix and this vertex pipeline appears below.
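Here is a hedged sketch of a gluPerspective-style matrix and of one way the vertex pipeline could pair vertices into lines. The names `ctm`, `projection`, `pendingVertex`, and `drawLineInWindow` (an assumed helper that applies the viewport mapping and calls the provided `line()`) are illustrative only.

```typescript
// A sketch of a gluPerspective-style matrix (vertical fov in degrees,
// near/far as positive distances), stored row-major as a flat 16-array.
function perspectiveMatrixSketch(fovDegrees: number, aspect: number,
                                 near: number, far: number): number[] {
    const f = 1 / Math.tan((fovDegrees * Math.PI / 180) / 2);
    return [
        f / aspect, 0, 0, 0,
        0, f, 0, 0,
        0, 0, (far + near) / (near - far), 2 * far * near / (near - far),
        0, 0, -1, 0,
    ];
}

// Multiply a row-major 4x4 matrix (flat 16-array) by a 4-vector.
function matVec(m: number[], v: number[]): number[] {
    const out = [0, 0, 0, 0];
    for (let r = 0; r < 4; r++)
        for (let k = 0; k < 4; k++)
            out[r] += m[4 * r + k] * v[k];
    return out;
}

// Assumed module state: current transformation and projection matrices.
let ctm: number[] = [1,0,0,0, 0,1,0,0, 0,0,1,0, 0,0,0,1];
let projection: number[] = [1,0,0,0, 0,1,0,0, 0,0,1,0, 0,0,0,1];
let pendingVertex: number[] | null = null;

// vertex(): transform by the CTM, then the current projection, then do the
// homogeneous divide; pair up successive vertices into line segments.
function vertexSketch(x: number, y: number, z: number): void {
    let p = matVec(projection, matVec(ctm, [x, y, z, 1]));
    p = [p[0] / p[3], p[1] / p[3], p[2] / p[3], 1];   // perspective divide
    if (pendingVertex === null) {
        pendingVertex = p;                   // first endpoint of the pair
    } else {
        drawLineInWindow(pendingVertex, p);  // second endpoint: draw, reset
        pendingVertex = null;
    }
}
```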
# Provided Code

The provided source code contains various routines to draw a few simple images. Upon running the program, this supplied set of routines can be called with the number keys (0-9). Make sure your browser window is the active window when you type any of these digits. You will use these test cases to debug and validate your library routines.

Pressing the '1' key will call the first test case to draw a square on the screen. This test function should be used to test your `ortho` routine. Pressing keys '2' and higher will test more of the routines that you should provide.

The '0' key will run a routine called `persp_initials()`. In this routine, defined at the bottom of the `projection_tests.ts` file, you should write a set of commands to draw your initials on the screen, using the commands that you have created (`perspective()`, `beginShape()`, `vertex()`, and so on). That is, you should draw the first letters of your personal and family names. You must use perspective projection when viewing your initials, and you also must make it obvious that they are being seen in perspective. In the example below, the initials GT (Georgia Tech) are drawn. You can be as simple or as elaborate as you like in drawing your initials, so long as the letters can be correctly identified.

You should modify the source code in the `app.ts` and `projection_tests.ts` files, and include your name in the header. The source code is written in TypeScript, as in previous assignments. Please note that you are not allowed to use the transformation functions on the canvas rendering context to accomplish the tasks for this project. The only drawing routine that you should use in your code is the `line()` command we have provided. When in doubt about something, ask the instructor or the TAs.

Our textbook demonstrates the steps required to create a projection as a series of matrices multiplied together, including a viewport transformation that takes the canonical view volume (`(-1,-1,-1)` to `(1,1,1)`) into 2D window coordinates. Following this approach, and using your existing matrix multiplication routine, you will want to create a projection matrix that you can multiply the vertices by after you have transformed them with the current transformation matrix created in 1A. A sketch of the viewport mapping appears below.
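As a hedged sketch of that viewport transformation, assuming an nx-by-ny pixel window with pixel centers at integer coordinates (the textbook convention; your window coordinate convention may differ):

```typescript
// A sketch of the viewport matrix mapping the canonical view volume's x and
// y in [-1,1] to window coordinates for an nx-by-ny window, assuming pixel
// centers at integer coordinates. Row-major 4x4 as a flat array.
function viewportMatrixSketch(nx: number, ny: number): number[] {
    return [
        nx / 2, 0, 0, (nx - 1) / 2,
        0, ny / 2, 0, (ny - 1) / 2,
        0, 0, 1, 0,
        0, 0, 0, 1,
    ];
}
```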
# Results

Below are the results that your program should draw when you press the 1-9 and 0 keys. As with 1A, we are providing these tests to help you check that your implementation is correct. Note that you will use your own initials for the final image, and you must clearly demonstrate that this image was created using perspective projection.

The results shown below were rendered in a square window. If the window is not square, the orthographic projection will stretch the content to fill the window, while the perspective projection should keep the correct aspect ratio (squares will remain square) and window height, but show more or less of the horizontal view depending on whether the window is wider (more) or narrower (less) than it is tall.

# Authorship Rules

The code that you turn in should be entirely your own. You are allowed to talk to other members of the class and to the instructor and the TAs about the general implementation of the assignment. It is also fine to seek the help of others for general TypeScript and web programming questions. You may not, however, use code that anyone other than yourself has written. The only exceptions are that you should use your code from Project 1A and the source code that we provide for this project. Code that is explicitly not allowed includes code taken from the web, GitHub, books, other students, or any source other than yourself. You should not show your code to other students. Feel free to seek the help of the instructor and the TAs for suggestions about debugging your code.

# Submission

You will check out the project from GitHub Classroom, and submit it there. All of your code should be in the files `app.ts` and `projection_tests.ts`. Do not add extra files.

**Do Not Change the names** of the existing files (e.g., index.html, app.ts, etc.). The TAs need to be able to test your program as follows:

1. cd into the directory and run `npm install`
2. run with `npm run dev`
3. visit `http://localhost:3000/index.html`

Please test that your submission meets these requirements. For example, after you check in your final version of the assignment to GitHub, check it out again to a new directory and make sure everything builds and runs correctly.

## Development Environment

The development environment is the same as used in previous assignments.

---

This project is designed to familiarize you with the basics of creating 3D transformation matrices. You will also use this project as the basis for the second part of Project 1.

## Due: Saturday September 11th, 11:59pm

## Rubric

Graded out of 10.

1. (3 points) 1 point each for implementing `init()`, `print()`, and `currentMatrix()` correctly.
2. (5 points) 1 point each for creating the correct transformation matrix in each of `translate(x,y,z)`, `scale(x,y,z)`, `rotateX(angle)`, `rotateY(angle)`, and `rotateZ(angle)`.
3. (1 point) Implement matrix multiplication correctly.
4. (1 point) Correctly multiply the transformation matrices in each of `translate(x,y,z)`, `scale(x,y,z)`, `rotateX(angle)`, `rotateY(angle)`, and `rotateZ(angle)` with the `ctm`, updating the `ctm`.

In order to familiarize you with matrix transformations and operations, you will be completing the empty methods, which are modeled loosely on similar commands from traditional graphics libraries like OpenGL. Such libraries had the notion of a "current transformation matrix" (`ctm`) that you would operate on with a series of commands.

You will write code to do the following in `matrix.ts` (a sketch of steps 4 and 5 appears after this list):

1. Write the code for `init()` that initializes the `ctm` to the 4×4 identity matrix.

2. Write code in `print()` to print the `ctm` to the browser console using [console.log](https://developer.mozilla.org/en-US/docs/Web/API/Console/log). Make sure your matrix prints on four separate rows. Also, to make your output easier to read, have this command print an empty line after the last row. Example output from `print()`:

```
1, 0, 0, 0
0, 1, 0, 0
0, 0, 1, 0
0, 0, 0, 1
```

3. Write code for `currentMatrix()` to return the `ctm` as an array of 16 numbers in row-major order (i.e., elements 0..3 are row 1 of `ctm`, 4..7 are row 2, 8..11 are row 3, and 12..15 are row 4).

4. Create an internal routine (i.e., one that is not exported from `matrix.ts`) that does 4×4 matrix multiplication, to be used in the next step.

5. Create 4×4 scale, translate, and simple rotation matrices, and multiply `ctm` with each newly created matrix. The angles passed into the rotation routines will be in degrees. The names of the commands that you will implement are `scale(x,y,z)`, `translate(x,y,z)`, `rotateX(angle)`, `rotateY(angle)`, and `rotateZ(angle)`. For instance, `scale` will cause the following change to the current transformation matrix: `new_ctm = old_ctm * scale_matrix`. Note that multiplying the new transformation on the right of `ctm` matches OpenGL's policy of having the last transformation that you specify affect an object's vertices first.
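Here is a minimal sketch of steps 4 and 5, assuming the `ctm` is stored as a row-major flat 16-element array; the storage layout and the helper names are illustrative, not required.

```typescript
// A sketch of 4x4 matrix multiplication and one rotation command, assuming
// a row-major flat 16-element array for the ctm (an assumption).
let ctm: number[] = [1,0,0,0, 0,1,0,0, 0,0,1,0, 0,0,0,1];

function multiply(a: number[], b: number[]): number[] {
    const out = new Array(16).fill(0);
    for (let row = 0; row < 4; row++)
        for (let col = 0; col < 4; col++)
            for (let k = 0; k < 4; k++)
                out[4 * row + col] += a[4 * row + k] * b[4 * k + col];
    return out;
}

function rotateZSketch(angleDegrees: number): void {
    const rad = angleDegrees * Math.PI / 180;  // the routines take degrees
    const c = Math.cos(rad), s = Math.sin(rad);
    const rot = [
        c, -s, 0, 0,
        s,  c, 0, 0,
        0,  0, 1, 0,
        0,  0, 0, 1,
    ];
    ctm = multiply(ctm, rot);  // post-multiply: last transform acts first
}
```

For example, `rotateZSketch(45)` post-multiplies the identity `ctm` by the z-rotation, producing the 0.7071... entries shown in the `mat_test()` output below.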
# Carrying Out the Assignment

The provided source code gives you empty methods for each of the operations listed above in `matrix.ts`. The provided code also tests these matrix commands, calling various commands and then printing the current transformation matrix to show the result. These tests are in the routine `mat_test()` in `matrix_test.ts`. You will need to open the console in the browser window to view the printed results. See below for the output from this test.

You should modify the source code in any way you see fit to satisfy the requirements. Feel free to define classes, create helper routines, and use variables within the module. Resist the temptation to over-engineer the solution, however; your code does not need to generalize to other use cases or matrix sizes. You should also comment your code and include your name in the header. The source code is written in TypeScript.

You should create your own 4×4 matrix data type and your own 4×4 matrix multiplication routine. To make sure that all students are limited to using the same resources, you may NOT use external JavaScript libraries, including libraries that define and/or manipulate matrices, vectors, or lists, or that do formatted printing. When in doubt about what routines you may use, ask the instructor.

# Authorship Rules

The code that you turn in should be entirely your own. You are allowed to talk to other members of the class and to the instructor and the TAs about the general implementation of the assignment. It is also fine to seek the help of others for general TypeScript programming questions. You may not, however, use code that anyone other than yourself has written. The only exception is that you should start with the source code that we provide for this project. Code that is explicitly not allowed includes code taken from the web, GitHub, books, other assignments, or any source other than yourself. You should not show your code to other students. Feel free to seek the help of the instructor and the TAs for suggestions about debugging your code.

# Submission

You will check out the project from GitHub Classroom, and submit it there. All of your code should be in the file `matrix.ts`. Do not add extra files.

**Do Not Change the names** of the existing files (e.g., index.html, app.ts, etc.). The TAs need to be able to test your program as follows:

1. cd into the directory and run `npm install`
2. run with `npm run dev`
3. visit `http://localhost:3000/index.html`

Please test that your submission meets these requirements. For example, after you check in your final version of the assignment to GitHub, check it out again to a new directory and make sure everything builds and runs correctly.

## Development Environment

The development environment is the same as used in A0 and in the Ex1 sample.

## Results from `mat_test()`

Below are the results that your code should produce when the test routine is run. Keep in mind that your values and these might differ very slightly due to numerical precision issues, and this is fine. In particular, some of the rotation results may produce numbers that are very small but not quite zero.

```
init()
1, 0, 0, 0
0, 1, 0, 0
0, 0, 1, 0
0, 0, 0, 1

translate (3, 2, 1.5)
1, 0, 0, 3
0, 1, 0, 2
0, 0, 1, 1.5
0, 0, 0, 1

scale (2, 3, 4)
2, 0, 0, 0
0, 3, 0, 0
0, 0, 4, 0
0, 0, 0, 1

rotateX (90)
1, 0, 0, 0
0, 6.123233995736766e-17, -1, 0
0, 1, 6.123233995736766e-17, 0
0, 0, 0, 1

rotateY (-15)
0.9659258262890683, 0, -0.25881904510252074, 0
0, 1, 0, 0
0.25881904510252074, 0, 0.9659258262890683, 0
0, 0, 0, 1

rotateZ (45) and init()
0.7071067811865476, -0.7071067811865475, 0, 0
0.7071067811865475, 0.7071067811865476, 0, 0
0, 0, 1, 0
0, 0, 0, 1

1, 0, 0, 0
0, 1, 0, 0
0, 0, 1, 0
0, 0, 0, 1

translate (1.5, 2.5, 3.5) and scale (2, 2, 2)
2, 0, 0, 1.5
0, 2, 0, 2.5
0, 0, 2, 3.5
0, 0, 0, 1

currentMatrix() = 2,0,0,1.5,0,2,0,2.5,0,0,2,3.5,0,0,0,1

scale (4, 2, 0.5) and translate (2, -2, 10)
4, 0, 0, 8
0, 2, 0, -4
0, 0, 0.5, 5
0, 0, 0, 1

scale (2, 2, 2), print, init(), and translate (3.14159, 2.71828, 1.61803)
2, 0, 0, 0
0, 2, 0, 0
0, 0, 2, 0
0, 0, 0, 1

1, 0, 0, 3.14159
0, 1, 0, 2.71828
0, 0, 1, 1.61803
0, 0, 0, 1

currentMatrix() = 1,0,0,3.142,0,1,0,2.718,0,0,1,1.618,0,0,0,1

Multiple mat.scales and mat.translates
1, 0, 0, 0
0, 1, 0, 0
0, 0, 1, 0
0, 0, 0, 1

1, 0, 0, 0
0, 1, 0, 0
0, 0, 1, 0
0, 0, 0, 1
```