Assignment Chef icon Assignment Chef

Browse assignments

Assignment catalog

33,401 assignments available

[SOLVED] Cs6250 project 4- spanning tree

CS 6250 Spring 2025Spanning Tree Table of Contents PROJECT GOAL 2 Part 1: Setup 2 Part 2: Files Layout 2 Part 3: TODOs 3 Part 4: Testing and Debugging 5 Part 5: Assumptions and Clarifications 6 What to Turn In 7 What you can and cannot share 7 Rubric 8PROJECT GOAL Part 1: Setup Download the project files from Canvas. You can do this project on your host system if it has Python 3.11.x. The project does not have any dependencies outside of Python. You must be sure that your submission runs properly in Gradescope. Gradescope is the environment where your project will be graded. Gradescope and the VM are the only valid environments for this course. Part 2: Files Layout There are many files in the SpanningTree directory, but you should only modify Switch.py. The files in the project skeleton are described below. DO NOT modify these files. All of your work must be in Switch.py ONLY. You should study the other files to understand the project framework. • Topology.py – Represents a network topology of layer 2 switches. This class reads in the specified topology and arranges it into a data structure that your Switch can access. This class also adjusts the topology if any changes are indicated within the XXXTopo.py class. • Message.py – This class represents a message format you will use to communicate between switches, similar to the course lectures. Specifically, you will create and send messages in Switch.py by declaring a message as: msg = Message(claimedRoot, distanceToRoot, originID, destinationID, pathThrough, timeToLive)• run.py – A “main” file that loads a topology file (see XXXTopo.py below), uses that to create a Topology object containing Switches, and runs the simulation. • XXXTopo.py, etc. – These are topology files that you will pass as input to the run.py file. Part 3: TODOs This is an outline of the code you must implement in Switch.py with suggestions for implementation. Keep in mind that certain update rules will take precedence over others. A. Decide on the data structure(s) that you will use to keep track of the spanning tree.1. The collection of active links across all switches is the resulting spanning tree. 3. This is a distributed algorithm. The switch can only communicate with its direct neighbors. It does not have an overall view of the topology as a whole (do not access self.topology). 4. An example data structure should include, at a minimum: a. a variable to store the switch ID that this switch sees as the root, b. a variable to store the distance to the switch’s root, c. a list or other datatype that stores the “active links” (only the links to neighbors that are in the spanning tree). d. a variable to keep track of which neighbor it goes through to get to the root (a switch should only go through one neighbor, if any, to get to the root).B. Implement processing a message from an immediate neighbor.1. You do not need to worry about sending the initial messages. You only need to worry about the sending and processing of subsequent messages. 2. For each message a switch receives, the switch will need to: a. Determine whether an update to the switch’s root information is necessary and update accordingly. I. The switch should update the root stored in its data structure if it receives a message with a lower claimedRoot. II. The switch should update the distance stored in its data structure if a) the switch updates the root, or b) there is a shorter path to the same root. b. Determine whether an update to the switch’s active links data structure is necessary and update accordingly. The switch should update the activeLinks if: I. The switch finds a new path to the root (through a different neighbor). In this case, the switch should add the new link to activeLinks and removes the old link from activeLinks II. The switch receives a message with pathThrough = TRUE but does not have that originID in its activeLinks list. In this case, the switch should add originID to its activeLinks list. III. The switch receives a message with pathThrough = FALSE but the switch has that originID in its activeLinks. In this case, the switch should remove originID from its activeLinks list c. Determine when the Switch should send messages to its neighbors and send the messages. I. The message FIFO queue is maintained in Topology.py. The switch implementation does not interact with the FIFO queue directly, but uses the send_message function, and receives messages as arguments in the process_message function. II. When sending messages, pathThrough should only be TRUE if the destinationID switch is the neighbor that the originID switch goes through to get to the claimedRoot. Otherwise, pathThrough should be FALSE. III. The switch should continue sending messages to its neighbors until the ttl (time to live) on the Message being processed is 0. You need to decrement the ttl every time you process a Message. Note: This is one place where this project deviates from the STP algorithm you learned in the lectures. a. The switch that is dropped should never split the original topology. That means that the final Topology will remain connected. b. The switch that is dropped could be the original root, your algorithm should adapt accordingly. c. The Topology file will include the ttl_limit and drops. The ttl_limit is the starting ttl for each message in the Topology. The drops indicate which switch(es) will be dropped. d. You do not need to access the ttl_limit. This will be given to each message at the start of the process. You need to decrement the ttl to 0 to trigger the Topology’s drop process. C. Write a logging function. 1. The switch should only output the links that are in the spanning tree. 2. Follow the below format (# – #). Unsorted or non-standard formatting will result in penalties. Examples of correct logs with the correct format have been provided to you in the project directories. 3. Sorted: Not sorted: 1 – 2, 1 – 3 1 – 3, 1 – 2 2 – 1, 2 – 4 2 – 4, 2 – 1 3 – 1 3 – 1 4 – 2 4 – 2Part 4: Testing and Debugging To run your code on a specific topology (SimpleLoopTopo.py in this case) and output the results to a text file (out.txt in this case), execute the following command: python run.py SimpleLoopTopo “SimpleLoopTopo” is not a typo in the example command – don’t include the .py extension. We have included several topologies with correct solutions for you to test your code against. You can (and are encouraged to) create more topologies and test suites with output files and share them on Ed Discussion. There will be a designated post where students can share these files. You will only be submitting Switch.py – your implementation must be confined to modifications of that file. We recommend testing your submission against a clean copy of the rest of the project files prior to submission. Part 5: Assumptions and Clarifications A. All switch IDs are positive integers, and distinct. 1. These integers do not have to be consecutive. 2. They will not always start at 1. 3. There is no maximum value beyond language (Python) limitations (but your code does not need to check for this). B. Tie breakers: If there are multiple paths of equal distance to the same root, the switch should choose the path through the neighbor with the lowest switch ID. 1. Example: switch 5 has two paths to root switch 1, through switch 3 and switch 2. Each path is 2 hops in length. Switch 5 should select switch 2 as the path to the root and disable forwarding on the link to switch 3.C. There is a single distinct solution spanning tree for each topology. This is guaranteed by the first two assumptions (A and B). D. All switches in the network will be connected to at least one other switch, and all switches are able to reach every other switch. It will always be possible to form a tree that spans the entire topology. E. There will be only 1 link between each pair of directly connected switches. You do not need to consider how STP would behave with redundant links. G. The solution implemented in Switch.py should terminate without intervention. When there are no more messages in the queue to process, the simulation will log the output and terminate. Your algorithm should stop sending messages when the ttl on the Message being processed is 0. H. Your solution should not require any outside Python modules. Do not import any other modules. What to Turn In Before submission: a. Make sure your logging format is correct. Invalid format will be marked as incorrect. b. Remove all print statements from your code before turning it in. Print statements can have drastic effects on runtime. Your submission must take less than 30 seconds per topology. If print statements in your code adversely affect the grading process, your work will not receive full credit. c. Your algorithm must converge upon the Spanning Tree within the Topology’s ttl_limit. d. Make sure your Switch.py works in Gradescope. Gradescope will give you immediate feedback, along with your grade, so we will not accept re-grade requests related to incorrect submissions. f. Helper functions: Helper functions are fine as long as the names don’t conflict with anything already in the project. If it works in Gradescope, it is fine. After submission: h. Your grade in Gradescope will be your grade for this project, with some caveats: b. Any attempt to bypass or distort the autograder will result in a 0 and will be referred to OSI. What you can and cannot share Rubric10 pts Correct Submission For turning in the correct file with the correct name. You receive 10 FREE points for reading the instructions. 30 pts Provided Topologies For correct Spanning Tree results (log files) on the provided topologies. 60 pts Hidden Topologies For correct Spanning Tree results (log files) on the four topologies that you will not have access to. These cases are used to prevent students from hard coding a solution.

$25.00 View

[SOLVED] Cs6250 project 5- sdn firewall with pox

Summer 2025SDN Firewall with POX Table of ContentsSDN Firewall with POX Project 2 Part 0: Project References 2 Part 1: Files Layout 3 Part 2: Mininet 4 Part 3: Wireshark 4 Part 4: SDN Firewall Implementation Details 7 Part 4a: Specifications of configure.pol 7 Part 4b: Implementing the Firewall in Code 9 Part 5: Configuration Rules 10 What to Turn In 11 What you can and cannot share 12 Appendix A: How to Test Host Connectivity 13 Part A: How to Test Manually 13 Part B: Automated Testing Suite 16 Appendix B: Troubleshooting Information 17 General Coding Issues 17 Firewall Implementation (sdn-firewall.py) Errors and Issues 17 Mininet/Topology Issues 17 Appendix C: POX API Excerpt 18 Flow Modification Object 18 Match Structure 18 OpenFlow Actions 20 Example: Sending a FlowMod Object 21 Appendix D: Review of Mininet 23SDN Firewall with POX ProjectIn this project, you will use Software Defined Networking (SDN) principles to create a configurable firewall using an OpenFlow enabled Switch. The Software Defined Networking (OpenFlow) functionality allows you to programmatically control the flow of traffic on the network. This project has two phases (and one optional phase) as follows: 2. Wireshark Tutorial – This phase is a brief introduction to packet capture using Wireshark/tshark. You will examine the packet format for various traffic to learn of the different header values used in Phase 3. There is a deliverable of a simple packet capture file. 3. SDN Firewall – This phase involves completing code to build a simple traffic blocking firewall using OpenFlow with the POX Controller based on rules passed to it from a configuration file. In addition, you will create a set of rules to test the firewall implementation. Part 0: Project ReferencesYou will find the following resources useful in completing this project. It is recommended that you review these resources before starting the project. • IP Header Format – https://erg.abdn.ac.uk/users/gorry/course/inet-pages/ip-packet.html • TCP Packet Header Format – https://en.wikipedia.org/wiki/Transmission_Control_Protocol • UDP Packet Header Format – https://en.wikipedia.org/wiki/User_Datagram_Protocol • The ICMP Protocol – https://en.wikipedia.org/wiki/Internet_Control_Message_Protocol • IP Protocols – https://en.wikipedia.org/wiki/List_of_IP_protocol_numbers • TCP and UDP Service and Port References – https://en.wikipedia.org/wiki/List_of_TCP_and_UDP_port_numbers • Wireshark – https://www.wireshark.org/docs/wsug_html/ • CIDR Calculator – https://account.arin.net/public/cidrCalculator • CIDR – https://en.wikipedia.org/wiki/Classless_Inter-Domain_Routing There are a few videos describing various aspects of the project: • Project Description – https://youtu.be/Kl4nRgoeLxw • Wireshark Tutorial – https://youtu.be/AnTi1m0imVk • IP Network Address / Subnets / CIDR – See Edstem • How to Manually Test – https://youtu.be/dj323mdA3sgPart 1: Files LayoutUnzip the SDNFirewall-Summer2025zip file into your Virtual Machine. You can extract to any folder on your system. Do this by running the following command:unzip SDNFirewall-Summer2025.zip This will extract the files for this project into a directory named SDNFirewall at your current path (it is recommended that your use the mininet root directory to aid in troubleshooting (cd ~ ). The following files will be extracted: • cleanup.sh – this file called by using following command line: ./cleanup.sh This file will clean up the Mininet Environment and kill all zombie Python and POX processes. • sdn-topology.py – this file creates the Mininet topology used in this assignment. This is like what you created in the Simulating Networks project. When evaluating your code against the ruleset specified in this project, do not change it. However, you are encouraged to make your own topologies (and rules) to test the firewall. Look at the start-topology.sh file to see how to start a different topology. • ws-topology.py – this file is substantially like sdn-topology, but it does not call the POX Controller. You will use this during the Wireshark exercise. • setup-firewall.py – this file sets up the frameworks used in this project. DO NOT MODIFY THIS FILE. This file will create the appropriate POX framework and then integrates the rules implemented in sdnfirewall.py into the OpenFlow engine. It will also read in the values from the configure.pol file and validate that the entries are valid. If you make changes to this file, the autograder will likely have issues with your final code as the autograder uses the unaltered distribution version of this file. • start-firewall.sh – this is the shell script that starts the firewall. This file must be started before the topology is started. It will copy files to the appropriate directory and then start the POX OpenFlow controller. This file is called by using following command line: ./start-firewall.sh • start-topology.sh – this is the shell script that starts the Mininet topology used in the assignment. All it does is call the sdn-topology.py file with superuser permissions. This file is called by using following command line: ./start-topology.sh • test-client.py – this is a python test client program used to test your firewall. This file is called using the following command line: python test-client.py PROTO SERVERIP PORT SOURCEPORT where PROTO is either T for TCP, U for UDP, or G for GRE, SERVERIP is the IP address of the server (destination), PORT is the destination port, and optionally SOURCEPORT allows you to configure the source port that you are using. Example: python test-client.py T 10.0.1.1 80 • test-server.py – this is a python test server program used to test your firewall. This file is called using the following command line: python test-server.py PROTO SERVERIP PORT where PROTO is either T for TCP, U for UDP, G for GRE, SERVERIP is the IP address of the server (the server you are running this script on), and PORT is the service port. Example: python test-server.py T 10.0.1.1 80 Project Deliverables • configure.pol – this file is where you will supply the configuration to the firewall that specifies the traffic that should either be blocked or allowed (override blocks). The format of this file will be specified later in this document. This file is one of the deliverables that must be included in your ZIP submission to Canvas. sdn-firewall.py • –This file implements the firewall using POX and OpenFlow functions. It receives a copy of the contents of the configure.pol file as a python list containing a dictionary for each rule and you will need to implement the code necessary to process these items into POX policies to create the firewall. This file is one of the deliverables that must be included in your ZIP submission to Canvas. • packetcapture.pcap – This will be the packet capture completed in Part 4. This file is one of the deliverables that must be included in your ZIP submission to Canvas. Part 2: MininetPart 3: WiresharkWireshark is a network packet capture program that will allow you to capture a stream of network packets and examine them. Wireshark is used extensively to troubleshoot computer networks and in the field of information security. We will be using Wireshark to examine the packet headers to learn how to use this information to match traffic that will be affected by the firewall we are constructing. tshark is a command line version of Wireshark that we will be using to capture the packets between mininet hosts and we will use Wireshark for the GUI to examine these packets. However, you will be allowed to use the Wireshark GUI if you would like in doing the packet capture. Please watch the Wireshark Tutorial Video if you would like to follow along in time for a live packet capture. • Step 1: Open up a Terminal Window and change directory to the SDNFirewall directory that was extracted in Part 1.• Step 2: The first action is to start up the Mininet topology used for the Wireshark capture exercise. This topology matches the topology that you will be using when creating and testing your firewall. To start this topology, run the following command:sudo python ws-topology.pyThis will startup a Mininet session with all hosts created. If you use sdn-topology.py, you will get a controller error. Ctrl-C and redo step 2 to get the correct topology. • Step 3: Start up two xterm windows for hosts us1 and us2. After you start each xterm window, it is recommended that you run the following command in each xterm window as you load them to avoid confusion about which xterm belongs to which host:export PS1=”hostname >” replacing hostname with the actual hostname.Type in the following commands at the Mininet prompt. This is optional, but helps with identifying which xterm window belongs to which host.us1 xterm & (then run (likewise, run export PS1=”us1 >” us2 xterm & export PS1=”us2 >” in the xterm window that pops up) in the second xterm window)• Step 4: In this step, we want to start capturing all the traffic that traverses through the ethernet port on host us1. We do this by running tshark (or alternatively, wireshark) as follows from the mininet prompt:us1 sudo tshark -w /tmp/packetcapture.pcapThis will start tshark and will output a pcap formatted file to packetcapture.pcap to the tmp directory. Note that this file is created as root, so you will need to change ownership to mininet to use it in future steps – chown mininet:mininet /tmp/packetcapture.pcapYOU WILL SUBMIT THIS FILE AS A PART OF YOUR SUBMITTAL.• Step 5: Now we need to capture some traffic. Do the following tasks in the appropriate xterm window:in us1 xterm: ping 10.0.1.2 (hit control C after a few ping requests) In us2 xterm: ping 10.0.1.1 (likewise hit control C after a few ping requests) In us1 xterm: python test-server.py T 10.0.1.1 80 In us2 xterm: python test-client.py T 10.0.1.1 80 After the connection completes, in the us1 xterm, press Control-C to kill theserver. In us1 xterm: python test-server.py U 10.0.1.1 8000 In us2 xterm: python test-client.py U 10.0.1.1 8000 In us1 xterm: press Control C to kill the server In Mininet Terminal: press Control C to stop tshark• Step 7: At the bash prompt on the main terminal, run:sudo wiresharkGo to the File => Open menu item, browse to the /tmp directory and select the pcap file that you saved using tshark.You will get a GUI that looks like the example packet capture. You will have a numbered list of all the captured packets with brief information consisting of source/destination, IP protocol, and a description of the packet. You can click on an individual packet and will get full details including the Layer 2 and Layer 3 packet headers, TCP/UDP/ICMP parameters for packets using those IP protocols, and the data contained in the packet. Example Packet Capture – Host us1 making web request to Host us2Note the highlighted fields. You will be using the information from these fields to help build your firewall implementation and ruleset. Note the separate header information for TCP. This will also be the case for UDP packets. Also, examine the three-way handshake that is used for TCP. What do you expect to find for UDP? ICMP? Example TCP Three-Way HandshakePlease examine the other packets that were captured to help you familiarize yourself with Wireshark. Part 4: SDN Firewall Implementation DetailsUsing the information that you learned above in running Wireshark, you will be creating two files – one is a firewall configuration file that will specify different header parameters to match in order to allow or block certain traffic that will define the actions of the firewall and the second is the implementation code to create the OpenFlow Flow Modification objects that will create the firewall using the parameters given in the firewall configuration file. Part 4a: Specifications of configure.polThe configure.pol file is used by the firewall implementation code to specify the rules that the firewall will use to govern a connection. You do not need to code this first, but the format of the file is important to understand as your implementation code will need to use these items. The format of the file is a collection of lines that have the proper format: Rule Number, Action, Source MAC, Destination MAC, Source IP Network Address, Destination IP Network Address, Protocol, Source Port, Destination Port, Comment/Note o Rule Number = this is a rule number to help you track a particular rule – it is not to be used at all in your firewall implementation except to help you find rules that cause an error. DO NOT USE FOR PRIORITY.o Action = Block or Allow Block rules will block traffic that matches the remaining parameters of this rule. Allow rules will override Block rules to allow specific traffic to pass through the firewall (see below for an example). The entry is a string in (Block,Allow) and is validated by the parser.o Source / Destination IP Network Address in form of xxx.xxx.xxx.xxx/xx in CIDR notation or a “-“ if you are not matching this item. IMPORTANT NOTE: THIS MUST BE A VALID NETWORK ADDRESS. Thus, if you specify a subnet mask of /24, it means the first 24 bits will be the network address and the last 8 bits will be 0. Thus, for a host 10.0.10.10/24, the NETWORK ADDRESS will be 10.0.10.0/24. If you specify 10.0.10.10/24, you will get an error. If you want to specify a single host, then your netmask will be all 1s for the 32-bits of the address (i.e., a /32). The parser for the file will validate that you have a proper IP Address given, but will not parse to see if it is a valid network address. You will get a POX error if it is not a valid Network Address.The IP address of a particular host is defined inside the sdn-topology.py file.o Protocol = integer IP protocol number per IANA (0-254) or a “-“ if you are not matching this item.. An example is ICMP is IP Protocol 1, TCP is IP Protocol 6, etc.o Source / Destination Port = if Protocol is TCP or UDP, this is the Application Port Number per IANA. For example, web traffic is generally TCP Port 80. Do not try to use port numbers to differentiate the different elements of the ICMP protocol. If you are not matching this item or are using an IP Protocol other than TCP or UDP, this field should be a “-“.o Comment/Note = this is for your use in tracking rules. Special Notes About Firewall Configurations: o Any field not being used for a match should have a ‘-‘ character as its entry. A ‘-‘ means that the item is not being used for matching traffic. It is valid for any rule element except for Action, Rulnenum, or Comment to have a ‘-‘ specified. Note that if you pass a “-“ to one of the match items in your code, you will crash POX. o All fields are passed as a string, so you must do type conversions as necessary. There is an easier way to match the world than using 0.0.0.0. Hint: Think about what a “-“ means. o When should I use MAC vs IP Addresses? You will want to interchange them in this file to test the robustness of your implementation. It is valid to specify a Source MAC address and a Destination IP Address. Example Rules (included in the project files: 1,Block,-,-,10.0.0.1/32,10.0.1.0/24,6,-,80,Block 10.0.0.1 host from accessing a web server on the 10.0.1.0/24 network 2,Allow,-,-,10.0.0.1/32,10.0.1.125/32,6,-,80,Allow 10.0.0.1 host to access a web server on 10.0.1.125 overriding rule What do these rules do? The first rule basically blocks host hq1 (IP Address 10.0.0.1/32) from accessing a web server on any host on the us network (the subnet 10.0.1.0/24 network). The web server is running on the TCP IP Protocol (6) and uses TCP Port 80. The second rule overrides the initial rule to allow hq1 (IP Address 10.0.0.1/32) to access a web server running on us5 (IP Address 10.0.1.125/32) By definition – from the sdn-topology.py file: This class defines the Mininet Topology for the network used in this project. This network consists of the following hosts/networks: Headquarters Network (hq1-hq5). Subnet 10.0.0.0/24 US Network (us1-us5). Subnet 10.0.1.0/24 India Network (in1-in5). Subnet 10.0.20.0/24 China Network (cn1-cn5). Subnet 10.0.30.0/24 UK Network (uk1-uk5). Subnet 10.0.40.0/24In Part 5, you will be given a set of firewall conditions that you will need to create the configure.pol needed for your submission. Part 4b: Implementing the Firewall in CodeAfter reviewing the format of the configure.pol file, you will now code a generic implementation of a firewall that will use the values provided from the configuration file (passed to you as dictionary items). As it is provided, the firewall implementation code blocks no traffic. You must implement code that does the following: o Create an OpenFlow Flow Modification object o Create a POX Packet Matching object that will integrate the elements from a single entry in the firewall configuration rule file (which is passed in the policy dictionary) to match the different IP and TCP/UDP headers if there is anything to match (i.e., no “-“ should be passed to the match object, nor should None be passed to a match object if a “-“ is provided). o Create a POX Output Action, if needed, to specify what to do with the traffic. You will need to rewrite the rule = None to reference your Flow Modification object. Your code will go into a section that will repeat itself for every line in the firewall configuration file that is passed to it. The “rule” item that is added to the “rules” list is an OpenFlow Modification object. The process of injecting this rule into the POX controller is handled automatically for you in the start-firewall.py file. TIP: If your implementation code segment is more than 25-30 lines, you are making it too difficult. The POX API can provide many features that are not used in this project. The Appendix provides all of the information that you will need to code the project. Key Information: o policies is a python list that contains one entry for each rule line contained in your configure.pol file. Each individual line of the configure.pol file is represented as a dictionary object named policy. This dictionary has the following keys: o policy[‘mac-src’] = Source MAC Address (00:00:00:00:00:00) or “-“ o policy[‘mac-dst’] = Destination MAC Address (00:00:00:00:00:00) ) or “-“ o policy[‘ip-src’] = Source IP Address (10.0.1.1/32) in CIDR notation ) or “-“ o policy[‘ip-dst’] = Destination IP Address (10.0.1.1/32) ) or “-“ o policy[‘ipprotocol’] = IP Protocol (6 for TCP) ) or “-“ o policy[‘port-src’] = Source Port for TCP/UDP (12000) ) or “-“ o policy[‘port-dst’] = Destination Port for TCP/UDP (80) ) or “-“ o policy[‘rulenum’] = Rule Number (1) o policy[‘comment’] = Comment (Example Rule) o policy[‘action’] = Allow or Block o You will need to assume that all traffic is IPV4. It is acceptable to hardcode this value. Do not hardcode other values. Your code should be generic enough to handle any possible configuration. DO NOT USE IpAddr() IN YOUR IMPLEMENTATION. DO NOT USE 0.0.0.0/0 TO DENOTE THE WORLD IN YOUR CONFIGURE.POL. oo Hints: o The difference between an Allow or a Block is dependent on an Action and the Priority. o You don’t necessarily need an action. See Appendix C for a discussion of what happens to a packet after it is matched. o There should be two priorities – one for ALLOW and one for BLOCK. Separate them sufficiently to override any exact matching behavior that the POX controller implements). It is suggested one priority be 0 or 1 and the other one above 10000. The reasoning for this is discussed in Appendix C.Part 5: Configuration RulesDO NOT block all traffic by default and only allow traffic specified. You will lose many points because the firewall is open by default and only blocks the traffic that is specified.Firewall Rules for Summer 2025:• Task 1: Host cn4 has a TCP-based worm virus. Block cn4 from initiating network communications to any host on the internet (world) over the TCP Internet Protocol. You need not block ICMP or UDP. (one rule max) • Task 2: Host cn5 has had a security incident and needs to be completely isolated from the network so it has no connectivity (incoming or outgoing) to any other host on the internet (world). (two rules max) • Task 3: Allow the hosts on the Headquarters network to be reachable via an ICMP ping from the world (including all but the China subnet (to avoid conflicts with Tasks 1 and 2 above). In addition, the corporate subnets should not be pingable from the internet (world). However, to satisfy the first half of this task, you must allow the Headquarters network to be able to ping the US, UK, and India subnets. Can you explain why this must happen? (six rules typically)• Task 4: Do not allow any response back from a TCP web server (http and https) from host cn3 to any other host on the internet (world). (two rules max) • Task 5: (CIDR Notation Rule) The servers located on hosts us3 and us4 run a micro webservice on TCP Port 9250 9520 that processes financial information. Access to this service should be blocked from hosts uk2, uk3, uk4, uk5, in4, in5, us5, and hq5. Please use the minimal CIDR notation that will bracket the subset of hosts for each rule (it should NOT be broader than /28). (four rules typical) • Task 6: A rogue Raspberry Pi has been found on the network that has cloned the Network Address of host us1. Block this device from accessing any other hosts on the internet (world) on the UDP Internet Protocol. (one rule max)What to Turn In You need to submit your copy of packetcapture.pcap, sdn-firewall.py and configure.pol from your project directory using the instructions from the Piazza Post “How to Submit / Zip Our Projects. To recap, zip up the two files using the following command, replacing gtlogin with your GT Login that you use to log into Canvas:zip gtlogin_sdn.zip packetcapture.pcap configure.pol sdn-firewall.py The key to properly zipping the project is to NOT zip up the directory. ZIP only the files you are included.Please check your submission after uploading. As usual, we do not accept resubmissions past the stated deadlines. What you can and cannot shareRubric • 5 points for submitting a version of sdn-firewall.py that indicates effort was done. • 5 points for submitting a version of configure.pol that indicates effort was done. • 15 points for submitting a version of packetcapture.pcap that indicates effort was done. • 25 points for testing your configure.pol file with a known-good implementation. • 25 points for testing your configure.pol with your implementation. • 25 points for testing your implementation with a known-good configure.pol. Appendix A: How to Test Host ConnectivityPart A: How to Test Manually When you are developing your implementation or troubleshooting a firewall rule, you will want to test by hand. Unfortunately this process is a bit difficult.1,Block,-,-,10.0.0.1/32,10.0.1.0/24,6,-,80,Block 10.0.0.1 from accessing a web server on the 10.0.1.0/32 network Startup Procedure: o Step 1: Open two terminal windows or tabs on the VM and change to the SDNFirewall directory.o Step 2: In the first terminal window, type in: ./start-firewall.sh configure.polIf you get the following error, run chmod +x start-firewall.sh and chmod +x start-topology.shThis should start up POX, read in your rules, and start up an OpenFlow Controller. You will see something like this in your terminal window:“Added Rule” lines until after you complete Step 3 below.o Step 3: In the second terminal window, type in: ./start-topology.shThis should start up mininet and load the topology. You should see the following:This will start the firewall and set the topology. You do not need to repeat Steps 1-3 unless you are done testing, need to restart the firewall, or need to restart mininet. When you are done with testing all of the rules you intend to use, type in “quit” in the mininet window, close all of the extraneous xterm windows generated, and run the mininet cleanup script ./cleanup.sh How to test connectivity between two hosts: o Step 1: To test the rule shown above, we want to use host us1 as server/destination and host hq1 as the client. The rule we are testing involves the hq1 host attempting to connect to the web server port (TCP Port 80) on host us1. At the mininet prompt, type in the following two commands on two different lines:hq1 xterm & us1 xterm &Two windows should have popped up. You can always identify which xterm is which by running the command: ip address from the bash shell. This will give you the IP address for the xterm window, which will then let you discover which xterm window belongs to which host.o Step 2: In the xterm window for us1 (which is the destination host of the rule – remember that the destination is always the server), type in the following command:python test-server.py T 10.0.1.1 80This sets up the test server for us1 that will be listening on TCP port 80. The IP Address specified is always the IP address of the machine you are running it on. If you attempt to start the test-server on a machine that does not have the IP address that is specified in the command, you will get the following error: OSError: [Errno 99] Cannot assign requested address.o Step 6: In the xterm window for hq1 (which is the source host of the rule – remember that the source is always the client), type in the following command:python test-client.py T 10.0.1.1 80This will start up a client that will connect to the TCP Port 80 on the server 10.0.1.1 (destination IP address) and will send a message string to the server. However, if the firewall is set to block this connection, you will never see the message pass on either of the client or the server. Examples of Connection Status: • The two windows below depict a successful un-blocked connection between the client and the server.• A timed out connection is shown below. The difference between a timed-out connection on how the connection was blocked or if it was blocked on a different side of the connection.• If you get an error that says “No route to destination”, you have blocked the routing protocol. Ensure that you do not have a Unspecified Prerequisite Error Part B: Automated Testing Suite How to test normal cases:1. Change to the test-scripts directory 2. Copy your `sdn-firewall.py` and `configure.pol` into this directory. 3. Run ./start-firewall.sh configure.pol as usual. 4. Open a new window, run sudo python test_all.py. 5. Total passed cases are calculated. Wrong cases will be displayed. For example, `2: us1 -> hq1 with U at 53, should be True, current False` means the connection from client us1 to host hq1 using UDP at hq1 53 port is failed, which should be successful. The first number is the index (0-based) of testcases. True indicates that a connection was made or was expected. False indicates the opposite condition.How to test alternte cases:1. Change to the test-scripts directory 2. Copy your `sdn-firewall.py` file into the alt folder. Change to the alt directory. 3. Run ./start-firewall.sh (you do not need to specify your configure.pol file 4. Open a new window, run sudo python test_all.py in the test-suite/alt folder. 5. Total passed cases are calculated. Wrong cases will be displayed. For example, `2: us1 -> hq1 with U at 53, should be True, current False` means the connection from client us1 to host hq1 using UDP at hq1 53 port is failed, which should be successful. The first number is the index (0-based) of testcases. True indicates that a connection was made or was expected. False indicates the opposite condition. Appendix B: Troubleshooting InformationGeneral Coding Issues o Watch for type mismatches. o Do not run “pip3 install pox”. The pox module installed by pip is not the library used in this project. o You do NOT need to reparse or revalidate any of the data provided in the dictionary other than possibly changing the type from strings. o If you use Visual Studio code, add the following to your workspace settings: “python.autoComplete.extraPaths”: [ “/home/mininet/pox/” ], Also, with Visual Studio code, it sometimes “recommends” _dl_type and other names prepended with a _. Note that this is incorrect – the name is dl_type, not _dl_type. Firewall Implementation (sdn-firewall.py) Errors and Issues o If you get a struct.pack or struct.unpack error message, take a look at https://github.com/att/pox/blob/7f76c9e3c9bc999fcc97961d408ab0b71cbc186d/pox/OpenFlo w/libOpenFlow_01.py for more information. Also, the struct.pack error might reference how to fix (i.e., not an integer, EthAddr(), etc). o The following error means that you should check your output action: “TypeError: ord() expected string of length 1, but int found” Mininet/Topology Issues o On the topology terminal window, if you get an error message that states “Unable to contact Remote Controller”, it means that the POX controller had crashed and normally shows that there is a bug in your implementation code. Look at the windows where you started the firewall. o If you get the following error message, please run the cleanup.sh utility: “Exception: Error creating interface pair (s1-eth0,hq1-eth0): RTNETLINK answers: File exists”Appendix C: POX API ExcerptExcerpted and modified from: https://noxrepo.github.io/pox-doc/html/ Object Definitions: Flow Modification ObjectThe main object used for this project is a “Flow Modification” object. This adds a rule to the OpenFlow controller that will affect a modification to the traffic flow based on priority, packet characteristic matchin, and an action that will be done to the traffic that is matched. IF AN OBJECT is matched, it is pulled from the network stream and will only be forwarded, modified, or redirected if you do an action. If you do not specify an action and the packet is matched, the packet will basically be dropped.The following class descriptor describes the contents of a flow modification object. You need to define the match, priority, and actions for the object. class ofp_flow_mod (ofp_header): def __init__ (self, **kw): ofp_header.__init__(self) self.header_type = OFPT_FLOW_MOD self.match = ofp_match() self.priority = OFP_DEFAULT_PRIORITY self.actions = []Match Structure OpenFlow defines a match structure – ofp_match – which enables you to define a set of headers for packets to match against.The match structure is defined in pox/OpenFlow/libOpenFlow_01.py in class ofp_match. Its attributes are derived from the members listed in the OpenFlow specification, so refer to that for more information, though they are summarized in the table below.You should create a match object and attach it to the flow modification object.Attribute Meaning dl_src Ethernet/MAC source address (Type of EthAddr) dl_dst Ethernet/MAC destination address (Type of EthAddr) dl_type Ethertype / length (e.g. 0x0800 = IPv4) (Type of Integer) nw_proto IP protocol (e.g., 6 = TCP) or lower 8 bits of ARP opcode (Type of integer) nw_src IP source NETWORK address (Type of String) nw_dst IP destination NETWORK address (Type of String) tp_src TCP/UDP source application port (Type of Integer) tp_dst TCP/UDP destination application port (Type of Integer)matchobj = of.ofp_match(tp_src = 5, dl_type = 0x800,dl_dst = EthAddr(“01:02:03:04:05:06”)) #.. or .. matchobj = of.ofp_match() matchobj.tp_src = 5 matchobj.dl_type = 0x800 matchobj.dl_dst = EthAddr(“01:02:03:04:05:06”)IMPORTANT NOTE ABOUT IP ADDRESSESFrom Wikipedia: IP addresses are described as consisting of two groups of bits in the address: the most significant bits are the network prefix, which identifies a whole network or subnet, and the least significant set forms the host identifier, which specifies a particular interface of a host on that network. This division is used as the basis of traffic routing between IP networks and for address allocation policies. (https://en.wikipedia.org/wiki/Classless_Inter-Domain_Routing)Thus for a /24 network, the first 24 bits of the address comprises the network address. Thus, it would be 10.0.1.0. For a /25 network, there would be two networks in the 10.0.1.x space – a 10.0.1.0/25 and a 10.0.1.128/25.6.OpenFlow ActionsOutput Forward packets out of a physical or virtual port. Physical ports are referenced to by their integral value, while virtual ports have symbolic names. Physical ports should have port numbers less than 0xFF00.port (int) the output port for this packet. This is a bit misleading because it can confuse you with the application “port” for TCP/UDP. For openflow, this port represents the physical swith port that the host is plugged into. However, you do NOT know which physical port on which switch a host is connected to. Thus, you will need to use one of the virtual ports to define what you want to happen: • of.OFPP_IN_PORT – This action will send the port back to the sender (i.e., the port it came into the network on) • of.OFPP_NORMAL – Process the packet and handle via a normal L2/L3 legacy switch configuration (i.e., send traffic to its destination without modification) – See https://studyccna.com/layer-3-switch/ for information on how normal L2/L3 legacy switches work. • of.OFPP_FLOOD – This action will cause the traffic to be sent out to all ports except the source (IN_PORT) and any ports that have flooding turned off. This is very chatty and can be used to do network based attacks (see UDP Amplifications). This should be avoided. • of.OFPP_ALL – output all OpenFlow ports except the source (IN_PORT). This is the same as FLOOD but it includes ports that have had flood turned off.Think carefully about the definitions given above for output actions. Remember that if you match a packet, no action (i.e., packet will be dropped) will be done unless you set an output action as the packet is pulled from the stream until it is resolved.Example: Sending a FlowMod Object The following example describes how to create a flow modification object including matching a destination IP Address, IP Type, and Destination IP Port, and setting an action that would redirect the matching packet out to physical switch port number 4 (note that you generally DO NO KNOW what physical switch port to use.Flow Modification Objects work as thus: 1. Packet enters the system and is examined by the Flow Modification Objects (1 for each rule in your configuration ruleset) 2. The packet will then be examined to see if the different header items match the items specified for that rule. 3. If the packet matches all the applicable items, it is pulled from the stream for you to program an action for it (forward it, readdress it, change it). If you don’t do an action for it, the package is essentially dropped. If the packet does not match all the applicable header items, it continues to the next Flow Modification rule to test it. If it isn’t matched by any rules, it is passed on to the specific destination.For this project, you are making a flow modification object and action while using a matching pattern that can match any or all of the different parameters of the header. Make your implementation generic.Appendix D: Review of MininetMininet is a network simulator that allows you to explore SDN techniques by allowing you to create a network topology including virtual switches, links, hosts/nodes, and controllers. It will also allow you to set the parameters for each of these virtual devices and will allow you to simulate real-world applications on the different hosts/nodes. The following code sets up a basic Mininet topology similar to what is used for this project: #!/usr/bin/pythonfrom mininet.topo import Topo from mininet.net import Mininet from mininet.node import CPULimitedHost, RemoteController from mininet.util import custom from mininet.link import TCLink from mininet.cli import CLIclass FirewallTopo(Topo): def __init__(self, cpu=.1, bw=10, delay=None, **params): super(FirewallTopo,self).__init__()# Host in link configuration hconfig = {‘cpu’: cpu} lconfig = {‘bw’: bw, ‘delay’: delay}# Create the firewall switch s1 = self.addSwitch(‘s1’)hq1 = self.addHost(‘hq1′,ip=’10.0.0.1′,mac=’00:00:00:00:00:1e’, **hconfig) self.addLink(s1,hq1)us1 = self.addHost( ‘us1′, ip=’10.0.1.1′, mac=’00:00:00:01:00:1e’, **hconfig) self.addLink(s1,us1)This code defines the following virtual objects: • Hosts hq1 and us1 – these are individual virtual hosts that you can access via xterm and other means. You can define the IP Address, MAC/Hardware Addresses, and configuration parameters that can define cpu speed and other parameters using the hconfig dictionary. • Links between s1 and hq1 and s1 and us1 – consider these like an ethernet cable that you would run between a computer and the switch port. You can define individual port numbers on each side (i.e., port on the host and port on the virtual switch), but it is advised to let Mininet automatically wire the network. Like hosts, you can define configuration parameters to set link speed, bandwidth, and latency. REMINDER – PORTS MENTIONED IN MININET TOPOLOGIES ARE WIRING PORTS ON THE VIRTUAL SWITCH, NOT APPLICATION PORT NUMBERS. Useful Mininet Commands: • For this project, you can start Mininet and load the firewall topology by running the ./start-topology.sh from the project directory. You can quit Mininet by typing in the exit command. • After you are done running Mininet, it is recommended that you cleanup Mininet. There are two ways of doing this. The first is to run the sudo mn -c command from the terminal and the second is to use the ./cleanup.sh script provided in the project directory. Do this after every run to minimize any problems that might hang or crash Mininet. • You can use the xterm command to start an xterm window for one of the virtual hosts. This command is run from the mininet> prompt. For example, you can type in us1 xterm & to open a xterm window for the virtual host us1. The & causes the window to open and run in the background. In this project, you will run the test-*-client.py and test-*-server.py in each host to test connectivity. • The help command will show all Mininet commands and dump will show information about all hosts in the topology.

$25.00 View

[SOLVED] Cs6250 project 6- distance vector

CS 6250 Summer 2025 Distance Vector Table of Contents PROJECT GOAL 2 Part 0: Getting Started 2 Part 1: Files Layout 2 Part 2: TODOs 3 Part 3: Testing and Debugging 4 Part 4: Assumptions and Clarifications 4 Part 5: Correct Logs for Provided Topologies 6 Part 6: Spirit of the Project 7 Part 7: FAQs 8 What to Turn In 9 What you can and cannot share 9 Rubric 10 PROJECT GOAL In the lectures, you learned about Distance Vector (DV) routing protocols, one of the two classes of routing protocols. DV protocols, such as RIP, use a fully distributed algorithm to find shortest paths by solving the Bellman-Ford equation at each node. In this project, you will develop a distributed Bellman-Ford algorithm and use it to calculate routing paths in a network. This project is similar to the Spanning Tree project, except that we are solving a routing problem, not a switching problem. In “pure” distance vector routing protocols, the hop count (the number of links to be traversed) determines the distance between nodes. Some distance vector routing protocols, that operate at higher levels (like BGP), must make routing decisions based on business valuations. These protocols are sometimes referred to as Path Vector protocols. We will explore this by using weighted links (including negatively weighted links) in our network topologies. We can think of Nodes in this simulation as individual Autonomous Systems (ASes), and the weights on the links as a reflection of the business relationships between ASes. Links are directed, originating at one Node, and terminating at another. Part 0: Getting Started You should review some materials on Bellman-Ford. Some resources include: • Wikipedia (https://en.wikipedia.org/wiki/Bellman%E2%80%93Ford_algorithm) • “Computer Networking: A Top-Down Approach” by Kurose and Ross o 7th edition discusses the algorithm on pages 384-385 in Chapter 5 (“The Network Layer: Control Plane”) Download and unzip the Project Files for Distance Vector from Canvas in the Assignments section. This project can be completed in the class VM or on your local machine using Python 3.10.x. You must be sure that your submission runs properly in Gradescope. Part 1: Files Layout The DistanceVector directory contains the following files: • DistanceVector.py – This is the only file you will modify. It is a specialization (subclass) of the Node class that represents a network node (i.e., router) running the Distance Vector algorithm, which you will implement. • Node.py – Represents a network node, i.e., a router. • Topology.py – Represents a network topology. It is a container class for a collection of DistanceVector Nodes and the network links between them. • run_topo.py – A simple “driver” that loads a topology file (see *Topo.txt below), uses that data to create a Topology object containing the network Nodes, and starts the simulation. • *Topo.txt – These are valid topology files that you will pass as input to the run.sh script (see below). Topologies should end with “.txt”. • BadTopo.txt – This is an invalid topology file, provided as an example of what not to do, and so you can see what the program says if you pass it a bad topology. • output_validator.py – This script can be run on the log output from the simulation to verify that the output file is formatted correctly. It does not verify that the contents are correct, only the format. • run.sh – A helper script that runs some basic system checks, the topology, and the validator, a wrapper for run_topo.py and output_validator.py . Part 2: TODOs There are a few TODOs in DistanceVector.py: A. Review the methods already implemented in Node.py. a. Because DistanceVector is a subclass of Node, consider how you might use the existing methods to complete the TODOs in this list. b. Do NOT modify Node.py. B. Decide on how each node will represent its distance vector. a. Consider what might be the simplest data structure to keep track of path weights (i.e., the distance vector). b. The distance vector variable should be local to the Node, i.e., defined in the init function as a variable accessible via the `self` object (i.e. self.mylist). C. Implement the Bellman-Ford algorithm. a. Each Node will: i. send out an initial message to its neighbors ii. process messages received from other nodes iii. send updates to other nodes as needed b. Initially, a node only knows of: i. itself and that it is reachable at cost 0, ii. its neighbors and the weights on its links to its neighbors c. NOTE: a node’s links are unidirectional. d. NOTE: The Bellman-Ford algorithm implementation should terminate naturally without external intervention. D. Write a logging function that is specific to your distance vector structure. a. You should use the self.add_entry function to help with logging. b. You should assume that the logging function only knows itself. i. Do NOT access the topology for logging; logging should happen at the Node level. Part 3: Testing and Debugging To run your algorithm on a specific topology, execute the run.sh bash script: ./run.sh *Topo Substitute the correct, desired filename for *Topo. Don’t use the .txt suffix on the command line. This will execute your implementation of the algorithm in DistanceVector.py on the topology defined in *Topo.txt and log the results (per your logging function) to *Topo.log . NOTE: You should not include the full filename of the topology when executing the run.sh script. For example, to run the algorithm on topo1.txt you should only specify topo1 as the argument to run.sh. We’ve included four good topologies for you to use in testing and one bad topology to demonstrate invalid topology. The provided topologies do not cover all the edge cases; your code will be graded against more complex topologies. Part 4: Assumptions and Clarifications A. Node behavior: i. Example: Node B has an incoming link from Node A, but has no outgoing link to Node A, Node B will send its distance vector to node A to “advertise” other nodes it can reach (Nodes C and D). b. A Node’s distance vector is comprised of the nodes it can reach via its outgoing links (including to itself at distance = 0). i. A Node will never advertise a negative distance to itself. (Important for negative cycles.) c. A Node advertises its distance vector to its upstream neighbors. d. Nodes do not implement poison-reverse. B. Edge and Path weights: b. The edge weight value type is an integer. c. There is no upper limit for path weights. d. The lower limit for path weights is “-99”, which is equivalent to “negative infinity” for this project. C. Negative cycles: a. A Node can forward traffic through a negative cycle. b. Negative cycles are a series of directed links that originate and terminate at a single node, where the sum of the link weights is less than 0. i. This can lead to a negative “count-to-infinity” problem. Therefore, your implementation must be able to detect negative cycles to terminate on its own. ii. Any node that can reach a destination node and infinitely traverse a negative cycle enroute will set the distance to that node to -99. 1. Your implementation only needs to detect and record these traversals appropriately; it does not need to mitigate them. iii. A Node can advertise a negative distance for other nodes (but not for itself). iv. A Node that receives an advertisement with a distance of -99 from a downstream neighbor should also assume that it can reach the same destination at infinitely low cost (-99). v. Example: Traffic from Node F to Node D can route through A->B->C->A indefinitely to reach an extremely low (very negative) value. c. A Node will not forward traffic destined to itself. i. Example: The below topology will not result in a count-to-infinity problem, as there are no possible pairs of source and destination nodes where traffic could indefinitely traverse a negative cycle. Node A will not forward traffic for Node A, and similarly for Nodes B and C. D. Topologies used in grading: a. We will be using many topologies to test your project. This includes but is not limited to: o topologies with and without cycles (loops), including odd length cycles o topologies of varying sizes, including topologies with more than 26 nodes o topologies with nodes with names longer than one character o topologies with multiple paths to different nodes o topologies that include any combination of positive weights, zero weight, and negative weight o topologies with Nodes that do not have incoming or outgoing links ▪ All nodes will be connected but: b. We will NOT test your submission against the following topologies (which means your algorithm does not need to account for them): o topologies with more than one link from the same origin to the same destination (multi-graphs) o topologies with portions of the network disconnected from each other (partitioned networks) o topologies that do not require intermediate steps (such as a topology with a single node) o topologies with a valid path between two indirectly linked nodes with no cycle with an actual total cost of ≤ -99 (topologies will respect that -99 is “negative infinity” for this project) Part 5: Correct Logs for Provided Topologies Below are the correct final logs for the provided topologies. We are providing them to help you identify correct behavior with respect to negative cycles and the assumptions in the instructions. We are only providing the final round; each topology should produce at least 2 rounds of output. SimpleTopo: A:(A,0) (B,1) (C,3) (D,3) B:(B,0) (A,1) (C,2) (D,2) C:(C,0) (B,2) (A,3) (D,0) D:(D,0) (C,0) (B,2) (A,3) E:(E,0) (D,-1) (C,-1) (B,1) (A,2) SingleLoopTopo: A:(A,0) (D,5) (E,6) (B,6) (C,16) B:(B,0) (A,2) (D,7) (C,10) (E,0) C:(C,0) D:(D,0) (E,1) (B,1) (A,3) (C,11) E:(E,0) (B,0) (A,2) (D,7) (C,10) SimpleNegativeCycle: AA:(AA,0) (AD,-2) (AE,-1) (AB,0) (CC,-99) AB:(AB,0) (AA,-1) (AD,-3) (CC,-99) (AE,-2) AD:(AD,0) (AE,1) (AB,2) (AA,1) (CC,-99) AE:(AE,0) (AB,1) (AA,0) (AD,-2) (CC,-99) CC:(CC,0) (AB,0) (AA,-1) (AD,-3) (AE,-2) ComplexTopo: ATT:(ATT,0) (CMCT,-99) (TWC,-99) (GSAT,-8) (UGA,-99) (VONA,-11) (VZ,-3) CMCT:(CMCT,0) (TWC,-99) (ATT,1) (VONA,-10) (GSAT,-7) (UGA,-99) (VZ,-2) DRPA:(DRPA,0) (EGLN,1) (GT,-1) (UC,-1) (CMCT,-99) (TWC,-99) (ATT,13) (OSU,-1) (VONA,2) (GSAT,5) (UGA,-99) (PTGN,1) (VZ,10) EGLN:(EGLN,0) (GT,-2) (UC,-2) (DRPA,1) (CMCT,-99) (OSU,-2) (TWC,-99) (ATT,13) (PTGN,0) (VONA,3) (GSAT,5) (UGA,-99) (VZ,11) GSAT:(GSAT,0) (VONA,-3) (VZ,5) (UGA,-99) (ATT,7) (CMCT,-99) (TWC,-99) GT:(GT,0) (UC,0) (EGLN,2) (OSU,0) (DRPA,3) (PTGN,2) (CMCT,-99) (VONA,5) (TWC,-99) (ATT,15) (VZ,13) (GSAT,7) (UGA,-99) OSU:(OSU,0) (UC,0) (GT,0) (EGLN,2) (PTGN,2) (VONA,5) (DRPA,3) (VZ,13) (GSAT,7) (CMCT,-99) (ATT,15) (UGA,-99) (TWC,-99) PTGN:(PTGN,0) (OSU,-1) (UC,-1) (GT,-1) (EGLN,1) (VONA,3) (VZ,11) (GSAT,5) (DRPA,2) (ATT,13) (UGA,-99) (CMCT,-99) (TWC,-99) TWC:(TWC,0) (CMCT,-99) (ATT,1) (VONA,-10) (VZ,-2) (GSAT,-7) (UGA,-99) UC:(UC,0) (GT,0) (EGLN,2) (OSU,0) (PTGN,2) (DRPA,3) (VONA,5) (CMCT,-99) (VZ,13) (GSAT,7) (TWC,-99) (ATT,15) (UGA,-99) UGA:(UGA,0) (ATT,50) (CMCT,-99) (TWC,-99) (GSAT,42) (VONA,39) (VZ,47) VONA:(VONA,0) (VZ,8) (GSAT,2) (ATT,10) (UGA,-99) (CMCT,-99) (TWC,-99) VZ:(VZ,0) (ATT,2) (CMCT,-99) (TWC,-99) (GSAT,-6) (UGA,-99) (VONA,-9) Part 6: Spirit of the Project The goal of this project is to implement a simplified version of a network protocol using a distributed algorithm. This means that your algorithm should be implemented at the network node level. Each network node only knows its internal state, and the information passed to it by its direct neighbors. Declaring global variables will be a violation of the spirit of the project. Part 7: FAQs A: Your solution should not require any outside Python modules. Please do not import any other modules. Q: What is the best way to format and process node messages? A: There is no right or wrong way to format messages. For best results keep things simple. Q: Is it required that the distance vectors displayed in my log files be alphabetized? A: Look at the finish_round function in Toology.py. Note how the DVs are alphabetized each round, and this is reflected in the provided correct output logs. The nodes within individual vectors are not required to be sorted. Q: Should my solution include an implementation of split horizon? A: That is not a requirement for this project. Q: What if there really is a valid path between two indirectly linked nodes with no cycle and the total cost is -99 or less? A. We will not test your submission against a topology that does this. However, from the “Assumptions and Clarifications”, note: “a Node seeing an advertised vector of -99 from a downstream neighbor can assume this means it can reach that same destination at infinitely low cost (-99).” What to Turn In To complete this project, submit ONLY your DistanceVector.py file to Gradescope as a single file. Do not modify the name of DistanceVector. You can make an unlimited number of submissions to Gradescope. Your last submission will be your grade unless you activate a different submission. There are some very important guidelines for this file you must follow: A. Ensure that your submission self-terminates. If your submission runs indefinitely (i.e., contains an infinite loop) or throws an error at runtime, it will not receive full credit. Manually killing your submission via console commands or interrupts is NOT an acceptable means of termination. B. Remove any print statements from your code before turning it in. Print statements left in the simulation, particularly for inefficient but logically sound implementations, have drastic effects on run-time. Ideally, your submission should take less than 10 seconds to process a topology. If your leave print statements in your code and they adversely affect the grading process, your work will not receive full credit. (Feel free to use print statements during the project and during debugging but remove them before you submit.) C. Ensure your logs are formatted properly. Logging is the only way that we can verify that your algorithm is running correctly. The output validator will catch most formatting mistakes, but you should inspect your output manually to make sure it matches the requested format. (See the TODO comment for logging located in DistanceVector.py for format details.) D. Ensure your solution generates completely correct output. Partial credit for individual topologies will not be awarded, even if the distance vector logs are “mostly correct.” E. Check your submission after uploading. As usual, we do not accept resubmissions past the stated deadlines. What you can and cannot share When sharing log files, leave alphabetization on so that your classmates can use the diff tool to see if you are getting the same log outputs as they are. Rubric 40 pts Provided Topologies (4 total) For correct Distance Vector results (log file) on the provided topologies. 60 pts Unannounced Topologies (4 total) For correct Distance Vector results (log file) on topologies that you will not see in advance. They are slightly more complex than the provided ones and test some edge cases. GRADING NOTE: There is no partial credit for individual topologies; each topology is either “passed” or “failed”.

$25.00 View

[SOLVED] Cs6250 project 7- distance vector

CS 6250 Spring 2025Distance Vector Table of Contents PROJECT GOAL 2 Part 0: Getting Started 2 Part 1: Files Layout 2 Part 2: TODOs 3 Part 3: Testing and Debugging 4 Part 4: Assumptions and Clarifications 4 Part 5: Correct Logs for Provided Topologies 6 Part 6: Spirit of the Project 7 Part 7: FAQs 8 What to Turn In 9 What you can and cannot share 9 Rubric 10PROJECT GOAL In the lectures, you learned about Distance Vector (DV) routing protocols, one of the two classes of routing protocols. DV protocols, such as RIP, use a fully distributed algorithm to find shortest paths by solving the Bellman-Ford equation at each node. In this project, you will develop a distributed Bellman-Ford algorithm and use it to calculate routing paths in a network. This project is similar to the Spanning Tree project, except that we are solving a routing problem, not a switching problem. In “pure” distance vector routing protocols, the hop count (the number of links to be traversed) determines the distance between nodes. Some distance vector routing protocols, that operate at higher levels (like BGP), must make routing decisions based on business valuations. These protocols are sometimes referred to as Path Vector protocols. We will explore this by using weighted links (including negatively weighted links) in our network topologies. We can think of Nodes in this simulation as individual Autonomous Systems (ASes), and the weights on the links as a reflection of the business relationships between ASes. Links are directed, originating at one Node, and terminating at another. Part 0: Getting Started You should review some materials on Bellman-Ford. Some resources include: • Wikipedia (https://en.wikipedia.org/wiki/Bellman%E2%80%93Ford_algorithm) • “Computer Networking: A Top-Down Approach” by Kurose and Ross o 7th edition discusses the algorithm on pages 384-385 in Chapter 5 (“The Network Layer: Control Plane”) Download and unzip the Project Files for Distance Vector from Canvas in the Assignments section. This project can be completed in the class VM or on your local machine using Python 3.10.x. You must be sure that your submission runs properly in Gradescope. Part 1: Files Layout The DistanceVector directory contains the following files: • DistanceVector.py – This is the only file you will modify. It is a specialization (subclass) of the Node class that represents a network node (i.e., router) running the Distance Vector algorithm, which you will implement. • Node.py – Represents a network node, i.e., a router. • Topology.py – Represents a network topology. It is a container class for a collection of DistanceVector Nodes and the network links between them. • run_topo.py – A simple “driver” that loads a topology file (see *Topo.txt below), uses that data to create a Topology object containing the network Nodes, and starts the simulation. • *Topo.txt – These are valid topology files that you will pass as input to the run.sh script (see below). Topologies should end with “.txt”. • BadTopo.txt – This is an invalid topology file, provided as an example of what not to do, and so you can see what the program says if you pass it a bad topology. • output_validator.py – This script can be run on the log output from the simulation to verify that the output file is formatted correctly. It does not verify that the contents are correct, only the format. • run.sh – A helper script that runs some basic system checks, the topology, and the validator, a wrapper for run_topo.py and output_validator.py . Part 2: TODOs There are a few TODOs in DistanceVector.py: A. Review the methods already implemented in Node.py. a. Because DistanceVector is a subclass of Node, consider how you might use the existing methods to complete the TODOs in this list. b. Do NOT modify Node.py. B. Decide on how each node will represent its distance vector. a. Consider what might be the simplest data structure to keep track of path weights (i.e., the distance vector). b. The distance vector variable should be local to the Node, i.e., defined in the init function as a variable accessible via the `self` object (i.e. self.mylist). C. Implement the Bellman-Ford algorithm. a. Each Node will: i. send out an initial message to its neighbors ii. process messages received from other nodes iii. send updates to other nodes as needed b. Initially, a node only knows of: i. itself and that it is reachable at cost 0, ii. its neighbors and the weights on its links to its neighbors c. NOTE: a node’s links are unidirectional. d. NOTE: The Bellman-Ford algorithm implementation should terminate naturally without external intervention. D. Write a logging function that is specific to your distance vector structure. a. You should use the self.add_entry function to help with logging. b. You should assume that the logging function only knows itself. i. Do NOT access the topology for logging; logging should happen at the Node level. Part 3: Testing and Debugging To run your algorithm on a specific topology, execute the run.sh bash script: ./run.sh *Topo Substitute the correct, desired filename for *Topo. Don’t use the .txt suffix on the command line. This will execute your implementation of the algorithm in DistanceVector.py on the topology defined in *Topo.txt and log the results (per your logging function) to *Topo.log . NOTE: You should not include the full filename of the topology when executing the run.sh script. For example, to run the algorithm on topo1.txt you should only specify topo1 as the argument to run.sh. We’ve included four good topologies for you to use in testing and one bad topology to demonstrate invalid topology. The provided topologies do not cover all the edge cases; your code will be graded against more complex topologies. Part 4: Assumptions and Clarifications A. Node behavior: i. Example: Node B has an incoming link from Node A, but has no outgoing link to Node A, Node B will send its distance vector to node A to “advertise” other nodes it can reach (Nodes C and D).b. A Node’s distance vector is comprised of the nodes it can reach via its outgoing links (including to itself at distance = 0). i. A Node will never advertise a negative distance to itself. (Important for negative cycles.) c. A Node advertises its distance vector to its upstream neighbors. d. Nodes do not implement poison-reverse. B. Edge and Path weights: b. The edge weight value type is an integer. c. There is no upper limit for path weights. d. The lower limit for path weights is “-99”, which is equivalent to “negative infinity” for this project. C. Negative cycles: a. A Node can forward traffic through a negative cycle. b. Negative cycles are a series of directed links that originate and terminate at a single node, where the sum of the link weights is less than 0. i. This can lead to a negative “count-to-infinity” problem. Therefore, your implementation must be able to detect negative cycles to terminate on its own. ii. Any node that can reach a destination node and infinitely traverse a negative cycle enroute will set the distance to that node to -99. 1. Your implementation only needs to detect and record these traversals appropriately; it does not need to mitigate them. iii. A Node can advertise a negative distance for other nodes (but not for itself). iv. A Node that receives an advertisement with a distance of -99 from a downstream neighbor should also assume that it can reach the same destination at infinitely low cost (-99). v. Example: Traffic from Node F to Node D can route through A->B->C->A indefinitely to reach an extremely low (very negative) value.c. A Node will not forward traffic destined to itself. i. Example: The below topology will not result in a count-to-infinity problem, as there are no possible pairs of source and destination nodes where traffic could indefinitely traverse a negative cycle. Node A will not forward traffic for Node A, and similarly for Nodes B and C.D. Topologies used in grading: a. We will be using many topologies to test your project. This includes but is not limited to: o topologies with and without cycles (loops), including odd length cycles o topologies of varying sizes, including topologies with more than 26 nodes o topologies with nodes with names longer than one character o topologies with multiple paths to different nodes o topologies that include any combination of positive weights, zero weight, and negative weight o topologies with Nodes that do not have incoming or outgoing links ▪ All nodes will be connected but: b. We will NOT test your submission against the following topologies (which means your algorithm does not need to account for them): o topologies with more than one link from the same origin to the same destination (multi-graphs) o topologies with portions of the network disconnected from each other (partitioned networks) o topologies that do not require intermediate steps (such as a topology with a single node) o topologies with a valid path between two indirectly linked nodes with no cycle with an actual total cost of ≤ -99 (topologies will respect that -99 is “negative infinity” for this project) Part 5: Correct Logs for Provided Topologies Below are the correct final logs for the provided topologies. We are providing them to help you identify correct behavior with respect to negative cycles and the assumptions in the instructions. We are only providing the final round; each topology should produce at least 2 rounds of output. SimpleTopo: A:(A,0) (B,1) (C,3) (D,3) B:(B,0) (A,1) (C,2) (D,2) C:(C,0) (B,2) (A,3) (D,0) D:(D,0) (C,0) (B,2) (A,3) E:(E,0) (D,-1) (C,-1) (B,1) (A,2)SingleLoopTopo: A:(A,0) (D,5) (E,6) (B,6) (C,16) B:(B,0) (A,2) (D,7) (C,10) (E,0) C:(C,0) D:(D,0) (E,1) (B,1) (A,3) (C,11) E:(E,0) (B,0) (A,2) (D,7) (C,10)SimpleNegativeCycle: AA:(AA,0) (AD,-2) (AE,-1) (AB,0) (CC,-99) AB:(AB,0) (AA,-1) (AD,-3) (CC,-99) (AE,-2) AD:(AD,0) (AE,1) (AB,2) (AA,1) (CC,-99) AE:(AE,0) (AB,1) (AA,0) (AD,-2) (CC,-99) CC:(CC,0) (AB,0) (AA,-1) (AD,-3) (AE,-2)ComplexTopo: ATT:(ATT,0) (CMCT,-99) (TWC,-99) (GSAT,-8) (UGA,-99) (VONA,-11) (VZ,-3) CMCT:(CMCT,0) (TWC,-99) (ATT,1) (VONA,-10) (GSAT,-7) (UGA,-99) (VZ,-2) DRPA:(DRPA,0) (EGLN,1) (GT,-1) (UC,-1) (CMCT,-99) (TWC,-99) (ATT,13) (OSU,-1) (VONA,2) (GSAT,5) (UGA,-99) (PTGN,1) (VZ,10) EGLN:(EGLN,0) (GT,-2) (UC,-2) (DRPA,1) (CMCT,-99) (OSU,-2) (TWC,-99) (ATT,13) (PTGN,0) (VONA,3) (GSAT,5) (UGA,-99) (VZ,11) GSAT:(GSAT,0) (VONA,-3) (VZ,5) (UGA,-99) (ATT,7) (CMCT,-99) (TWC,-99) GT:(GT,0) (UC,0) (EGLN,2) (OSU,0) (DRPA,3) (PTGN,2) (CMCT,-99) (VONA,5) (TWC,-99) (ATT,15) (VZ,13) (GSAT,7) (UGA,-99) OSU:(OSU,0) (UC,0) (GT,0) (EGLN,2) (PTGN,2) (VONA,5) (DRPA,3) (VZ,13) (GSAT,7) (CMCT,-99) (ATT,15) (UGA,-99) (TWC,-99) PTGN:(PTGN,0) (OSU,-1) (UC,-1) (GT,-1) (EGLN,1) (VONA,3) (VZ,11) (GSAT,5) (DRPA,2) (ATT,13) (UGA,-99) (CMCT,-99) (TWC,-99) TWC:(TWC,0) (CMCT,-99) (ATT,1) (VONA,-10) (VZ,-2) (GSAT,-7) (UGA,-99) UC:(UC,0) (GT,0) (EGLN,2) (OSU,0) (PTGN,2) (DRPA,3) (VONA,5) (CMCT,-99) (VZ,13) (GSAT,7) (TWC,-99) (ATT,15) (UGA,-99) UGA:(UGA,0) (ATT,50) (CMCT,-99) (TWC,-99) (GSAT,42) (VONA,39) (VZ,47) VONA:(VONA,0) (VZ,8) (GSAT,2) (ATT,10) (UGA,-99) (CMCT,-99) (TWC,-99) VZ:(VZ,0) (ATT,2) (CMCT,-99) (TWC,-99) (GSAT,-6) (UGA,-99) (VONA,-9)Part 6: Spirit of the Project The goal of this project is to implement a simplified version of a network protocol using a distributed algorithm. This means that your algorithm should be implemented at the network node level. Each network node only knows its internal state, and the information passed to it by its direct neighbors. Declaring global variables will be a violation of the spirit of the project. Part 7: FAQs A: Your solution should not require any outside Python modules. Please do not import any other modules. Q: What is the best way to format and process node messages? A: There is no right or wrong way to format messages. For best results keep things simple. Q: Is it required that the distance vectors displayed in my log files be alphabetized? A: Look at the finish_round function in Toology.py. Note how the DVs are alphabetized each round, and this is reflected in the provided correct output logs. The nodes within individual vectors are not required to be sorted. Q: Should my solution include an implementation of split horizon? A: That is not a requirement for this project. Q: What if there really is a valid path between two indirectly linked nodes with no cycle and the total cost is -99 or less? A. We will not test your submission against a topology that does this. However, from the “Assumptions and Clarifications”, note: “a Node seeing an advertised vector of -99 from a downstream neighbor can assume this means it can reach that same destination at infinitely low cost (-99).” What to Turn In To complete this project, submit ONLY your DistanceVector.py file to Gradescope as a single file. Do not modify the name of DistanceVector. You can make an unlimited number of submissions to Gradescope. Your last submission will be your grade unless you activate a different submission.There are some very important guidelines for this file you must follow: A. Ensure that your submission self-terminates. If your submission runs indefinitely (i.e., contains an infinite loop) or throws an error at runtime, it will not receive full credit. Manually killing your submission via console commands or interrupts is NOT an acceptable means of termination. B. Remove any print statements from your code before turning it in. Print statements left in the simulation, particularly for inefficient but logically sound implementations, have drastic effects on run-time. Ideally, your submission should take less than 10 seconds to process a topology. If your leave print statements in your code and they adversely affect the grading process, your work will not receive full credit. (Feel free to use print statements during the project and during debugging but remove them before you submit.) C. Ensure your logs are formatted properly. Logging is the only way that we can verify that your algorithm is running correctly. The output validator will catch most formatting mistakes, but you should inspect your output manually to make sure it matches the requested format. (See the TODO comment for logging located in DistanceVector.py for format details.) D. Ensure your solution generates completely correct output. Partial credit for individual topologies will not be awarded, even if the distance vector logs are “mostly correct.” E. Check your submission after uploading. As usual, we do not accept resubmissions past the stated deadlines. What you can and cannot share When sharing log files, leave alphabetization on so that your classmates can use the diff tool to see if you are getting the same log outputs as they are. Rubric40 pts Provided Topologies (4 total) For correct Distance Vector results (log file) on the provided topologies. 60 pts Unannounced Topologies (4 total) For correct Distance Vector results (log file) on topologies that you will not see in advance. They are slightly more complex than the provided ones and test some edge cases.GRADING NOTE: There is no partial credit for individual topologies; each topology is either “passed” or “failed”.

$25.00 View

[SOLVED] Soc-ga 2332 intro to stats lab 10

Wenhao Jiang Logistics Part 1: Discrete Dependent Variable 1.1 Binary Response ## load data load(“data/dat.RData”) 1.2 Bernoulli Distribution • When the outcome is binary, Yi ∈ {0,1}, it is common to assume that it follows a Bernoulli distribution. A Bernoulli distribution has one parameter, which we might call π, that represents the probability of a “success.” It is the convention to let Yi = 1 be a success and Yi = 0 be a fail; so, πi = Pr[Yi = 1]. • An example of the Bernoulli distribution is the toss of a biased coin. We can plot the expected probability of whether we get a “tail” or “head” given how the coin is biased.• To model binary outcomes, we are assuming the observed outcome follows a Bernoulli distribution for each unit of observation i with a parameter π that can be predicted with a set of predictors, Xi. 1.3 The Linear Probability Model (LPM) • Suppose that we have a binary outcome Yi ∈ {0,1}, where we think that the success probability depends on a set of predictors Xi = {Xi1,Xi2,…,Xi3}. We might write this as πi = Pr[Yi = 1|Xi] = π(Xi), where the subscript i shows that the success probability will vary across units in our sample, and where we have emphasized that π(·) is a function of a set of predictors. • In the LPM, we assume that π(Xi) is a linear function of the predictors. That is, π(Xi) = E[Yi|Xi] = β0 + β1Xi1 + ··· + βkXik, i = 1,2,…,n. • Notice that the equation looks exactly the same as the linear regression model we have considered in the previous labs. The only difference is that the outcome is a binary variable. • Linear Probability Model can be estimated using the same method as we estimate a regular linear model. In R, simply use lm(). In our case, we model the outcome variable trump using all the predictors in the dataframe. ## print summary summary(lpm) ## ## Call: ## ## Residuals: ## Min 1Q Median 3Q Max ## -1.0409 -0.1428 0.0118 0.1296 1.0514 ## ## Coefficients: ## Estimate Std. Error t value Pr(>|t|) ## (Intercept) -0.290119 0.097759 -2.968 0.003059 ** ## pid 0.170488 0.003975 42.889 < 2e-16 *** ## log_inc 0.028229 0.009199 3.069 0.002198 ** ## female -0.042548 0.017531 -2.427 0.015369 * ## — ## Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ‘ 1 ## ## Residual standard error: 0.3036 on 1217 degrees of freedom ## Multiple R-squared: 0.6245, Adjusted R-squared: 0.6232 ## F-statistic: 506 on 4 and 1217 DF, p-value: < 2.2e-16 • The interpretation of the coefficients of LPM is straightforward: Holding other variables constant, one unit increase in Xk will increase/decrease the probability of Y = 1 by βk. • We can plot the LPM’s predicted probability of voting for Trump by Party ID. ## create new dataset for predictions pred_dat |z|) ## (Intercept) -6.47611 1.11850 -5.790 7.04e-09 *** ## pid 1.20627 0.06211 19.420 < 2e-16 *** ## log_inc 0.29218 0.10210 2.862 0.004215 ** ## female -0.46170 0.19580 -2.358 0.018372 * ## — ## Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ‘ 1 ## ## (Dispersion parameter for binomial family taken to be 1) ## ## Null deviance: 1666.84 on 1221 degrees of freedom ## Residual deviance: 709.81 on 1217 degrees of freedom ## AIC: 719.81 ## ## Number of Fisher Scoring iterations: 6 2.5 Model Interpretation • In the summary for logistic regression models: – 1. The coefficients in the Estimate column show the estimated regression coefficients, i.e., the βˆk’s. For example, the coefficient of the pid variable suggests that a unit increase in pid is associated with a 1.206 increase in the logit (the log odds) of the probability of voting for Trump. – 2. We might also interpret the coefficients in terms of odds by exponentiating them. Hence, the odds of the outcome are predicted to increase by a factor of eβˆ1 for each unit increase in pid. For example, a unit increase in pid is associated with an increase of the odds of voting for Trump by a factor of e1.206 = 3.340. – 3. You can calculating the exponentiated regression coefficients of a logistic regression model by using the coef function to extract the estimated coefficients from the model and, then, use the exp() function: ## extract coefficients l_coef = coef(l_reg) ## print coefficients and their exponentiated form rbind(coef = l_coef, exp_coef = exp(l_coef)) ## coef -6.476108455 1.206265 0.2921768 -0.4616960 -0.7601581 ## exp_coef 0.001539791 3.340984 1.3393398 0.6302139 0.4675925 2.6 Predicted Probabilities • It is always a good idea to plot the predicted probabilities (both for yourself and for your readers). In other words, we want to plot how the probability of the outcome changes when we vary a focal variable while fixing the remain variables at certain values. • In R, doing this is quite straightforward. We have already created a new dataset for which we want the predictions above (when plotting the predicted probabilities using the LPM). Let us use the exact same dataset again. ## predict probability of voting for Trump (using logit model) yhat_logit = cbind(pid = 0:6, predict(l_reg, # model object is different! newdata = pred_dat, # data for prediction is the same! type = “response”)) %>% as.data.frame() • By using the type = “response” option, we will obtain the predicted probabilities. However, the SE of the predicted probabilities is not available. • SE is available, however, when we predict the logit by setting se.fit = TRUE. We specify the option type = “link” in the predict function. It can be shown that the sampling distribution of the predicted logits follow a Normal distribution in large samples – As the predicted logits are Normally distributed in large samples, we can use these estimated standard errors to calculate the 95% confidence intervals of the predicted logits. These intervals will have the form CI = logit(πˆi) ± 1.96 × S.E.d(logit(πˆi)). – This will give us the confidence interval for the predicted logits. But we want the 95% CIs for the predicted probabilities. Here we use the fact that the inverse-logit function is a strictly increasing function and just apply the function to both end-points of the confidence interval. This will give us the confidence interval for the predicted probabilities. That is, if the 95% CI for the predicted logits has the form (a,b), then interval (logit−1(a),logit−1(b)) will be the 95% confidence interval for the predicted probabilities. • In R, we can do this as follows: # predict the logit and standard errors pred_logit % as.data.frame() %>% select(fit, se.fit) # calculate 95% CI for logits pred_logit % mutate(lwr = fit – 1.96 * se.fit, upr = fit + 1.96 * se.fit) # apply inverse-logit function to get pred. probs and CI pred_p % mutate_at(1:4, function(a){1 / (1 + exp(-a))}) %>% mutate(pid = pred_dat$pid) # plot predicted probabilities (save plot in object l_plot) l_plot % ggplot(aes(x = pid, y = fit)) + geom_line(col = “black”) + geom_ribbon(aes(ymin = lwr, ymax = upr), fill = “grey”, alpha = .5, col = NA) + scale_y_continuous(name = “Predicted Probability “, breaks = seq(0, 1, .25)) + scale_x_continuous(name = ” Party Identification”, breaks = seq(0, 6, 1)) + geom_hline(yintercept = c(0, 1), linetype = 2) + geom_vline(xintercept = seq(0, 6, 1), linetype = 3, col = “grey”) + theme_classic() + ggtitle(“Probability of Voting for Trump by Party ID”, subtitle = “Results from Logistic Regression Model”) # print plot print(l_plot) Probability of Voting for Trump by Party IDParty Identification • Notice that all the predictions and the corresponding confidence intervals lie between zero and one, as desired. Furthermore, we see that our model predicts that the probability voting for Trump is almost zero for Strong Democrats (pid = 0~1) and almost one for Strong Republicans (pid = 5~6). • This is a much more intuitive presentation of your results (or the meaning of the estimated regression coefficients) than an exponentiated coefficient of 3.341. So, whenever you run these models you should try to plot the predicted probabilities. Lastly, we can use the ‘ggpubr::ggarrange“ function to compare the LPM and the logistic regression model: ggarrange(lpm_plot, l_plot, nrow = 1)Party Identification Party Identification

$25.00 View

[SOLVED] Soc-ga 2332 intro to stats lab 1

Wenhao Jiang Logistics & Announcements • Alternatively, you can send me emails, and I will typically respond to your question on the same day. • Plan of assignments (tentative): • Please email me at least 3 days in advanced if you need additional time. • Assignments will assist you – understand key concepts in statistics using simulations – familiarize yourself with codings in R – prepare for the final replication project • Working in groups is strongly encouraged. • The final replication project is based on this ASR paper: Going Back in Time? Gender Differences in Trends and Sources of the Racial Pay Gap, 1970 to 2010, by Hadas Mandel and Moshe Semyonov Prerequisite • Download & install R: https://cloud.r-project.org/ • Download & install RStudio: https://rstudio.com/products/rstudio/download/ Part 1: Basics of RStudio and R Markdown What are R & RStudio • R is a free programming language commonly used for statistical programming and graphics. Download & install: https://cloud.r-project.org/ RStudio is an IDE (Integrated Development Environment) for R. It’s an application that enables you to write, run, and save your R code and programming outputs. Download & install: https: //rstudio.com/products/rstudio/download/ The layout of RStudioCoding in R Script vs. R Markdown • R Script is a simple code script document. The output of R script cannot be saved within the script. • R Markdown is a simple formatting syntax for authoring HTML, PDF, and even Microsoft Word documents. • Different from R Script, R Markdown allows users to present both their code and the code’s output (tables, plots, etc.) in a single document, usually by “knitting” (rendering) an R Markdown to a HTML or PDF file. • R Markdown allows you to divide your code into sections, which helps you better organize your code. • R Markdown also allows you to type mathematical equations efficiently. Equation formats are identical to LaTeX. If you are coding for assignments, R Markdown is required. If you are coding for simpler tasks, you can use R script.Figure 2: R Script (Left) and R Markdown (Right) R Markdown: Layout, the Markdown languages, and Knitting • In a R Markdown file, chunks with a white background are text editor chunks. You can incorporate formatted text (including mathematical equations) using the Markdown language. Use this cheatsheet for Markdown guidance. • Chunks with a grey background are coding chunks. You will code in these chunks. • You can run your code by line, or by chunk. The output (if any) will be displayed after the current coding chunk. • You can knit (export/convert) a R Markdown file to HTML, PDF, or Word using the Knit button. Additional tips: Typing equations in R Markdown • To type mathematical symbols and expressions, you need to follow a particular Markdown syntax. For example, to print the Greek letter α, you need to type $alpha$ in the text editor chunk of your R Markdown file. • You don’t need to memorize all of the expressions. Google or refer to this guide when you work on mathematical equations. • To insert equations, you need to wrap your expression with the dollar sign. To type “inline equations” (equations that won’t break your lines), use the dollar sign $ to wrap your expression: $hat{mu} = Pn yi rac{sum_{i = 1}ˆ{n}y_i}{n}$ gives: µˆ = ¯y = in=1 . For “displayed equations” (equations that will break your lines), use the double dollar sign $$ to wrap your expression: $$hat{mu} = ar{y} = rac{y_1 + y_2 + y_3 + … + y_n}{n} = rac{sum_{i = 1}ˆ{n} y_i}{n}$$ gives: y1 + y2 + y3 + … + yn Pni=1 yi µˆ = ¯y = = n n • In R Markdown, hovering over your equation expressions will give you a preview of the equation you write. Additional tips: Knitting R Markdown to HTML or PDF As mentioned earlier, you can knit R Markdown files to HTML, PDF, or Word.• Before you knit, always make sure you can run your code from beginning to end. You won’t be able to knit if some codes throw error messages. (We will talk about debugging later.) • There are many options in R Markdown that helps you manipulate how you want to present your document. For example, you can hide the code chunk and only show the output by adding echo = FALSE in your code chunk options. You can also use include = FALSE to prevent the code AND its output to appear in your knitted document. You can also use eval = FALSE to prevent your code from running (but it will be displayed) in your knitted document. For detailed documentation of knitting options, read here. Part 2: Coding in R In this part, we will go through the basics of the R language, including installing packages, types of variable objects, types of data objects, and how to code a function in R. 1. Installing and using packages • Assume you are starting a new coding task, you should first open RStudio, create a coding file (either R script or R Markdown), and save the file to a path in your computer. • After creating a .Rmd or .R file, you need to install and load necessary packages (using install.packages() and library()) so that you can use functions from other statistical packages. • You only need to install packages once. After they are installed, you can simply load them to your environment in the future using the library() function. • For example, to use the packages tidyverse, gridExtra, and kableExtra, you need to type the following code in your coding chunk: install.packages(c(“tidyverse”, “gridExtra”, “kableExtra”)), then use the library() function to load each of them in your environment. # Install packages # After you install, you can delete the line below and keep only the ‘library’ line # install.packages(c(“tidyverse”, “gridExtra”, “kableExtra”)) # Load package to environment library(tidyverse) library(gridExtra) library(kableExtra) 2. Types of variables in R We use R to perform data cleaning and statistical analysis. But before that, we need to have a basic understanding about how to create, save, and retrieve unit of information in R. First, we will talk about types of variables. This is similar but not entirely the same as the types of variables we discussed in class (categorical vs numeric). In R, types of variables is relevant because R processes different variable types differently. • Most common data types in R: (i) Logical variable: TRUE (T) or FALSE (F) (iii) Numeric variable: – R automatically converts between these two classes when needed for mathematical purposes. • Variable types matter when you use different functions in R. For example, you cannot perform arithmetic (e.g. mean or median) with character variables even if they appear to be numbers. • Check variable type using class() on the target variable or str() (structure) on the target dataset • To create a variable, you give it a name first, then use either % mutate(gdpPercap_in_thousand = gdpPercap/1000, gdp = pop * gdpPercap, log_gdp = log(gdp), ## natural log log2_gdp = log2(gdp), id = row_number()) ## create id by row number ## ———- pipeline ———- ## gapminder %>% filter(year == 2007) %>% arrange(desc(gdpPercap)) %>% slice(1:5) 5. Make untidy data tidy • In many cases, data are untidy. • One typical example is that the data is in a long format, with a single column containing multiple variable values. • What if it’s not tidy? – There are two pivot functions in tidyverse that help you make untidy data tidy. – pivot_longer() helps you to bring the information in the column names to being values in a single column. – pivot_wider() does the opposite • Remember what count as tidy depends on your questions, specifically, what count as an observation in your study. ## observe the data structure of the two untidy examples View(tidy_df1) View(tidy_df2) ## for `tidy_df2`, we need “cases” and “population” to have their own columns tidy_df2 %>% pivot_wider(names_from = type, values_from = count) ## we can save clean df as a new object tidy_clean % pivot_wider(names_from = type, values_from = count) ## and export as .csv to your data folder write.csv(tidy_clean, “tidy_clean.csv”, row.names = F) ## for df1, we need to first bring years from column names to a variable ## then put values of “cases” and “population” in two columns tidy_df1 %>% ## bring years from column names to a variable pivot_longer(cols = c(year_1999, year_2000), names_to = “year”, values_to = “count”) %>% ## remove “year_” prefix in the year variable mutate(year = str_remove(year, “year_”)) %>% ## put values of “cases” and “population” in two columns pivot_wider(names_from = type, values_from = count) 6. Summarise and group data • The summarise() function collapses many values down to a single summary, e.g. mean, median, standard deviation, max, min, etc. • The group_by() function creates a grouped copy of a table, thus you can apply various functions to each group. • Combining group_by() with summarise(), you can get various descriptive statistics for your data, either for the entire dataset, or by group (e.g. groups by gender, race, education level, etc.). ## example for summarise() gapminder %>% filter(year == 2007) %>% summarise(avg_life = mean(lifeExp)) ## example for combining group_by() and summarise() gapminder %>% filter(year == 2007) %>% group_by(continent) %>% summarise(avg_life = mean(lifeExp)) ## you can get many different summary statistics for each group using summarise() summary1 % filter(year == 2007) %>% group_by(continent) %>% summarise(year = 2007, n_country = n(), max_gdpPercap = max(gdpPercap), min_gdpPercap = min(gdpPercap), mean_gdpPercap = mean(gdpPercap), sd_gdpPercap = sd(gdpPercap)) summary1 • Whenever there are NA values (meaning some values are not available in a column), it is necessary to add , na.rm = T in function. For example, gapminder %>% filter(year == 2007) %>% group_by(continent) %>% summarise(avg_life = mean(lifeExp, na.rm=T)) Part 5 Preparations for Assignment 1 • We can simulate some of the concepts using R, as R can “randomly” sample from a hypothetical “population”. ## set randomization “seed” ## this is to ensure you can replicate results, ## create a hypothetical population where samples can be drawn pop

$25.00 View

[SOLVED] Ecse324 – lab 1: introduction to arm programming

ECSE 324 – Computer Organization Introduction 1 Working with the DE1-SoC Computer System For this course, we will be working with the DE1-SoC Computer System, which is composed of an ARM Cortex-A9 processor and peripheral components located on the FPGA on your DE1-SoC board. The IDE we will be using is the Intel FPGA Monitor Program 16.1. In this part of the lab, you will learn how to program the Computer System in ARM assembly. 1.1 Learn about the tools 1.2 Your first assembly program 1. Open the ‘Intel FPGA Monitor Program 16.1’ from the desktop icon and select File->New.Figure 1: Your first assembly program – Step 1 2. In the new editor window, type out the code as shown in Figure 2 and save this file as ‘part1.s’within a new folder ‘GXX Lab1’ on your network drive. Here, GXX stands for your group number! eg. Group 1 would be G01 Lab1. The code is a simple program to find the maximum number from a list of ‘NUMBERS’ with length ‘N’. Notice the extensive use of comments! This practice should be used throughout this course, especially with assembly programming! NOTE: The indentation is important. The code will not compile if not indented as shown.Figure 2: Your first assembly program – Step 2 3. Open the ‘Intel FPGA Monitor Program 16.1’ from the desktop icon and select File->New Project.Figure 3: Your first assembly program – Step 3 4. Set the project directory to GXX Lab1 and set the project name to GXX Lab1. Select the ‘ARM Cortex-A9’ processor architecture, and click ‘Next’.Figure 4: Your first assembly program – Step 4 5. In the next window, under ‘Select a system’ select the ‘DE1-SoC Computer’ and click ‘Next’.Figure 5: Your first assembly program – Step 5 6. In the next window, under ‘Program type’ select the ‘Assembly Program‘ and click ‘Next’.Figure 6: Your first assembly program – Step 6 7. In the next window ‘Specify program details’, click on ‘Add…’ and select the file ‘part1.s’ created in step 1, and click ‘Next’.Figure 7: Your first assembly program – Step 7 8. In the next window ‘Specify system parameters’, ensure that the board is detected in the ‘Hostconnection’ box, and click ‘Next’. Note that the board has to be plugged in via USB and powered on to be detected.Figure 8: Your first assembly program – Step 8 9. In the next window ‘Specify program memory settings’, simply click ‘Finish’.Figure 9: Your first assembly program – Step 9 10. A dialogue box should now pop up, asking whether you would like to download the systemonto the board. If you were successfully able to flash your JIC file in Lab0, click ‘No’, otherwise click ‘Yes’.Figure 10: Your first assembly program – Step 10 1.3 Using the IDE Now that we have created our first assembly project, let’s take a look at some of the features of the IDE and use them in order to debug this program and verify that it works as desired NOTE: This section only provides a very brief introduction to the IDE. More detailed information can and should be obtained in the documentation and by experience! 1. Figure 11 shows the useful features of the IDE when a project is opened. We can say that weare now in ‘development mode’ – where the code is not loaded onto the board and we are in the process of writing code and compiling it to check for errors. The green box highlights the different IDE window tabs, and since we are in development mode, the only useful window is the ‘Editor’ window where code can be created/modified. You can add/remove windows using the ‘Windows’ menu at the top. The red box highlights three useful buttons in development mode – ‘Compile’, ‘Load’, and ‘Compile & Load’. Their functions are self-explanatory. Actually, ‘compiling’ refers to converting higher level computer code (such as C code) into assembly instructions. What we are doing here is ‘assembling’, which refers to the conversion of assembly instructions into machine code. However, since Altera has decided to call it the ‘Compile’ button, we will stick with that name for the sake of clarity.Figure 11: Using the IDE – Development mode 2. When the code is loaded onto the board (by clicking either ‘Load’ or ‘Compile & Load’), wecan say that we are now in ‘debug mode’. The IDE is now connected to the board via a debug server, and we can send execution instructions to the board and receive data (such as register and memory values) back from the board. The green box highlights the two important windows in this mode. In the Disassembly window, we can see the code that is being executed, as well as the current instruction when the code is paused. We also have the ability to set/remove breakpoints by clicking on the grey area to the left of the instruction. The Disassembly window is the most important window in debug mode. In the Memory window, we can see the contents of a desired memory location, but only when the program is paused! The red box highlights the useful buttons in debug mode. Using them, we can ‘Continue’, ‘Pause’ and ‘Restart’ the program execution. We can also step by a single instruction, or step over multiple instructions. Finally, we can also disconnect from the board.Figure 12: Using the IDE – Debug mode 3. Now let’s run the code and verify the result. Before you do this, make sure you have read the code and understand how it works, otherwise you won’t know what it is that you’re checking! Ensure that we are in debug mode and looking at the Disassembly window. Click on the ‘Continue’ button, and then click on the ‘Pause’ button. The code should stop at the B END instruction. Notice how the contents of the registers have now changed, and R0 contains the expected value! Experiment with the IDE features by restarting the program from the first instruction and arriving at the end via steps and breakpoints. Finally, note the address 0x00000038 of RESULT, as it will be used in the next part.Figure 13: Using the IDE – The Disassembly window 4. Now move over to the Memory window, and search for the value in the address of RESULT.Once again, we can see that the expected value has appeared in that memory location.Figure 14: Using the IDE – The Memory window 2 Some programming challenges Now that you have gone through a simple example in which we have given you the program to be executed, you should complete the following tasks, which will require you to write your own programs. NOTE: You will have to add the new files you will create to your current project GXX Lab1. Since the same label ‘ start’ cannot be used in multiple files, and subroutines are beyond the scope of this lab, the workaround you should use in this lab is to only have one file added to the project at any given time! 2.1 Fast standard deviation computation Suppose that you would like to use the ARM processor to compute the standard deviation of a signal X = {x1,x2,…,xN}. The formula for the standard deviation is: (1) where µˆ is the average value of the signal. Unfortunately, implementing this formula requires multiplication, division, and square root operations, which are not available as instructions on all processors and are slow to emulate using other instructions. The standard deviation can be approximately computed in a more hardware-friendly way using the so-called “range rule”: (2) where xmax and xmin are the maximum value and minimum value of the signal, respectively. Write an ARM assembly program which computes the standard deviation of a signal, using the range rule. The program should accept input values – more specifically, the number of samples in the signal and their values – using a similar approach as shown in Part 1. Save your code in a file named ‘stddev.s’ (Hint: you can reuse your code from Part 1 to compute the maximum value. Then, you can make a simple modification to this code to get code which computes the minimum. Also, remember that dividing by a power of 2 can be implemented using shift instructions.) 2.2 Centering an array It is often necessary to ensure that a signal is “centered” (that is, its average is 0). For example, DC signals can damage a loudspeaker, so it is important to center an audio signal to remove DC components before sending the signal to the speaker. You can center a signal by calculating the average value of the signal and subtracting the average from every sample of the signal. Write an ARM assembly program to center a signal. In this example, store the resulting centered signal ‘in place’ – i.e. in the same memory location that the input signal is passed in. The program should be able to accept the signal length as an input parameter. In order to simplify calculations, work with the assumption that only signal lengths that are powers of two can be passed to the program. Save your code in a file named ‘center.s’ 2.3 Sorting Write an ARM assembly program which sorts an array in ascending order. You could use the simple bubble sort algorithm: // Given an array A of length N sorted = f a l s e while not sorted : sorted = true for i = 2 to N: i f A[ i ] < A[ i −1], swap A[ i ] with A[ i −1] and s e t sorted = f a l s e You could also implement a more sophisticated sorting algorithm. Store the resulting sorted array ‘in place’. The program should be able to accept the array length as an input parameter. Save your code in a file named ‘sort.s’ 3 Grading and report • Largest integer program (10%) • Standard deviation program (15%) • Centering program (25%) • Sorting program (30%) Finally, the remaining 20% of the grade for this Lab will go towards a report. Write up a short (2-3) page report that given a brief description of each part completed, the approach taken, and the challenges faced, if any. Please don’t include the entire code in the body of the report. Save the space for elaborating on possible improvements you made or could have made to the program, such as a feature to detect empty (length 0) arrays, etc. Your final submission should be a single compressed folder that contains your report and the four assembly files – ‘part1.s’, ‘stddev.s’, ‘center.s’, and ‘sort.s’.

$25.00 View

[SOLVED] Csed311 lab1: rtl design

Contents • Combinational logic • Register-Transfer Level • Finite State Machine • Assignments • Combinational logic (ALU) • FSM (vending machine) 2 • Represents Boolean function (time(clk)-independent) • Output is a function of inputs only • output = f(input) • Example: 4-to-1 mux• Implementation with Verilog (mux) • If you use “always”, simply giving “(*)” as the sensitivity list is equivalent to the code below Using Assign Using Always4 • RTL schematicRegister-Transfer Level • The design method to implement a synchronous circuit with an HDL (Hardware Description Language) • Synchronous circuit consists of: • Register: a memory element synchronized by the clock signal • Combinational logic: logical functionRTL – example • Toggler example• Moore Machine • Mealy Machine • Moore Machine • Outputs only depend on the current state. • Mealy Machine • Outputs depend on the current state and the current inputs • Mealy Machine • Mealy Machine can be made synchronousAssignment 1.a • Implement ALU (Arithmetic Logic Unit) in Verilog • A skeleton code and testbench will be provided • Lab1/ALU/* • Input: (A, B, FuncCode) • A: left operand (16-bit signed binary) • B: right operand (16-bit signed binary) • FuncCode: operator (4-bit binary) • Output: (C, OverflowFlag) • C: operation result (16-bit signed binary) • OverflowFlag: overflow flag (1-bit binary) Assignment 1.a (cont’d) • ALU operations FuncCode Operation Comment FuncCode Operation Comment 0000 A + B Signed Addition 1000 A B Bitwise XOR 0001 A – B Signed Subtraction 1001 ~(A B) Bitwise XNOR 0010 A Identity 1010 A > 1 Logical Right Shift 0100 A & B Bitwise AND 1100 A > 1 Arithmetic Right Shift 0110 ~(A & B) Bitwise NAND 1110 ~A + 1 Two’s Complement 0111 ~(A | B) Bitwise NOR 1111 0 Zero Assignment 1.a (cont’d) • Overflow detection • For addition and subtraction, you should detect overflow and set the flag. Overflow Flag Singed Addition Signed Subtraction 0 Correct Result Correct Result 1 Wrong Result Wrong Result • For other operations, OverflowFlag is always zero. Assignment 1.b • Implement a simple vending machine RTL in Verilog • A simple Finite State Machine (FSM) • Your vending machine should cover all use-cases (in the next slides) • A skeleton code and testbench will be provided • Lab1/vending_machine/*• Vending machine interface INPUT Signal Description Number of bit(s) i_input_coin Insert Coin 1 for each type of coins i_select_item Select Item 1 for each type of items i_return_trigger Return change 1 clk clock 1 reset_n reset 1 OUTPUT Signal Description Number of bit(s) o_output_item Indicate dispensed items 1 for each type of items o_available_item Indicate item availability 1 for each type of item o_return_coin Indicate type of coin (change) 1 for each type of coins • Vending machine use-case Assumption: infinite item, change Sequence 1. Insert money (available money unit: 100, 500, 1000 won) and initialize waiting time(=100) 2. Vending machine shows all available items where (item cost

$25.00 View

[SOLVED] Soc-ga 2332 intro to stats lab 6

Wenhao Jiang Part 1: Gauss-Markov Assumption and Strict Exogeneity Yi = β0 + β1Xi + i • Zero conditional mean: in this population, E(i|X) = 0. We also call this the strict exogeneity assumption. This means that, no matter which value X takes, the expectation of i associated with this X value will be 0. If the assumption is met, the following statements will be true: – . This is because of the Law of Iterated Expectations (detailed explanations here): (i|X)) = E(0) = 0 – i is independent of X. In other words, i is not a function of X, otherwise E(i|X) = E(f(X)|X) = f(X) 6= 0 • The independence between i and X, if violated, would produced a biased estimation. That is, if we sample from this population and derive βˆ1, E(βˆ1) 6= β1. – This will be part of your assignment 2. – You don’t even have to sample from the population. You can see this biasedness by creating a population where i and X are not independent, and when you regress Yi on X, your derived βˆ1 will be very different from β1. – In reality, this is called omitted variable bias. Part 2: Important Properties of OLS Estimation • OLS minimizes the sum of the square of the error term n argminf(βˆ0,βˆ1) = XYi − βˆ0 − βˆ1Xi2 βˆ0, βˆ1 i=1 • We use partial derivative for the solutionn n which gives: XYi − βˆ0 − βˆ1Xi = Xei = 0 i=1 i=1n n which gives: XXi Yi − βˆ0 − βˆ1Xi = XXiei = 0 i=1 i=1 – The two facts, Pni=1 ei = 0 and Pni=1 Xiei = 0, are forced to be true in OLS estimation • ei does not have life on its own. It has its meaning and value through βˆ0 and βˆ1 • Pni=1 Xiei = 0 forces the covariance between ei and Xi to be 0. But this does not imply independence. Part 3: Multivariate Regression & Interaction with One Dummy Dummies • For categorical variables, we create dummies or convert them to 0 or 1 dummies when we want to include them in a regression model • Note that for a categorical variable that have n categories, the regression model will only have n − 1 dummies or categorical variable predictors, because the nth dummy is redundant given that if an observation does not belong to any of the n − 1 category, then it must be in the nth category • We call the left-out category the reference category • Question: what if we include all n categories? • You should always interpret your model coefficients with the reference category in mind. This could get complicated when you have multiple dummy variables, especially when they are interacted in your model In the case of the dummies representing “race” in the earnings_df that we will be using today, we have: Category Dummy1(black) Dummy2(other) White 0 0 Black 1 0 Other 0 1 Exercise (from Lab 5) 1. Import earnings_df.csv to your environment. Perform the following data cleaning steps: (1) If age takes the value 9999, recode it as NA; (2) Create a new variable female that equals 1 when sex takes the value female, and equals to 0 otherwise; (3) Create a new variable black that equals 1 when race is black and equals to 0 otherwise; (4) Create a new variable other that equals to 1 when race is ’other‘ and 0 otherwise. 2. Use the describe() function from the psych package to generate a quick descriptive statistics of your data. 3. Now, estimate the following models and display your model results in a single table using stargazer(m_1, m_2, …, m_n, type=”text”). (1) Model 1: earn ~ age (baseline) (2) Model 2: earn ~ age + edu (3) Model 3: earn ~ age + edu + female (4) Model 4: earn ~ age + edu + female + race (5) Model 5: earn ~ age + edu + female + race + edu*female 4. Write down your prediction equation for Model 5 5. In Model 5, holding other variables constant, what will be the predicted difference in estimated mean earnings for a white man and a white women? 6. Holding other variables constant, what will be the predicted difference in estimated mean earnings for a white women and a black women? 7. Holding other variables constant, what will be the predicted difference in estimated mean earnings for a white man and a black women? ## read data earnings_df 9000 ~ NA, .default = age )) ## recode female earnings_df % mutate(female = case_when( sex == “female” ~ 1, .default = 0)) ## base R way of doing it earnings_df$female

$25.00 View

[SOLVED] Csc3310 – lab 1: benchmarking insertion and selection sort

Learning Outcomes • Learn how to write benchmarks in Python • Implement iterative sorting algorithms such as insertion sort and selection sort • Apply asymptotic time complexity analysis to choose among competing algorithms Overview You are going to implement the insertion and selecting sort algorithms. You will then benchmark the run time of the two algorithms under the best, worst, and average case scenarios. You will plot the run times and interpret the plots in relation to the asymptotic run time.Instructions 1. Implement Insertion and Selection Sort 1. Create a Jupyter notebook named lastname_lab01. The notebook should have a title, your name, and an introduction. 2. Implement the following functions in the notebook: • void insertion_sort(lst) – The function takes a Python list and sorts it in place. • void selection_sort(lst) – The function takes a Python list and sorts it in place. 3. Write tests (in the notebook) to ensure that the two algorithms are implemented correctly2. Write a Benchmark Function Benchmarking can be used to (1) validate that the run time of an algorithm implementation agrees with the formal analysis and (2) compare the run times of two or more algorithms. You will need to design a benchmark to measure the change in the run time of a sorting algorithm with the changes in the size of a list of numbers being sorted. long benchmark(sorting_algorithm, input_list) – As input, the function takes a reference to a sorting function and the list to sort. The function returns the elapsed time in seconds. You can use the following template:import time # DO ANY SETUP start_time = time.perf_counter() # PUT CODE YOU ARE BENCHMARKING HERE end_time = time.perf_counter() elapsed = end_time – start_timeWhen designing your benchmark, keep the following in mind: 1. Do not modify the input list object so it can be reused across benchmarks. For some sorting algorithms, the run time varies based on the order of elements (e.g., does better when a list is already sorted). After the first iteration, the list will be sorted and could throw off the benchmark results. You should make a separate copy of the original input list for each trial. 2. Do not perform any data structures operations (e.g., list appends) inside the benchmark loop. For example, if you accidentally perform a O(n) operation when trying to benchmark an O(log n) algorithm, the O(n) operation will dominate and throw off your benchmark. If you need to do any setup, do it before the benchmark loop. 3. Design and Execute the Benchmarks 1. The dominating term in the run time complexities only dominates for large enough input sizes. 3. List sizes should vary by orders of magnitude (e.g., 100, 1000, etc.). 4. You should benchmark at least 5 list sizes to be able to reliable differentiate between linear and non-linear behavior.4. Validating Formal Run Times We can estimate the run time complexity function from the measured run times using a little bit of statistics. 1. Fit a linear regression model to the logarithms (base doesn’t matter) of the list sizes (s) and run times (r) to estimate the slope (m): log r = m log s + b 2. The slope tells us the exponent of the growth function: m Run Time 0 Constant < 1 Sub-linear (e.g., log n) 1 Linear 1 < m < 2 Between linear and quadratic (e.g., n log n) 2 Quadratic (e.g., n2) 2 < m < 3 Between linear and quadratic (e.g., n2 log n) 3 Cubic (e.g., n3)You can calculate the slope (m) using the following code snippet: import numpy as np from scipy.stats import linregress m, b, _, _, _, _ = linregress(np.log(list_sizes), np.log(run_times))5. Comparative Analysis of Algorithm Run Times We want to make the following comparisons: 1. Compare run times of the three cases within each algorithm 2. Compare run times of each case across all of the algorithmsYou can do this by making a series of plots of the benchmark data in different combinations. For example, to compare the run times of multiple cases for a single algorithm, you would create a plot with 3 lines (one for each case). The horizontal axis would have the list sizes, while the vertical axis would have the run times. In total, you will create 5 plots (1 for each algorithm with the 3 cases, 1 for each case with the 2 algorithms).import matplotlib.pyplot as plt plt.plot(list_sizes, run_times_best, label=”best “) plt.plot(list_sizes, run_times_average, label=”average”) plt.plot(list_sizes, run_times_worst, label=”worst”) plt.xlabel(“List Size”, fontsize=18) plt.ylabel(“Run Time (s)”, fontsize=18) plt.title(“Insertion Sort”, fontsize=20) plt.legend()6. Reflection Questions 2. Which algorithm had a better run time than the other and for which case? Why do you think that one case was substantially faster for that algorithm? (Hint: focus on the inner loops.) 3. Based on your results, which of the two sorting algorithms would you use in practice?Submission Instructions Save the notebook as a PDF and upload it to Canvas.Rubric I will be looking for the following: • A name, title, and introduction (including your own summary of the lab) at the top of the notebook. Make sure to put your name at the top of the notebook. • Code is to a high technical quality with little or no duplicated code • That your benchmarks and analyses are correct • That your line plots look reasonable Followed submission instructions 5% Overall quality of the presentation of the notebook including plot labels 10% Algorithm implementations 10% Algorithm tests 5% Benchmark function is correct 15% 6 benchmarks executed correctly 15% Correct estimation of growth function orders 10% 5 comparisons plots are correct 10% Plots of run times are correct 10% Answers to reflection questions 10%

$25.00 View

[SOLVED] Soc-ga 2332 intro to stats lab 5

Wenhao Jiang Logistics & Announcement First, load packages to your environment. Today we will use several new packages. Please install them before you run the following chunks. knitr::opts_chunk$set(echo = TRUE) library(tidyverse) library(foreach) library(stargazer) library(ggcorrplot) library(psych) Part 1: Data Simulation and How to Simulate Regressions Data simulation is the opposite of data analysis. You create a data set through simulation and then “understand” it using the analytically tools you learned. Advantages for doing data simulation 1. The truth is known: We know the values of simulated parameters, and we can therefore compare them with the model estimates. This gives us a better understanding of how well the model performs. 2. Understanding parameters through tuning: Sometimes the effect of one or more parameters is unclear. Being able to tune the parameters and observe how they affect the resultant data helps us better understand the parameters. 3. Evaluating the bias and efficiency of statistics: As we have practiced in PS1, by simulating a virtual population, we can directly observe how the sampling procedure affects the variability of the sample statistics. In general, through simulation, we can generate sampling distribution of any statistics we are interested in, and evaluate their bias and efficiency. 4. Understanding models: Finally, data simulation provides proof that you understand a model: If you can simulate data under a certain model, then it is likely that you really understand that model. Simulation from a stochastic process For a social system, we always take into account the randomness in social life when we simulate data. In other words, the data is generated from a stochastic process in which the state of the system cannot be precisely predicted given its current state and even with a full knowledge of all the factors affecting that process. In concrete, it means that we always include an error/residual term when simulating a social process. This error/residual term conveys the unpredictability and randomness of social lives. Simulate a bivariate relationship For example, let’s simulate a population data in which the years of education affect one’s income rank according to the following equation: Ii = 10 + 6 · Ei + i We will first simulate years of education – the independent variable (IV), then income rank – the dependent variable (DV) according to the above equation. We simulate years of education using rpois(), a function that generates a random Poisson distribution with a specified parameter λ. You can learn more about the distribution here. ## simulate IV (edu level) set.seed(1234) edu % as_tibble() %>% ggplot(aes(value)) + geom_histogram(color = “black”, fill = “grey”, binwidth = 1) + labs(x = “Years of Education”) + theme_bw()## summary statistics summary(edu) ## Min. 1st Qu. Median Mean 3rd Qu. Max. ## 0.000 4.000 6.000 6.022 8.000 18.000 When simulating earning, since we are simulating a stochastic process, we always add an error term, noted as i in the above equation. Note that normally this error term is modeled using rnorm() assuming the error term is normally distributed with a mean of 0 and a sd that is a constant value, so that the errors vary following the same random pattern across all values of the IV. But we might need to change this when we want to simulate data that violate the homoskedasticity assumption, which means the error term is not purely random but dependent on the value of the IV. ## simulate DV set.seed(1234) earn % as_tibble() %>% ggplot(aes(value)) + geom_histogram(color = “black”, fill = “grey”, binwidth = 5) + labs(x = “Income Rank”) + theme_bw()0 50 100 Income Rank ## summary statistics summary(earn) ## Min. 1st Qu. Median Mean 3rd Qu. Max. ## -13.27 33.81 45.42 46.14 57.73 122.96 ## combine data frame df % ggplot(aes(x = x_edu, y = y_earn)) + geom_point(shape = 1, alpha = 0.7) + geom_smooth(method = “lm”) + labs(title = “Relationship Between Years of Education and Income Rank”, subtitle = “(using simulated data)”, x = “Years of Education”, y = “Income Rank”) + theme_bw() ## `geom_smooth()` using formula = ‘y ~ x’ Relationship Between Years of Education and Income RankFit OLS to sampled data We can fit a regression model to the sampled data from the simulated population, and compare the result with the “true” relationship. As you can see our modeling result is quite close to the “true” parameter values. ## sample 300 obs sample

$25.00 View

[SOLVED] Soc-ga 2332 intro to stats lab 2

Wenhao Jiang Part 0: Prerequisites knitr::opts_chunk$set(echo = TRUE) knitr::opts_chunk$set(tidy = TRUE) knitr::opts_chunk$set(fig_crop = FALSE) # Load package to environment library(tidyverse) library(ggplot2) library(gridExtra) library(kableExtra) # Load csv files gapminder % ggplot(aes(x = value)) + geom_histogram(color = “black”, fill = “grey”) + labs(title = “Histogram of Simulated Population with Bernoulli Distribution”, subtitle = “N = 100000, p = 0.5”, x = “”) Histogram of Simulated Population with Bernoulli Distribution2. Sample • Sample is the data we actually observe. When we say sample, we usually mean a “random sample.” That is, the subjects chosen in the sample are randomly drawn from the population. • Sample statistics: – Sample mean: n 1 y¯ = Xyi n i=1 – Sample variance: n 2 1 X − y¯)2 s = (yi n − 1 i=1 – Sample standard deviation: √s = s2 – Standard error of the sample mean (which is the standard deviation of the mean in the sampling distribution of the mean): σ σˆy¯ = √ n • What are i.i.d. samples? – “i.i.d.” stands for “independent, identically distributed,” meaning these samples are drawn independently. That is, what you choose for your first random sample does not affect what you choose for the rest of the random samples. • (Weak) Law of Large Numbers: – This law states that with a sufficiently large sample there will be a very high probability that the average of the observations will be close to the population mean. – n ! 1 X lim P Xi = 0 n→∞ n i=1 – Let’s see how our sample mean change as we increase our sample size from 10 to 10,000: # First create four df of random samples # Convert to tidy data object as_tibble() %>% # Add a new variable called “sample_size” that equals to the sample size mutate(sample_size = 10) sample100 % sample(size = 100, replace = FALSE) %>% as_tibble() %>% mutate(sample_size = 100) sample1000 % sample(size = 1000, replace = FALSE) %>% as_tibble() %>% mutate(sample_size = 1000) sample10000 % sample(size = 10000, replace = FALSE) %>% as_tibble() %>% mutate(sample_size = 10000) # Combine df, recode variables (rbind stands for “row bind”) sample_df % # Covert numeric variable to character variable (b/c we only have two possible outcomes) mutate(value = as.character(value)) %>% ggplot(aes(x = value, fill = value)) + geom_bar(stat = “count”, width = 0.5) + facet_wrap(~sample_size, scales = “free”) + labs(title = “Sample Distribution for Different Sample Size”) sample_size sample_mean diff_to_true_mean 10 0.6000 -0.1000 100 0.4800 0.0200 1000 0.4690 0.0310 10000 0.4958 0.0042 Sample Distribution for Different Sample Size# List sample size & sample mean sample_df %>% group_by(sample_size) %>% summarise(sample_mean = mean(value)) %>% mutate(diff_to_true_mean = 0.5 – sample_mean) %>% kbl(align = “c”) %>% kable_styling() • As you can see, by the Law of Large Numbers, our sample mean gets closer to the population mean as sample size increases. 3. Sampling Distribution of the Sample Mean • Definition: A sampling distribution describes the distribution of a statistic, such as a sample mean or variance. Because a sample statistic is itself a random variable, as we draw different samples from the population we will obtain a distribution of this sample statistic. • While there are the sampling distribution of the sample mean and the sampling distribution of sample variance, we only cover the sampling distribution of the sample mean. This concept is important because it helps us understanding the principle behind hypothesis testing, which is at core of most quantitative social science research. • The stand error of a sample’s mean is defined as the standard deviation of the sampling distribution of the mean. • The Central Limit Theorem: As sample size gets larger, the sampling distribution of the sample mean will increasingly approximate a normal distribution. This applies to population distribution of any kind.Figure 3: CLT Applies to Samping Distributions from Any Population (Agresti 5th ed. Figure 4.15) 4. Simulate the Sampling Distribution in R (for-loop): • In order to get the sampling distribution of the sample mean, we need to repeat the action of “drawing a random sample” for many times. • When we need to complete the same operation many times, we can use a for-loop. In R, you can do this using a the for-loop syntax. for (i in 1:n){ code expression of the iterative operation } for (i in 1:100){ ## draw sample } • The i in the loop is an number for indexing. Whether you use i or j or other names doesn’t matter. The 1:n indicate the number of iterations you need for the loop (You don’t always start from 1, it depends on your specific problem). Together, for(i in 1:n){…} means “for i that ranges from 1 to n, do the operation that is specified in the {}.” • For example, we can use a for-loop to repeatedly sample from the population and save the mean of each sample in a vector. Let’s try getting 100 samples with each sample n = 50. ## we create a “container” object to save the result ## it can be a vector, a matrix, a list, etc. as long as it fits your purpose mean_container % filter(lifeExp % ggplot(aes(x = year, fill = continent)) + geom_bar(position = “fill”)## dodge gapminder %>% filter(lifeExp % ggplot(aes(x = year, fill = continent)) + geom_bar(position = “dodge”)4. Scatter Plots To check the joint distribution of two numeric variables. ## scatter plot: ## relationship between GDP per capita and life expectancy gapminder %>% ggplot(aes(x = gdpPercap, y = lifeExp)) + geom_point()## you can also add additional aesthetic mapping arguments to show group differences gapminder %>% ggplot(aes(x = gdpPercap, y = lifeExp, color = continent)) + geom_point(alpha = 0.7, size = 1.5)## let’s see how the scatter plot of log(GDP per capital) and life expectancy gapminder %>% ggplot(aes(x = log(gdpPercap), y = lifeExp, color = continent)) + geom_point(alpha = 0.7, size = 1.5)## change point size according to population size gapminder %>% ggplot(aes(x = log(gdpPercap), y = lifeExp, color = continent, size = pop)) + geom_point(alpha = 0.7) + guides(size = “none”) ## this removes the legend for size5. Line plot ## geom_line() + geom_point() are often used to plot change over time gapminder %>% filter(country == “Sweden”) %>% ggplot(aes(x = year, y = gdpPercap)) + geom_point() + geom_line()year ## you can use the “color” argument in aesthetic mapping to plot trend by group ## for example, if we want to compare GDP trend over years for North American countries: gapminder %>% filter(country %in% c(“United States”,”Mexico”,”Canada”)) %>% ggplot(aes(x = year, y = gdpPercap, color = country)) + geom_point() + geom_line()6. Fit Model Curves using geom_smooth • geom_smooth can estimate the relationship between x and y based on the model you choose to fit. • It’s useful as an exploratory tool. • It includes linear and nonlinear methods which can be useful if you want to compare the fit between linear and nonlinear assumptions. • We usually plot the smoothing on top of a scatter plot, so that we can see how well the model curve fits the data. ## for example, if we want to explore the relationship between GDP and life expectancy ## if we fit the data with linear models gapminder %>% ggplot(aes(x = gdpPercap, y = lifeExp)) + geom_point(shape = 1, alpha = 0.5) + geom_smooth(method = “lm”)## if we fit the data with a nonlinear assumption, ## there are various smoothing method you can choose from. See documentation for details. gapminder %>% ggplot(aes(x = gdpPercap, y = lifeExp)) + geom_point(shape = 1, alpha = 0.5) + geom_smooth(method = “loess”)## let’s try with the log(gdpPercap) gapminder %>% ggplot(aes(x = log(gdpPercap), y = lifeExp)) + geom_point(shape = 1, alpha = 0.5) + geom_smooth(method = “lm”)gapminder %>% ggplot(aes(x = log(gdpPercap), y = lifeExp)) + geom_point(shape = 1, alpha = 0.5) + geom_smooth(method = “loess”)7. Faceting • Faceting creates subplots based on one or more discrete variables • Faceting is useful when you want to compare or display relationships by groups in separate plots. ## for example, if we want to compare how the relationship between ## lifeExp and gdpPercap have changed from 1952 to 2007 in five continents: gapminder %>% filter(year == 1952 | year == 2007) %>% ggplot(aes(x = log(gdpPercap), y = lifeExp, color = factor(year))) + geom_point(alpha = 0.5) + facet_wrap(vars(continent))8. Arrange plots using grid.arrange() You can put together graphs using grid.arrange() from the gridExtra package. ## combine multiple plots plot1 % ggplot(aes(x = log(gdpPercap), y = lifeExp)) + geom_point(shape = 1, alpha = 0.5) + geom_smooth(method = “lm”) plot2 % ggplot(aes(x = log(gdpPercap), y = lifeExp)) + geom_point(shape = 1, alpha = 0.5) + geom_smooth(method = “loess”) ## grid combine grid.arrange(plot1, plot2, ncol = 2)9. Manipulate plot layout • To clearly communicate your data, please always make sure: (i) Your axes are readable, including the tick labels (ii) Your plot has a title or caption (iii) The size, shape, and color of your plot is easy to follow • You can add title and axes labels using + labs() • You can manipulate the font size, angle, and position of axes by using + theme(axis.text.x = …, axis.text.y = …) • Customize your axes’ breaks using + scale_x_continuous() or + scale_x_discrete() • You can also turn your colorful plot to greyscales by using + scale_colour_grey() for points, lines, etc. and + scale_fill_grey() for box plot, bar plot, violin plot, etc. • You can also adjust the theme by + theme_xxx() –see ggplot2 cheatsheet • There are a million things you can do to manipulate your plot. Google it or use the ggplot2 cheatsheet to discover more. ## let’s add titles and optimize plot layout for North American country GDP plot gapminder %>% filter(country %in% c(“United States”,”Mexico”,”Canada”)) %>% ggplot(aes(x = year, y = gdpPercap, color = country)) + geom_point() + geom_line() + labs(title = “GDP per capita (1952 to 2007)”, x = NULL, y = “GDP per capita”, color = “Country”) + theme_bw() + scale_color_grey() + scale_x_continuous(breaks = unique(gapminder$year)) + theme(axis.text.x = element_text(size = 8, angle = 90, vjust = 0.6)) GDP per capita (1952 to 2007)10. Save plots in R • Use ggsave() for quick saving ## to save a plot, add ggsave(filename = , plot = ) to save ## if you don’t name the plot specifically, it automatically save the last plot you’ve run ## for example, since we just ran the above plot, we can save it: ggsave(“graph/gdp_bric.png”) # You can customize the specs of the image: ggsave(“graph/gdp_bric_2.png”, height = 4, width = 6, units = c(“in”), dpi = 160) Part 4 Exercise 1. In the gapminder data, which are the top 5 countries in Europe in terms of their GDP per capita in 2002? Use dplyr functions to create a table for your result.2. Using the gapminder data, generate a table summarizing the mean, median, and standard deviation of life expectancy in Europe and Africa in 2002.

$25.00 View

[SOLVED] Soc-ga 2332 intro to stats lab 12

Wenhao Jiang Logistics 1.1 Overview • We use longitudinal data where individuals are observed multiple times to make inferences – Observation within-unit changes 1.2 Fixed-effects Model • FE models control for unit-specific time-invariant characteristics and use within-unit changes for estimation Yit = α + βXit + ηi + it • where ηi represents unit-specific time-invariant characteristics – We do not impose any restrictions on the relation between ηi and Xit – For example, in estimating (the existence of) male marriage premium, personality that is roughly stable but unobserved can be correlated with a man gets married and with his income. In FE model, personality is “absorbed” and controlled by ηi • We assume that the slope remains the same across units (i.e., we specify β rather than βi; for recent discussions of unit-variant slopes and new methods, see Brand and Xie (2010)), but we allow units to have their own intercepts. That is, each unit i’s intercept would be αi = α + ηi – We essentially model: Yit = αi + βXit + it • Therefore, as we already learned in previous sessions, the model can be estimated by including n − 1 unit dummy variables into our model – This is called Least Squares Dummy Variables (LSDV) estimation – This can be cumbersome if there is a large number of units. Alternatively, software such as R uses a within-estimator. Take the mean of both sides of the equation for each unit i: Y¯i = αi + βX¯i + ¯i • Take the difference of the two equations, we get: Yit − Y¯i = αi − αi + β(Xit − X¯i) + (it − ¯i) = β(Xit − X¯i) + vit • By comparing each unit’s value at time t with its mean value over time, the estimation uses only within-individual changes of both explanatory and dependent variable • By demeaning data within each unit, any time-invariant variables, such as race, cannot be estimated in an FE model • One classic example in family sociology is to determine whether marriage rewards men’s income – Comparing the income of married men and unmarried men can lead to biased estimation – Comparing the income of the same man before and after marriage using the fixed-effects model 1.3 Random-effects Model • Compared with the FE model, RE model imposes stronger assumption that is not always met • Recall when we the mean of both sides of the equation for each unit i, we get a between-estimator: Y¯i = αi + βX¯i + ¯i = α +βX¯i + ηi + ¯i constant for all units|{z} | {vzi } • RE model assumes that ηi is uncorrelated with Xit for ∀t = 1,2,…,T; we therefore get Cov(vi,X¯i) = 0 – If E[vi|X¯i] = 0, OLS estimator βˆ can be an unbiased and consistent estimator – But OLS mixes in two different “random shocks”, one that is truly stochastic across units (¯it), and the other that is specific to each unit ηi – To take into account the additional information introduced by ηi, we assume the two “random shocks” are normally distributed independently. vi = N(0,ση2) + N(0,σ2) – Therefore, another within-estimator exists: Yit − Y¯i = ηi − η¯i + β(Xit − X¯i) + (it − ¯i) = ηi − η¯i +β(Xit − X¯i) + µit | {z } 6=0 • The estimated β takes into account both within- and between-unit variations • We estimate β through MLE • By including between-unit variations, RE model allows time-invariant independent variables Part 2 Panel Data Structure • For the purpose of demonstration, this panel data is already organized in the tidy “long-format” with every person-year observation saved in each row. In addition, there is no missing data. • Load the data into your R environment. Look at the data carefully and answer the following questions – (1) Over how many time points were these individuals followed? Is this a balanced, or unbalanced panel? – (2) Among all the variables, which one(s) do you expect to be time invariant? ## load data into the environment data(wagepan, package = “wooldridge”) ## Question (1) wagepan %>% group_by(nr) %>% summarize(times = sum(!is.na(lwage))) %>% summarize(max = max(times), mean = mean(times)) ## # A tibble: 1 x 2 ## max mean ## ## 1 8 8 ## Question (2) wagepan %>% group_by(nr) %>% summarize_all(~ var(.)) %>% summarize_all(~ mean(.)) ## # A tibble: 1 x 44 ## nr year agric black bus construc ent exper fin hisp poorhlth ## ## 1 5262. 6 0.0171 0 0.0489 0.0333 0.0107 6 0.0146 0 0.0145 ## # i 33 more variables: hours , manuf , married , min , ## # nrthcen , nrtheast , occ1 , occ2 , occ3 , ## # occ4 , occ5 , occ6 , occ7 , occ8 , occ9 , ## # per , pro , pub , rur , south , educ , ## # tra , trad , union , lwage , d81 , d82 , ## # d83 , d84 , d85 , d86 , d87 , expersq Part 2 Exercise • Before estimating models, let’s create some descriptive plots for exploratory purposes. Suppose we are interested in the relationship between labor market experience and log(wage). Replicate the the plot below following the listed steps. – This plot contains two panels, with one illustrating the aggregate relationship and the other the individual-level trajectories. knitr::include_graphics(“graph/exercise1.png”) • 1. Sample ten persons from the dataset; • 2. Create an “aggregate trend” scatter plot of these individuals across all observation years with an OLS regression line for the variable exper and lwage (the upper panel); • 3. Similarly, create an “individual trend” scatter plot (the lower panel); • 4. Arrange the two plots using ggarrange();Figure 1: Graphical Demonstration of KOB Decomposition • 5. How does the relationship between exper and lwage differ in these two plots? What would be the possible reasons for the difference?Part 3: R Implementation of Panel Models • In addition to a simple bivariate relationship, we can further explore how individual wage trajectories vary by race. For the purpose of demonstration, we can sample 5 individuals from each racial group, and plot their wage trajectories. ## create a character variable “race” for plotting wagepan % mutate(race = case_when(black == 1 ~ “black”, hisp == 1 ~ “hisp”, black == 0 & hisp == 0 ~ “white”)) ## sample pid by race set.seed(123456) nr_byrace % # get a list of distinct person id number, keep other variables distinct(nr, .keep_all = T) %>% # group by race group_by(race) %>% # sample 5 persons sample_n(5) %>% ungroup() %>% # extract person id number pull(nr) ## aggregate trend fig3 % filter(nr %in% nr_byrace) %>% ggplot(aes(x = exper, y = lwage, color = race, group = race)) + geom_point() + geom_smooth(method = “lm”, se = F, size = 0.5) + labs(title = “Scatterplot with OLS Line, by Race”) + scale_colour_colorblind() + theme_bw() ## look at individual trend fig4 % filter(nr %in% nr_byrace) %>% ggplot(aes(x = exper, y = lwage, color = race, group = as.factor(nr))) + geom_point() + geom_smooth(method = “lm”, se = F, size = 0.5) + labs(title = “Scatterplot with OLS Line, by Person and Race”) + scale_colour_colorblind() + theme_bw() ## show the two figures ggarrange(fig3, fig4, ncol = 2, common.legend = TRUE, legend = “bottom”)race black hisp white • We might even run into the Simpson’s paradox, where the OLS model slope of the aggregate data is negative, whereas the OLS model slopes of the individual trends are mostly positive. • Once we look into the within-individual relationship between the outcome and the predictor, we see that a simple OLS model’s predictions are not really in line with the data. This is also expected as there is a lot of between-individual heterogeneity in the trends, which we cannot capture when we pool across all observations. 3.1 Estimating FE and RE Models in R • To estimate fixed effects and random effects models in R, we use the plm package – Another common package is lme4 – For fixed effects models, as we have mentioned earlier, you can also fit a simple linear model with “unit dummies” ∗ that is, for n unique persons, create (n − 1) dummy variables and include them in regression ∗ you can do this using as.factor(person_id) when you estimate the model • We want to estimate a model predicting mean log wage using years of working experience and race ## simple OLS model (for purpose of comparison) m_ols % ggplot() + geom_point(aes(x = exper, y = lwage), shape = 1, alpha = 0.6) + geom_line(aes(x = exper, y = yhat_ols, color = “OLS”)) + geom_line(aes(x = exper, y = yhat_fe, color = “Fixed Effects”)) + geom_line(aes(x = exper, y = yhat_re, color = “Random Effects”)) + facet_wrap(.~nr) + labs(x = “years of experience”, y = “log wage”) + scale_colour_colorblind()

$25.00 View

[SOLVED] Soc-ga 2332 intro to stats lab 13

Part 0: Logistics Part 1: Matching • For the following parts on causal inference, we will use the Early Childhood Longitudinal Study dataset. – race_white: Is the student white (1) or not (0)? – p5hmage: Mother’s age – w3income: Family income – p5numpla: Number of places the student has lived for at least 4 months (0)? ## import data ecls % select(one_of(ecls_cov)) %>% summarise_all(funs(mean(., na.rm = T))) ## # A tibble: 2 x 6 ## catholic race_white p5hmage w3income p5numpla w3momed_hsb ## ## 1 0 0.556 37.6 54889. 1.13 0.464 ## 2 1 0.725 39.6 82074. 1.09 0.227 ## Two sample t-test for every covariate ## lapply: a build-in loop that apply the t-test function along the name vector lapply(ecls_cov, function(v){ t.test(ecls[, v] ~ ecls[, ‘catholic’]) }) ## [[1]] ## ## Welch Two Sample t-test ## ## data: ecls[, v] by ecls[, “catholic”] ## t = -13.453, df = 2143.3, p-value < 0.00000000000000022 ## alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0 ## 95 percent confidence interval: ## -0.1936817 -0.1444003 ## sample estimates: ## mean in group 0 mean in group 1 ## 0.5561246 0.7251656 ## ## ## [[2]] ## ## Welch Two Sample t-test ## ## data: ecls[, v] by ecls[, “catholic”] ## t = -12.665, df = 2186.9, p-value < 0.00000000000000022 ## alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0 ## 95 percent confidence interval: ## -2.326071 -1.702317 ## sample estimates: ## mean in group 0 mean in group 1 ## 37.56097 39.57516 ## ## ## [[3]] ## ## Welch Two Sample t-test ## ## data: ecls[, v] by ecls[, “catholic”] ## t = -20.25, df = 1825.1, p-value < 0.00000000000000022 ## alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0 ## 95 percent confidence interval: ## -29818.10 -24552.18 ## sample estimates: ## mean in group 0 mean in group 1 ## 54889.16 82074.30 ## ## ## [[4]] ## ## Welch Two Sample t-test ## ## data: ecls[, v] by ecls[, “catholic”] ## t = 4.2458, df = 2233.7, p-value = 0.00002267 ## alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0 ## 95 percent confidence interval: ## 0.02150833 0.05842896 ## sample estimates: ## mean in group 0 mean in group 1 ## 1.132669 1.092701 ## ## ## [[5]] ## ## Welch Two Sample t-test ## ## data: ecls[, v] by ecls[, “catholic”] ## t = 18.855, df = 2107.3, p-value < 0.00000000000000022 ## alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0 ## 95 percent confidence interval: ## 0.2122471 0.2615226 ## sample estimates: ## mean in group 0 mean in group 1 ## 0.4640918 0.2272069 Choose and execute a matching algorithm • To create a balanced sample from the original, unbalanced dataset, we need to choose and execute a matching algorithm in order to created a balanced dataset to estimate ATE. The package MatchIt estimates the propensity score in the background and then matches observations based on the method of your choice. • In this example we use nearest neighbor matching, which matches units based on some measure of distance. The default and most common measure is the propensity score difference, which is the difference between the propensity scores of each treated and control unit. ## MatchIt does not allow missing values, so we need to remove observations with NAs ecls_nomiss % select(c5r2mtsc_std, catholic, all_of(ecls_cov)) %>% na.omit() ## nearest neighbor matching (see documentation for different matching methods) mod_match

$25.00 View

[SOLVED] Soc-ga 2332 intro to stats lab 11

Wenhao Jiang Part 1: Bi-variate Associations (Contingency Tables) For today, we will use a similar dataset about same-sex marriage support. But now we have three support levels (1 = Oppose, 2 = Neutral, 3 = Support) instead of a binary outcome. In R, you can create a contingency table by using the table() function and input the two categorical variables you are interested in. To conduct a chi-square test of independence, simply use the function chisq.test(your_contingency_table). ## create variables for contingency tables support_df % mutate(## convert dummies to categorical variables gender = ifelse(female == 0, “male”, “female”), race = ifelse(black == 1, “black”, “white”)) ## simple contingency table and chi-square test for support levels and race t1

$25.00 View

[SOLVED] Ecse324 – lab 2: stacks, subroutines, and c

ECSE 324 – Computer Organization Introduction In this lab, you will learn how to use subroutines and the stack, program in C, and call code written in assembly from code written in C. 1 Subroutines 1.1 The stack The stack is a data structure which can be helpful for situations when there are not enough registers for a program to use only registers to store data. You will also need to make use of the stack when calling subroutines to save the state of the code outside of the subroutine. Review how the PUSH and POP instructions work. Note that pushing and popping can be implemented without using the PUSH and POP instructions by using other ARM instructions. Rewrite the following PUSH and POP instructions using only other instructions: • PUSH {R0} • POP {R0 – R2} 1.2 The subroutine calling convention The convention which we will use for calling a subroutine in ARM assembly is as follows. The caller must: • Move arguments into R0 through R3. (If more than four arguments are required, the caller should push the arguments onto the stack.) • Call the subroutine using BL The callee must • Move the return value into R0 • Ensure that the state of the processor is restored to what it was before the subroutine call • Use BX LR to return to the calling code (The state can be saved and restored by pushing R4 through LR onto the stack at the beginning of the subroutine and popping R4 through LR off the stack at the end of the subroutine.) Convert your program from Lab 1 for finding the max of an array into a program which uses a subroutine. The subroutine should return the max in R0. 1.3 Fibonacci calculation using recursive subroutine calls A recursive subroutine is a subroutine which calls itself. You can calculate the nth Fibonacci number, Fn (where F0 = 1, F1 = 1, F2 = 2, F3 = 3, F4 = 5, . . . ), using a recursive subroutine as follows: Fib (n ) : if n >= 2: return Fib (n−1) + Fib (n−2) if n < 2: return 1 For example, F4 is computed as follows: Fib(4) = Fib(3)+ Fib(2) = (Fib(2)+ Fib(1))+(Fib(1)+ Fib(0)) = ((Fib(1)+ Fib(0))+1)+(1+1) = (1+1+1)+(1+1) = 5 Write an assembly program which computes the nth Fibonacci number in this way. Your program should have a main section which calls the Fibonacci subroutine recursively for the above pseudocode.Figure 1: C code for computing the max. 2 C Programming Assembly language is useful for writing fast, low-level code, but it can be tedious to work with. Often, high-level languages like C are used instead. 2.1 Pure C We will first go through an example of programming in straight C. • Create a new project, performing the same steps as you performed for an assembly project. However, when the New Project Wizard asks what program type you would like, select “C Program”. Click the box next to “Include a sample program with the project”, and select the “Getting Started” program. • Delete all the code in “getting started.c” and replace it with the incomplete C program shown in Figure 1. • Fill in the code with a for-loop which iterates through the array to find the maximum. • Compile and run the C program the same way you compile and run assembly programs. Notice that the disassembly viewer shows how the compiler has translated C into assembly. 2.2 Calling an assembly subroutine from C It is also possible to mix C and assembly. You will need to do this from Lab 3 onward. Perform the following steps to write a C program which calls an assembly subroutine. • Create a new C project, as you did in Section 2.1. • Add a file to your project called “subroutine.s”. (The filename does not matter, but it should have the .s extension.) • Copy the code from Figure 2 into “subroutine.s”. This code computes the maximum of two numbers and returns the result. Notice that the subroutine does not bother to save and restore the caller state. This is sometimes OK to do, in subroutines which do not change the state. • Next, edit your C program so that it contains the code in Figure 3. This code uses the assembly subroutine to compute the max of two numbers. • Compile and run the program. Find the main section and the MAX 2 section, and put breakpoints to see the processor run those sections. • Finally, rewrite your C program to find the max of a list using the MAX 2 subroutine.Figure 2: Assembly code with MAX 2 subroutine.Figure 3: C code which calls MAX 2 subroutine. 3 Grading • Test program with rewritten PUSH and POP instructions (15%) • Assembly code which computes the max of an array using an assembly subroutine (15%) • Fibonacci program with recursive subroutine (20%) • C code which computes the max of an array using C (15%) • C code which computes the max of an array using an assembly subroutine (15%) Finally, the remaining 20% of the grade for this Lab will go towards a report. Write up a short (3-4) page report that gives a brief description of each part completed, the approach taken, and the challenges faced, if any. Please don’t include the entire code in the body of the report. Save the space for elaborating on possible improvements you made or could have made to the program. Your final submission should be a single compressed folder that contains your report and all the code files (.c and .s).

$25.00 View

[SOLVED] Ecse324 – lab 3: basic i/o, timers and interrupts

ECSE 324 – Computer Organization Introduction This lab introduces the basic I/O capabilities of the DE1-SoC computer – the slider switches, pushbuttons, LEDs and 7-Segment displays. After writing assembly drivers that interface with the I/O components, timers and interrupts are used to demonstrate polling and interrupt based applications written in C. 1 Creating the Project in the Altera Monitor Program IMPORTANT: The project is structured as outlined below to introduce concepts that are used in writing well organized code. Furthermore, drivers for configuring the Generic Interrupt Controller (GIC) will be provided in the latter part of this lab, and the driver code relies on this project structure. The code will not compile if the project is not organized as described! First, create a new folder named GXX Lab3, where GXX is the corresponding group number. Within this folder, create a new folder named drivers. Finally, within the drivers folder, create three folders: asm, src and inc. The final folder structure is shown in Figure 1. GXX Lab3 drivers asm inc src Figure 1: The project folder structure Create a new file main.c and save it in the GXX Lab3 folder. Next, open the Altera Monitor Program and create a new project. Select the created folder GXX Lab3 as the project directory, name the project GXX Lab3, set the architecture to ARM Cortex-A9 and click ‘Next’. When asked to select a system, select De1-SoC Computer from the drop-down menu and click ‘Next’. Set the program type as C Program and click ‘Next’. In the next menu, add main.c to the source files. In the System Parameters menu, ensure that the board is detected in the ‘Host connection’ dialogue box and click ‘Next’. Finally, in the memory settings menu, change the Linker Section Presents from ‘Basic’ to ’Exceptions’ and click ‘Finish’. 2 Basic I/O For this part, it is necessary to refer to sections 2.5.6 – 2.5.10 (pp. 8 – 10) and 3.4 (pp. 20 – 21) in the De1-SoC Computer Manual. Brief overview The hardware setup of the I/O components is fairly simple to understand. The ARM cores have designated addresses in memory that are connected to hardware circuits on the FPGA, and these hardware circuits in turn interface with the physical I/O components. In the case of most of the basic I/O, the FPGA hardware can be as simple as a direct mapping from the I/O terminals to the memory address designated to it. For instance, the state of the slider switches is available to the FPGA on bus of 10 wires which carry either a logical ’0’ or ’1’. This bus can be directly passed as ’write-data’ to the memory address reserved for the slider switches (0xFF200040 in this case). It is useful to have slightly more sophisticated FPGA hardware. For instance, in the case of the push-buttons, in addition to knowing the state of the button it is also helpful to know whether a falling edge is detected, signalling a keypress. This can be achieved by a simple edge detection circuit in the FPGA. The FPGA hardware to interface with the I/O is part of the De1-SoC computer, and is loaded when the .sof file is flashed onto the board. This section will deal with writing assembly code to control the I/O interact by reading from and writing to memory. Getting started: Drivers for slider switches and LEDs • Slider switches: Create a new assembly file called slider switches.s in the GXX Lab3/drivers/asm directory. Create a new subroutine labelled read slider switches ASM, which will read the value at the memory location designated for the slider switches data into the R0 register, and then branch to the link register. Make the subroutine visible to other files in the project by using the .global assembler directive. Remember to use the ARM function calling convention, and save the context if needed!. Next, create a new header file called slider switches.h in the GXX Lab3/drivers/inc directory. The header file will provide the C function declaration for the slider switches assembly driver. Declare the function as extern int read slider switches ASM(), and make use of preprocessor directives to avoid recursive inclusion of the header file. To help get started, code for the slider switches driver has been provided in Figure 2. Use this as a template for writing future driver code. • LEDs: Create a new assembly file called LEDs.s in the GXX Lab3/drivers/asm directory. Create two subroutines – read LEDs ASM and write LEDs ASM. Again, export both subroutines using the .global assembler directive Similar to the slider switches driver, the read LEDs ASM subroutine will load the value at the LEDs memory location into R0 and then branch to LR. The write LEDs ASM subroutine will store the value in R0 at the LEDs memory location, and then branch to LR. Create a new header file called LEDs.h in the GXX Lab3/drivers/inc directory. Provide function declarations for both the subroutines. The function declaration will not be the exact same as in the slider switches; one of these functions will have to accept an argument!(a) Assembly file (b) Header file Figure 2: Code for the slider switches driver • Putting it together: Fill in the main.c file in the GXX Lab3 directory. The main function will include the header files for both the drivers, and will send the switches state to the LEDs in an infinite while loop. The code for this file is shown in Figure 3.Figure 3: Code for the main.c file Next, open the project settings and add all the driver files to the project. Compile and load the project onto the De1-SoC computer, and run the code. The LED lights should now turn on and off when the corresponding slider switch is toggled. Slightly more advanced: Drivers for HEX displays and push-buttons Now that the basic structure of the drivers has been introduced, custom data types in C will be used to write drivers that are more readable and easier to implement. In particular, the following two drivers will focus on using enumerations in C. • HEX displays: As in the previous parts, create two files HEX displays.s and HEX displays.h and place them in the correct folders. The code for the header file is provided in Figure 4. Notice the new datatype HEX t defined in the form of an enumeration, where each display is given a unique value based on a one-hot encoding scheme. This will be useful when writing to multiple displays in the same function call. HEX clear ASM will turn off all the segments of all the HEX displays passed in the argument. Similarly, HEX flood ASM will turn on all the segments. The final function HEX write ASMFigure 4: Code for the HEX displays.h file takes a second argument val, which is a number between 0-15. Based on this number, the subroutine will display the corresponding hexadecimal digit (0,1,2,3,4,5,6,7,8,9,A,B,C,D,E,F) on the display(s). A sample program is shown in Figure 5 to demonstrate how multiple displays can be controlled in the same function call. Since the value for each display is based on a one-hot encoding scheme, the logical OR of all the displays will assert the bits for all the displays.Figure 5: Sample program that uses the HEX displays driver • Pushbuttons: Create two files pushbuttons.s and pushbuttons.h and place them in the correct folders. Write the assembly code to implement the functionality described in the header file, as shown in Figure 6 • Putting it together: Modify the main.c file to create an application that uses all of the drivers created so far. As before, the state of the slider switches will be mapped directly to the LEDs. Additionally, the state of the last four slider switches SW3-SW0 will be used to set the value of a number from 0-15. This number will be displayed on a HEX display when the corresponding pushbutton is pressed. For example. pressing KEY0 will result in the number being displayed on HEX0. Since there are no pushbuttons to correspond to HEX4 and HEX5, switch on all the segments of these two displays. Finally, asserting slider switch SW9 should clear all the HEX displays.Figure 6: Code for the pushbuttons.h file 3 Timers For this part, it is necessary to refer to sections 2.4.2 (p. 4) and 3.2 (p. 20) in the De1-SoC Computer Manual. Brief introduction Timers are simply hardware counters that are used to measure time and/or synchronize events. They run on a known clock frequency that is programmable in some cases (by using a phase-locked loop). Timers are usually (but not always) down counters, and by programming the start value, the time-out event (when the counter reaches zero) occurs at fixed time intervals. HPS timer drivers Create two files HPS TIM.s and HPS TIM.h and place them in the correct folders. The code for the header file is shown in Figure 7.Figure 7: Code for the HPS TIM.h file This driver uses a new concept in C – structures. A structure is a composite datatype that allows the grouping of several variables to be accessed by a single pointer. They are similar to arrays, except that the individual elements of a structure can be of different datatypes! Writing this driver will help demonstrate how structures can be useful by modifying multiple parameters easily. Notice how the first subroutine HPS TIM config ASM takes a struct pointer as an argument. The reason for this is that if a struct is passed directly to a function, the compiler unpacks the struct elements at compile time and passes them as individual arguments to the function. Since in most cases the number of arguments will be greater than the number of argument registers, the compiler will place the extra arguments on the stack. This is perfectly fine if all the code is handled by the compiler, but since this lab requires handwritten assembly drivers, it causes the programmer a lot of extra overhead when retrieving the arguments in the assembly subroutine. By passing a struct pointer, the individual elements can be easily accessed at the corresponding offset from the base address passed in the pointer. For instance, the timeout element can be accessed in the assembly subroutine via a load instruction from the address in R0 offset by 0x4. Implement assembly subroutines for the three functions shown in the header file. The second subroutine HPR TIM read INT ASM need not support multiple timer instances passed in the argument, but if it does then the return value should be set appropriately to reflect the S-bit value of all the timers present in the argument. The other two subroutines should be able to handle multiple timers. Notice how the timeout struct element is given in microseconds. This hides the hardware specific details of the timer from the C programmer. Since all of the HPS timers do not run on the same clock frequency, the subroutine must calculate the correct load value for the corresponding timer in order to achieve the desired timeout value. IMPORTANT: In the HPS TIM config ASM subroutine, each timer should first be disabled before writing any of the other configuration parameters. The value in the enable parameter can then be written last. IMPORTANT: The De1-SoC computer manual has omitted to mention that the S-bit in the interrupt status register will be asserted only if the I-bit in the control register is set to 0 (in order to unmask the timeout event)! A sample program that uses the HPS timer driver is shown in Figure 8. Notice how all four HPS timers are configured to have a 1 second timeout in the same function call. If the assembly driver functions correctly, the program will count from 0-15 on all four HEX displays at the same rate of 1 second. It is important to remember that the configuration values in the struct are implemented at a level of abstraction above the hardware, with the aim of providing a better hardware interface to the C programmer. How these values are then used in the assembly driver should be governed by the hardware documentation (De1-SoC computer manual). Creating an application: Stopwatch! Create a simple stopwatch using the HPS timers, pushbuttons, and HEX displays. The stopwatch should be able to count in increments of 10 milliseconds. Use a single HPS timer to count time. Display milliseconds on HEX1-0, seconds on HEX3-2, and minutes on HEX5-4. PB0, PB1, and PB2 will be used to start, stop and reset the stopwatch respectively. Use another HPS timer set at a faster timeout value (5 milliseconds or less) to poll the pushbutton edgecapture register.Figure 8: Sample program that uses the HPS timer driver 4 Interrupts For this part, it is necessary to refer to section 3 (pp. 19-32) in the De1-SoC Computer Manual. Additional information about the interrupt drivers that are provided can be found in ’Using the ARM Generic Interrupt Controller’ which is available on myCourses. Brief introduction Interrupts are hardware or software signals that are sent to the processor to indicate that an event has occurred that needs immediate attention. When the processor receives an interrupt, it pauses the current code execution, handles the interrupt by executing code defined in an Interrupt Service Routine (ISR), and then resumes normal execution. Using the interrupt drivers Download the following files from myCourses: • intsetup.c • int setup.h • ISRs.s • ISRs.h • address map arm.h Within the GXX Lab3/drivers/ directory, place C files in the src, header files in the inc, and assembly files in the asm directories. Only the ISRs.s and ISRs.h files will need to be modified in applications. Do not modify the other files Before attempting this section, get familiarized with the relevant documentation sections provided in the introduction. To demonstrate how to use the drivers, a simple interrupt based application using HPS TIM0 is shown. Note: Ensure that in the memory settings menu in the project settings, the Linker Section Presets has been changed from ‘Basic’ to ’Exceptions’! To begin, the code for the main.c file is shown in Figure 9. The int setup() function is the only thing needed to configure the interrupt controller and enable the desired interrupt IDs. It takes two arguments: an integer whose value denotes the number of interrupt IDs to enable, and an integer array containing these IDs. In this example, the only interrupt ID enabled is 199, corresponding to HPS TIM0. After enabling interrupts for the desired IDs, the hardware devices themselves have to be programmed to generate interrupts. This is done in the code above via the HPS timer driver. Instructions for enabling interrupts from the different hardware devices can be found in the documentation.Figure 9: Interrupts example: The main.c file Now that HPS TIM0 is able to send interrupts, ISR code is needed to handle the interrupt events. Notice how in the while loop of the main program, the value of hps tim0 int flag is checked to see if an interrupt has occurred. The ISR code is responsible for writing to this flag, and also for clearing the interrupt status in HPS TIM0. When interrupts from a device are enabled and an interrupt is received, the processor halts code execution and branches to the appropriate subroutine in the ISRs.s file. This is where the ISR code should be written. Figure 10 shows the ISR code for HPS TIM0. In the ISR, the interrupt status of the timer is cleared, and the interrupt flag is asserted. Finally, in order for the main program to use the interrupt flag, it is declared in the ISRs.h file as shown in Figure 11. IMPORTANT: When ISR code is being executed, the processor has halted normal execution. Lengthy ISR code will cause the application to freeze. ISR code should be as lightweight as possible! Interrupt based stopwatch! Modify the stopwatch application from the previous section to use interrupts. In particular, enable interrupts for the HPR timer used to count time for the stopwatch. Also enable interrupts for the pushbuttons, and determine which key was pressed when a pushbutton interrupt is received. There is no need for the second HPS timer that was used to poll the pushbuttons in the previous section.Figure 10: Interrupts example: The ISR assembly codeFigure 11: Interrupts example: Flag declaration in ISRs.h 5 Grading • Slider switches and LEDs program (10%) • Entire basic I/O program (15%) • Polling based stopwatch (30%) • Interrupt based stopwatch (25%) Finally, the remaining 20% of the grade for this Lab will go towards a report. Write up a short (3-4) page report that gives a brief description of each part completed, the approach taken, and the challenges faced, if any. Please don’t include the entire code in the body of the report. Save the space for elaborating on possible improvements you made or could have made to the program. Your final submission should be a single compressed folder that contains your report and all the code files, correctly organized (.c, .h and .s).

$25.00 View

[SOLVED] Csed342 – assignment 8. from language to logic

CSED342 – Artificial Intelligence General Instructions This assignment has a written part and a programming part. Ò This icon means a written answer is expected in writeup.pdf. Refer to writeup.tex for pdf file generation. ¥ This icon means you should write code in submission.py. You should modify the code in submission.py between # BEGIN_YOUR_CODE and # END_YOUR_CODE but you can add other helper functions outside this block if you want. Do not make changes to files other than submission.py. Your code will be evaluated based solely on hidden test cases, which you can see in grader.py. This is because if you have followed the instructions for each problem correctly, the answer will be deterministic. To run all the tests, type python grader.py This will tell you only whether you passed the hidden tests. On the hidden tests, the script will alert you if your code takes too long or crashes, but does not say whether you got the correct output. You can also run a single test (e.g., 1a-1-hidden) by typing python grader.py 1a-1-hidden We strongly encourage you to read and understand the test cases, create your own test cases, and not just blindly run grader.py.In this assignment, you will get some hands-on experience with logic. You’ll see how logic can be used to represent the meaning of natural language sentences, and how it can be used to solve puzzles and prove theorems. Most of this assignment will be translating English into logical formulas, but in Problem 2, we will delve into the mechanics of logical inference. To get started, launch a Python shell and try typing the following commands to add logical expressions into the knowledge base. from logic import * Rain = Atom(“Rain”) # Shortcut Wet = Atom(“Wet”) # Shortcut kb = createResolutionKB() # Create the knowledge base kb.ask(Wet) # Prints “I don’t know.” kb.ask(Not(Wet)) # Prints “I don’t know.” kb.tell(Implies(Rain, Wet)) # Prints “I learned something.” kb.ask(Wet) # Prints “I don’t know.” kb.tell(Rain) # Prints “I learned something.” kb.tell(Wet) # Prints “I already knew that.” kb.ask(Wet) # Prints “Yes.” kb.ask(Not(Wet)) # Prints “No.” kb.tell(Not(Wet)) # Prints “I don’t buy that.” To print out the contents of the knowledge base, you can call kb.dump(). For the example above, you get: ==== Knowledge base [3 derivations] === * Or(Not(Rain),Wet) * Rain – Wet In the output, ‘*’ means the fact was explicitly added by the user, and ‘-’ means that it was inferred. Here is a table that describes how logical formulas are represented in code. Use it as a reference guide: Name Mathematical notation Code Atomic formula (atom) Rain LocatedIn(postech,x) Atom(“Rain”) (predicate must be uppercase) Atom(“LocatedIn”, “postech”, “$x”) (arguments are symbols) Negation ¬Rule Not(Atom(“Rain”)) Conjunction Rain ∧ Snow And(Atom(“Rain”), Atom(“Snow”)) Disjunction Rain ∨ Snow Or(Atom(“Rain”), Atom(“Snow”)) Implication Rain → Wet Implies(Atom(“Rain”), Atom(“Wet”)) Equivalence Rain ↔ Wet (syntactic sugar for Rain → Wet ∧ Wet → Rain) Equiv(Atom(“Rain”), Atom(“Wet”)) The operations And and Or only take two arguments. If we want to take a conjunction or disjunction of more than two, use AndList and OrList. For example: AndList([Atom(“A”), Atom(“B”), Atom(“C”)]) is equivalent to And(And(Atom(“A”), Atom(“B”)), Atom(“C”)). Problem 1. Propositional logic Write a propositional logic formula for each of the following English sentences in the given function in submission.py. For example, if the sentence is “If it is raining, it is wet,” then you would write Implies(Atom(“Rain”), Atom(“Wet”)), which would be Rain → Wet in symbols (see examples.py). Note: Don’t forget to return the constructed formula! Problem 1a [2 points] ¥ “If it’s summer and we’re in California, then it doesn’t rain.” Problem 1b [2 points] ¥ “It’s wet if and only if it is raining or the sprinklers are on.” Problem 1c [2 points] ¥ “Either it’s day or night (but not both).” Problem 1d [2 points] ¥ “One can access campus server only if she is a computer science major or not a freshman.” Problem 1e [3 points] ¥ “There are 10 students (i.e. student1, …, student10) and they all pass AI course.” Problem 2. Logical Inference Having obtained some intuition on how to construct formulas, we will now perform logical inference to derive new formulas from old ones. Also, we will use conjunctive normal form (CNF). A CNF formula is a conjunction of clauses. Example: (A ∨ B ∨¬C) ∧ (¬B ∨ D)). Every formula f in propositional logic can be converted into an equivalent CNF formula f′ : M(f) = M(f′). Some conversion rules are below: • Modus Ponens: • Resolution: • Eliminate • Eliminate • Eliminate • Move ¬ inwards: • Move ¬ inwards: • Eliminate double negation: • Distribute ∨ over Problem 2a [5 points] Ò Some inferences that might look like they’re outside the scope of Modus ponens are actually within reach. Suppose the knowledge base contains the following two formulas: KB = {(A ∨ B) →¬C,¬(¬A ∨ C) → D,A}. Your task: First, convert the knowledge base into conjunctive normal form (CNF). Then apply Modus ponens to derive D. Please show how your knowledge base changes as you apply derivation rules. Remember, this isn’t about you as a human being able to arrive at the conclusion, but rather about the rote application of a small set of transformations (which a computer could execute). Problem 2b [5 points] Ò Recall that Modus ponens is not complete, meaning that we can’t use it to derive everything that’s true. Suppose the knowledge base contains the following formulas: KB = {A ∨ B,B → C,(A ∨ C) → D}. In this example, Modus ponens cannot be used to derive D, even though D is entailed by the knowledge base. However, recall that the resolution rule is complete. Your task: Convert the knowledge base into CNF and apply the resolution rule repeatedly to derive D.

$25.00 View