Assignment Chef: Assignment Catalog

[SOLVED] ECE486 Lab 3: Digital Simulation of a Closed Loop System

Scope and Objective

1. Study a closed-loop system for speed control of a belt-driven system through digital simulation.
   - Model the system of a magnetic tape drive, such as a cassette player, as an example.
2. Implement different controllers and evaluate their effectiveness for different design objectives.
   - Achieve disturbance rejection and a smooth step response simultaneously.

Introduction

Our previous lab activities have equipped us with the skills for modeling and analyzing system dynamics. We now move toward designing solutions that achieve specific desired outcomes in the system behavior of interest. A closed-loop system with feedback control, as you have seen in lectures, can facilitate such design goals. In this lab we will look at controlling the speed of a belt-driven system, using the cassette player as an example.

Reading: G. F. Franklin, et al., Feedback Control of Dynamic Systems, 3rd or 4th Ed., Sec. 4.1, 4.1.1. These pages can be found on the course Blackboard. In the 6th or 7th Ed., see Sec. 4.2.2 for disturbance rejection.

The following contents are from the ECE486 Control Systems Lab Manual, Lab Session 3: Digital Simulation of a Closed Loop System.

In this experiment Simulink is used to study a closed-loop system for speed control of a magnetic tape drive such as a cassette player. You will compare different controllers for their effectiveness in simultaneously accomplishing disturbance rejection and a smooth step response.

Preparation

Readings: G. F. Franklin, et al., Feedback Control of Dynamic Systems, 3rd or 4th Ed., Sec. 4.1, 4.1.1. These pages are copied for you and attached at the end of this manual. In the 6th or 7th Ed., see Sec. 4.2.2 for disturbance rejection.

Prelab

Modeling: A diagram of the system is shown below (Figure 3.1: Closed-Loop System with Unity Feedback). The tape dynamics are modeled as a rotational mass-damper system:

    J ω̇(t) + B ω(t) = τ(t) + τ_d(t),

where J = 0.25 kg·m² and B = 3 N·m·s.
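As a quick sanity check on the model (outside the Simulink deliverable), the open-loop plant can be integrated numerically. This Python sketch uses the prelab values J = 0.25 and B = 3 together with the measured motor torque from the one-volt step test, and converges to the open-loop steady state ω = 5/B:

```python
import math

# Plant: J*dw/dt + B*w = tau(t), with the motor torque from the
# one-volt open-loop step test: tau(t) = 5*(1 - exp(-3t)).
J, B = 0.25, 3.0          # kg*m^2 and N*m*s, from the prelab

def simulate(t_end=5.0, dt=1e-4):
    w = 0.0               # zero initial conditions
    t = 0.0
    while t < t_end:
        tau = 5.0 * (1.0 - math.exp(-3.0 * t))
        w += dt * (tau - B * w) / J   # forward-Euler step of J*dw/dt = tau - B*w
        t += dt
    return w

print(round(simulate(), 3))   # approaches the steady state 5/B = 5/3
```

Both poles of the cascade (motor at -3, plant at -B/J = -12) have long since decayed by t = 5 s, so the simulated value matches the analytic steady state closely.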
The torque motor transfer function is determined by an open-loop test with the motor detached from the system. A one-volt step is applied (zero initial conditions), and the resulting motor torque is τ(t) = 5(1 − e^(−3t)), t > 0. The amplifier is a non-inverting gain, K. It is assumed that the tape speed sensor (tachometer) dynamics can be neglected.

Design: We want a closed-loop system whose steady-state response to a disturbance is small, and whose step response to a reference input has low overshoot.

(a) Obtain an all-integrator block diagram for the system in Figure 3.1. Show the intermediate variables (τ, v, etc.). Note that the motor torque is the convolution of a step input with the motor's transfer function.

(b) For the closed-loop system, find the transfer functions Ω(s)/Ω_r(s) (with τ_d = 0) and Ω(s)/T_d(s) (with ω_r = 0). Use numerical values where given. (Hint: remember to make your lead coefficient 1.)

(c) For what values of K is the closed-loop system stable?

(e) Express the desired value of K_r, in terms of K, such that the steady-state output speed for a unit step in ω_r is 1 (τ_d = 0).

(f) Using the minimum value of K that you found in (d), what are the values of the closed-loop ζ and ω_n? Consider the anticipated response ω(t) to a step at ω_r (τ_d = 0). What will be the expected t_s, t_r, and M_p? Use the equations on page 13.

(g) Controller 2: Suppose, alternatively, that K is selected to produce a low overshoot (ζ > 0.75). Find the range of K that meets this specification. If K is selected such that ζ = 0.75, what will be the steady-state disturbance response to a unit step? What M_p do we achieve? Can we achieve both our desired disturbance rejection and our desired overshoot?

(h) Controller 3: To improve the response, suppose we add derivative feedback (measured by an accelerometer in this case) as shown in Figure 3.2: Closed-Loop System with Derivative Feedback.
Compute again the new closed-loop transfer functions Ω(s)/Ω_r(s) (with τ_d = 0) and Ω(s)/T_d(s) (with ω_r = 0). Give the ranges of K and K·K_d for which the closed-loop system is stable.

Laboratory Exercise

1. Develop an all-integrator Simulink block diagram for the three closed-loop control systems. It is good simulation practice to retain explicitly in the simulation the various subsystems (actuator, plant, amplifiers, etc.). Your diagram should do so, and you should label the inputs and outputs (ω, τ, v, e) of the subsystems. It is possible to make only one diagram and simply use Matlab variables in the Simulink blocks for the different cases. Do not use differentiators in your diagram; instead, route the required states around the integrators. See Appendix A, Section IV-c for an example of this.

2. Obtain and print the following time responses for each of the three controllers:
• ω(t) for the disturbance response to a unit step in τ_d (ω_r = 0). For comparison, overlay and label the plots for the different controllers.
• ω(t) for the reference response to a unit step in ω_r (τ_d = 0). Remember to use the appropriate value of K_r for each controller. Overlay again for comparison.
You should end up with six graphs on two plots.

Report

1. Include your prelab calculated values and the experimental data (time responses) obtained. Compare the values of M_p, t_r, and t_s as calculated in the prelab with the values obtained from Matlab. Which controllers met the specifications (steady-state disturbance response ≤ 0.01 rad/s and ζ > 0.75)?

2. For the system in Figure 3.1, derive the relationship between the steady-state error (e_ss = ω_r − ω) and the natural frequency, ω_n. Consider the error as a function of both ω_r and τ_d, when both are step inputs. Since the system is linear, superposition allows the two components of e_ss to be calculated separately and then summed.

3. For Controller 3, solve for ζ and ω_n in terms of the gains K and K_d.
From these equations, sketch a plot of how the pole locations change as K_d > 0 increases in value.
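For reference, one consistent reading of the prelab model is a motor transfer function of 15/(s+3) (the transfer function whose unit-step torque response is 5(1 − e^(−3t))) and a plant 1/(Js + B). Under those assumptions the unity-feedback characteristic polynomial (s+3)(Js+B) + 15K, normalized to a lead coefficient of 1, becomes s² + 15s + (36 + 60K), from which ζ and ω_n follow directly. This Python sketch (not part of the required MATLAB work, and resting on the assumptions just stated) computes them:

```python
import math

# Assumed unity-feedback loop: motor 15/(s+3), plant 1/(0.25 s + 3).
# Characteristic polynomial (s+3)(0.25 s + 3) + 15 K = 0 normalizes to
#   s^2 + 15 s + (36 + 60 K) = 0,
# so wn^2 = 36 + 60 K and 2*zeta*wn = 15.  Stable iff 36 + 60 K > 0.
def closed_loop_params(K):
    wn = math.sqrt(36.0 + 60.0 * K)   # natural frequency [rad/s]
    zeta = 15.0 / (2.0 * wn)          # damping ratio
    return zeta, wn

zeta, wn = closed_loop_params(1.0)
print(round(zeta, 3), round(wn, 3))
```

Under these assumptions ζ = 0.75 corresponds to ω_n = 10 rad/s, i.e. K = 16/15, which is a useful cross-check against your prelab algebra.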

$25.00

[SOLVED] ECE 486 Homework 02: Root Locus

Problem 1

Consider the following transfer functions: 1) 2)

For each one of these, do the following:

a) Mark the zeros and poles on the s-plane and use Rule 2 from class to plot the real-axis part of the root locus.
b) Use the phase condition from class to test whether or not the point s = j is on the root locus. If you run into "non-obvious" angles, estimate rather than calculate them; this should be enough.
c) Apply Rules 3 and 4 to determine asymptotes and departure and arrival angles. Plot the root locus branches based on this information.
d) Apply Rule 5 to determine imaginary-axis crossings (if any), and complete the (positive) root locus by using Rule 6 to check for multiple roots.
e) Plot the (positive) root locus using the MATLAB rlocus command or the equivalent in Python or another language.

Turn in your MATLAB (or equivalent) plots as well as hand sketches of the root loci, along with all accompanying calculations and explanations.

Problem 2

Consider the transfer function

a) Plot by hand the negative (K < 0) root locus for L(s), using Rules 1–6 for negative root loci. Make your root locus as explicit as possible by specifying (when applicable) the real-axis part, asymptotes, arrival and departure angles, imaginary-axis crossings, and points of multiple roots. Turn in the hand plot and accompanying calculations and explanations.
b) Plot the same root locus in MATLAB (or equivalent); turn in the plots.
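The homework's transfer functions are not reproduced in this listing, but the phase-condition test in part (b) is easy to automate. The sketch below uses a hypothetical L(s) = 1/(s(s+2)), deliberately not one of the assigned systems:

```python
import cmath
import math

# Hypothetical example (NOT one of the homework's transfer functions):
# L(s) = 1 / (s (s + 2)), poles at 0 and -2, no zeros.
poles = [0.0, -2.0]
zeros = []

def phase_deg(s):
    """Angle of L(s) in degrees: sum of zero angles minus sum of pole angles."""
    ang = sum(cmath.phase(s - z) for z in zeros) - sum(cmath.phase(s - p) for p in poles)
    return math.degrees(ang)

# A point is on the positive root locus iff the phase is an odd multiple of 180 deg.
print(round(phase_deg(complex(-1.0, 1.0)), 1))  # -180.0: on the locus
print(round(phase_deg(complex(0.0, 1.0)), 1))   # not an odd multiple of 180: off the locus
```

For this L(s), the angles from the poles to s = -1 + j are 135° and 45°, summing to 180°, which agrees with the hand construction (the complex branch is the vertical line Re(s) = -1).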

$25.00

[SOLVED] ECE486 –

Reading: FPE (Franklin, Powell, Emami-Naeini, 6th or 7th edition), Sections 3.1–3.2 and 3.3–3.6.

Problems:

1. Using the techniques for block diagram reduction discussed in class, find the transfer functions of the systems shown below (p. 156 of the textbook, 3rd edition): (a) (b)

2. Consider the following state-space model (so-called "observer canonical form"): . Build an all-integrator diagram for this system.

3. Consider the plant with transfer function where K is a positive parameter you can tune.
a) Consider the settling time spec t_s ≤ 4. Give some value (or range of values) of K for which the system meets this spec. Justify your choice.
b) Consider the rise time spec t_r ≤ 1. Give some value (or range of values) of K for which the system meets this spec.
c) Consider the overshoot spec M_p ≤ 0.1. Give some value (or range of values) of K for which the system meets this spec. Justify your choice.
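For problem 2, the observer canonical form maps directly onto an all-integrator diagram: each state is the output of one integrator. Since the homework's model matrices are not reproduced in this listing, the sketch below simulates a hypothetical second-order example and checks its DC gain:

```python
# Hypothetical 2nd-order plant G(s) = (b1*s + b0)/(s^2 + a1*s + a0)
# in observer canonical form -- each state is the output of one
# integrator, which is exactly how the all-integrator diagram is wired:
#   x1' = -a1*x1 + x2 + b1*u,   x2' = -a0*x1 + b0*u,   y = x1
a1, a0 = 3.0, 2.0   # example denominator coefficients (assumed, not from the HW)
b1, b0 = 1.0, 4.0   # example numerator coefficients

def step_response(t_end=10.0, dt=1e-4):
    x1 = x2 = 0.0
    for _ in range(int(t_end / dt)):
        u = 1.0                          # unit step input
        dx1 = -a1 * x1 + x2 + b1 * u
        dx2 = -a0 * x1 + b0 * u
        x1 += dt * dx1                   # forward-Euler integration
        x2 += dt * dx2
    return x1                            # output y = x1

print(round(step_response(), 3))         # settles to the DC gain b0/a0 = 2.0
```

Substituting the Laplace transforms of the two state equations back into each other recovers exactly G(s) = (b1 s + b0)/(s² + a1 s + a0), which is why this wiring is a valid all-integrator realization.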

$25.00

[SOLVED] ECE438 – Machine Problem 3

Abstract

This machine problem tests your understanding of the distance-vector and link-state routing algorithms.

1 Introduction

In this MP, you will implement the link-state and distance-vector routing protocols. You will write two separate programs: one that implements the link-state protocol, and one that implements the distance-vector protocol. Both programs will read the same file formats to get the network's topology and what messages to send.

2 Router Programs

Your program should contain a collection of imaginary routing nodes that carry out their routing protocol (link state or distance vector, for the corresponding program). These imaginary nodes are just data structures in your program; there is no socket programming involved. Once their tables have converged, for each node (in ascending order of node ID), write out the node's forwarding table (see the "Output Format" section for details). Then, have some of your nodes send some data to some other nodes, with the data forwarded according to the nodes' forwarding tables. The sources, destinations, and message contents are specified in the message file; see below for the format. Then, one at a time, apply each line in the topology changes file (see below) in order, and repeat the previous instructions after each change. The nodes in your distance-vector (link-state) program should use the DV (LS) algorithm to arrive at a correct forwarding table for the network they're in.

3 Tie breaking

We would like everyone to have consistent output even on complex topologies, so we ask you to follow specific tie-breaking rules.

1. Distance Vector Routing: when two equally good paths are available, your node should choose the one whose next-hop node ID is lower.
2. Link State: when choosing which node to move to the finished set next, if there is a tie, choose the lowest node ID.
3. If a current-best-known path and a newly found path are equal in cost, choose the path whose last node before the destination has the smaller ID.
Example: the source is 1, and the current-best-known path to 9 is 1 → 4 → 12 → 9. We are currently adding node 10 to the finished set. The path 1 → 4 → 5 → 10 → 9 costs the same as the path 1 → 4 → 12 → 9. We will switch to the new path, since 10 < 12. (In the message file, the message "this one gets sent from 3 to 5!" would be sent from 3 to 5.) Note that node IDs can go to double digits.

Example changes file:

2 4 1
2 4 -999

This would add a cost-1 link between 2 and 4, and then remove it afterwards.

5 Output Format

Write all output described in this section to a file called "output.txt". The forwarding table format should be:

destination nexthop pathcost

where nexthop is the neighbor we hand destination's packets to, and pathcost is the total cost of this path to destination. The table should be sorted by destination. Example for node 2 from the example topology:

1 5 6
2 2 0
3 3 3
4 5 5
5 5 4

As you can see, the node's entry for itself should list the nexthop as itself, and the cost as 0. If a destination is not reachable, do not print its entry. That's one single space between each number, with each row on its own line. Remember, you're printing all nodes' tables at once; so, the example for node 2 would have been preceded by a similarly formatted table for node 1, and followed by the tables of 3, 4, and 5.

When a message is to be sent, print the source, destination, path cost, path taken (including the source, but NOT the destination node), and message contents in the following format:

from <source> to <dest> cost <pathcost> hops <path> message <message>

e.g.: "from 2 to 1 cost 6 hops 2 5 4 message here is a message from 2 to 1"

Print messages in the order they were specified in the messages file. If the destination is not reachable, please say:

from <source> to <dest> cost infinite hops unreachable message <message>

Please do not print anything else; any diagnostic messages or the like should be commented out before submission. However, if you want to organize the output a little, it's okay to print as many blank lines as you want between lines of output.
Both messagefile and changesfile can be empty. In this case, the program should just print the forwarding tables. The output file will have the general layout as follows:

--------  (at this point, the 1st change is applied)
--------  (at this point, the 2nd change is applied)
--------  (and so on)

6 Notes

All the notes for the previous MPs still apply; we are not repeating them here for brevity. New information:

1. Your project must include a Makefile whose default target makes executables called distvec and linkstate.
2. Command line format: "./distvec topofile messagefile changesfile" and "./linkstate topofile messagefile changesfile".

Do refer to the MP0, MP1, and MP2 instructions for other notes.
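The link-state computation with the tie-breaking rules above can be sketched as follows. This is a simplified illustration in Python rather than a complete MP solution (which is in C and also needs forwarding-table output, the message step, and the distance-vector variant):

```python
import heapq

# Dijkstra with the MP's tie-breaking: on equal cost, finish the lower
# node ID first, and keep the path whose last node before the
# destination (the predecessor) has the smaller ID.
def dijkstra(adj, src):
    """adj: {node: {neighbor: cost}}.  Returns (dist, prev) maps from src."""
    dist = {src: 0}
    prev = {src: src}
    done = set()
    heap = [(0, src)]                  # (cost, node): ties pop the lower ID first
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue
        done.add(u)
        for v, w in adj[u].items():
            nd = d + w
            # Relax, or on an exact cost tie prefer the smaller predecessor ID.
            if v not in dist or nd < dist[v] or (nd == dist[v] and u < prev[v]):
                dist[v], prev[v] = nd, u
                heapq.heappush(heap, (nd, v))
    return dist, prev

# Hypothetical topology: two equal-cost paths from 1 to 4 (via 2 or via 3);
# the tie-break keeps predecessor 2, since 2 < 3.
adj = {1: {2: 1, 3: 1}, 2: {1: 1, 4: 1}, 3: {1: 1, 4: 1}, 4: {2: 1, 3: 1}}
dist, prev = dijkstra(adj, 1)
print(dist[4], prev[4])   # 2 2
```

Walking the prev map back from each destination to the source yields the next hop for the forwarding-table rows.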

$25.00

[SOLVED] ECE438 – Machine Problem 2

Abstract

This machine problem tests your understanding of reliable packet transfer. You will use UDP to implement your own version of TCP. Your implementation must be able to tolerate packet drops, allow other concurrent connections a fair chance, and must not be overly nice to other connections (it should not give up the entire bandwidth to other connections).

1 Introduction

In this MP, you will implement a transport protocol with properties equivalent to TCP. You have been provided with a file called sender_main.c, which declares the function void reliablyTransfer(char* hostname, unsigned short int hostUDPport, char* filename, unsigned long long int bytesToTransfer). This function should transfer the first bytesToTransfer bytes of filename to the receiver at hostname:hostUDPport correctly and efficiently, even if the network drops or reorders some of your packets. You also have receiver_main.c, which declares void reliablyReceive(unsigned short int myUDPport, char* destinationFile). This function is reliablyTransfer's counterpart, and should write what it receives to a file called destinationFile.

2 What is expected in this MP?

Your job is to implement the reliablyTransfer() and reliablyReceive() functions, with the following requirements:
• The data written to disk by the receiver must be exactly what the sender was given.
• Two instances of your protocol competing with each other must converge to roughly fairly sharing the link (same throughputs ±10%) within 100 RTTs. The two instances might not be started at the exact same time.
• Your protocol must be somewhat TCP friendly: an instance of TCP competing with you must get on average at least half as much throughput as your flow.
• An instance of your protocol competing with TCP must get on average at least half as much throughput as the TCP flow. (Your protocol must not be overly nice.)
• All of the above should hold in the presence of any amount of dropped packets.
All flows, including the TCP flows, will see the same rate of drops. The network will not introduce bit errors.
• Your protocol must, in steady state (averaged over 10 seconds), utilize at least 70% of the bandwidth when there is no competing traffic and packets are not artificially dropped or reordered.
• You cannot use TCP in any way. Use SOCK_DGRAM (UDP), not SOCK_STREAM.

The test environment has a 20 Mbps connection and a 20 ms RTT.

3 VM Setup

You'll need 2 VMs to test your client and server together. Unfortunately, VirtualBox's default setup does not allow its VMs to talk to the host or each other. There is a simple fix, but then that prevents them from talking to the internet. So, be sure you have done all of your apt-get installs before doing the following! (To be sure, just run: sudo apt-get install gcc make gdb valgrind iperf tcpdump)

Make sure the VMs are fully shut down. Go to each of their Settings menus, and go to the Network section. Switch the Adapter Type from NAT to "host-only", and click OK. When you start them, you should be able to ssh to them from the host, and each VM should be able to ping the other. You can use ifconfig to find out the VMs' IP addresses. If they both get the same address, sudo ifconfig eth0 newipaddr will change it. (If you make the 2nd VM by cloning the first and choosing to reinitialize the MAC address, that should give different addresses.)

New in MP2: You can use the same basic test environment described above. However, the network performance will be ridiculously good (the same goes for testing on localhost), so you'll need to limit it. The autograder uses tc. If your network interface inside the VM is eth0, then run (from inside the VM) the following command to delete existing tc rules:

sudo tc qdisc del dev eth0 root 2>/dev/null
Then run:

sudo tc qdisc add dev eth0 root handle 1:0 netem delay 20ms loss 5%

followed by:

sudo tc qdisc add dev eth0 parent 1:1 handle 10: tbf rate 20Mbit burst 10mb latency 1ms

This will give you a 20 Mbit, 20 ms RTT link where every packet sent has a 5% chance of being dropped. Simply omit the loss n% part to get a channel without artificial drops. (You can run these commands just on the sender; running them on the receiver as well won't make much of a difference, although you'll get a 40 ms RTT if you don't adjust the delay to account for the fact that it gets applied twice.)

4 Autograder and submission

We use the autograder to grade your MPs; the submission process is simple. First, open the autograder web page: http://10.105.100.204. This is a ZJUI-private IP; if your device is not accessing it through the campus network, please use a VPN to get a private IP.

You will see two sections. MP Submission allows you to submit your assignment: enter your Student ID (starting with 320), select which MP you are submitting, select the file extension of your files, and upload your MP files. Note: only C/C++ files are accepted. When uploading files, add your files one by one; do not choose multiple files to add at one time. The Submission History section allows you to check your submission history and grade by entering your student ID.

Caution: the queue can only handle 200 submissions at one time, so remember to check your submission status in Submission History after you submit your assignment. During the hours leading up to the deadline, the queue could be long, so it is advisable to get your work done early.
5 Grade Breakdown

10%: You submitted your assignment correctly and it compiles correctly on the autograder
20%: Your receiver and sender can transfer small files correctly
20%: Your sender and receiver can transfer big files correctly
10%: Your sender and receiver can utilize an empty link
20%: Your sender and receiver can transfer files correctly with packet dropping
10%: Your sender and receiver are TCP friendly with no loss
10%: Your sender and receiver are TCP friendly with 1% loss

(We will use diff to compare the output file with the downloaded copy, and you should do the same. If diff produces any output, you aren't transferring the file correctly.)

6 Test Details

Your MP2 code will run entirely within Docker containers and will be tested via Docker's private network. The testing for MP2 is divided into 7 stages:

1. Compiling: We will compile the code for your sender and receiver. If the compilation is successful, you will receive the basic score for successful compilation; if it fails, you will receive a score of 0, and subsequent tests will not be conducted.

2. Small file transfer test in a no-loss environment: We will conduct a small file transfer test of your sender and receiver in a Docker network without any artificially induced packet loss. First, we'll launch your receiver code in one container and then your sender code in another container. The time limit for this task is 5 seconds. If your code can complete the transfer of the small file within 5 seconds and the output file matches the source file when compared using diff, you will receive the score for this phase. If this phase fails, no further tests will be conducted.

3. Large file transfer test in a no-loss environment: We will test your code using a large file of 18.4 MB. The testing method is the same as for the small file, but you have a time limit of 10 seconds. If the transfer is completed within 10 seconds and passes the diff comparison, you earn this phase's score.
If this phase fails, subsequent tests will proceed unaffected.

4. Channel bandwidth utilization test: We will transfer a file in a no-loss environment and place a stricter time constraint on its transfer than the previous tests, to ensure your code utilizes the channel bandwidth to its maximum potential. If the file transfer completes within the stipulated time and passes the diff comparison, you earn this phase's score. If this phase fails, subsequent tests will proceed unaffected.

5. File transfer test with 5% packet loss and 20 ms delay: We will set the Docker network to have a 5% packet loss and a 20 ms delay and test whether your code can accurately receive the file. We'll use a file of several KBs for the test. If the file is transferred correctly within 10 seconds and passes the diff comparison, you earn this phase's score. If this phase fails, subsequent tests will proceed unaffected.

6. TCP-friendly test: This test will be conducted within a 300 Mbps channel. Using iperf3, we will create a TCP stream and measure its baseline rate when it's the only stream. Then, we'll run your code and iperf3 concurrently and test the rate the TCP stream can achieve when sharing the channel. If the TCP stream manages to get more than 40% of its baseline rate, the test is passed. If this phase fails, no further tests will be conducted.

7. TCP-friendly test with 1% loss: We will test within a 300 Mbps channel with a 1% packet loss. Similar to the previous test, we will first record the baseline rate of only the TCP stream. We'll then test the rate the TCP stream can achieve when your code and the TCP stream share the channel. If the TCP stream gets more than 40% of its baseline rate, the test is passed.
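The fairness and TCP-friendliness requirements are typically approached with an AIMD (additive-increase, multiplicative-decrease) congestion window, which is what makes two competing flows converge to a fair share. The sketch below only illustrates the window dynamics; a working MP2 sender (in C) also needs sequence numbers, ACKs, timeouts, and retransmission:

```python
# Minimal AIMD congestion-window sketch: additive increase of one
# packet's worth of window per loss-free RTT, multiplicative decrease
# (halving) when a loss is detected.
def aimd(events, cwnd=1.0):
    """events: iterable of 'ack_round' (one loss-free RTT) or 'loss'."""
    for e in events:
        if e == 'loss':
            cwnd = max(1.0, cwnd / 2.0)   # multiplicative decrease
        else:
            cwnd += 1.0                   # additive increase per RTT
    return cwnd

print(aimd(['ack_round'] * 9))             # 10.0
print(aimd(['ack_round'] * 9 + ['loss']))  # 5.0
```

Because halving penalizes the flow with the larger window more in absolute terms, repeated increase/decrease cycles pull two AIMD flows toward equal windows, which is the intuition behind the ±10% fairness requirement above.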

$25.00

[SOLVED] ECE438 – Machine Problem 1

Abstract

This machine problem introduces you to a bare-bones HTTP client that can get data from any web server; this is the kind of code that runs in your browser. You will also create an HTTP server that can serve data to other clients, much like a real server would.

1 Introduction

In this assignment, you will implement a simple HTTP client and server. The client will be able to GET correctly from standard web servers, and browsers will be able to GET correctly from your server. The test setup will be two VMs, one server and one client. Each test will use your client or wget, and your server or thttpd. Your client doesn't have to support caching or recursively retrieving embedded objects. HTTP uses TCP, so you can use Beej's client.c and server.c as a base. Your server must support concurrent connections: if one client is downloading a 10 MB object, another client that comes looking for a 10 KB object shouldn't have to wait for the first to finish.

2 What is expected in this MP?

2.1 HTTP Client

Your client should run as ./http_client http://hostname[:port]/path/to/file, e.g.:

./http_client http://127.0.0.1/index.html
./http_client http://illinois.edu/index.html
./http_client http://12.34.56.78:8888/somefile.txt
./http_client http://localhost:5678/somedir/anotherfile.html

If there is no :port, assume port 80, the standard HTTP port. You should write the file that you receive to a file called "output" (no file extension, like txt or html). Here's the very simple HTTP GET that wget uses:

GET /test.txt HTTP/1.1
User-Agent: Wget/1.12 (linux-gnu)
Host: localhost:3490
Connection: Keep-Alive

The GET /test.txt instructs the server to return the file called test.txt in the server's top-level web directory. User-Agent identifies the type of client. Host is the URL that the client was originally told to get from – exactly what the user typed.
This is useful in case a single server has multiple domain names resolving to it (maybe www.cs.illinois.edu and www.math.illinois.edu), and each domain name actually refers to different content. This could be a bare IP address, if that's what the user had typed. The 3490 is the port – this server was listening on 3490, so I called "wget localhost:3490/test.txt". Finally, Connection: Keep-Alive refers to TCP connection reuse, which will be discussed in class.

Note that the newlines are technically supposed to be CRLF – so, "\r\n" on a Unix machine. Only the first line is essential for a server to know what file to give back, so your HTTP GETs can be just that first line. HTTP specifies that the end of a request should be marked by a blank line, so be sure to have two newlines at the end. (This demarcation is necessary because TCP presents you with a stream of bytes, rather than packets.)

2.2 HTTP Server

Now for the HTTP response. Here's what Google returns for a simple GET of /index.html.

Run your server as:

sudo ./http_server 80
./http_server 8888

(The sudo is there because binding to any port below 1024 requires root privileges.)
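The minimal GET described above can be assembled as follows. This is a sketch in Python for clarity, although the MP itself is in C; note the CRLF line endings and the blank line that terminates the request:

```python
# Build the minimal HTTP GET described above.  Only the request line is
# strictly required by the MP's server, but the Host header is what lets
# one server distinguish between multiple domain names.
def build_get(host, port, path):
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}:{port}",
        "Connection: Keep-Alive",
    ]
    return "\r\n".join(lines) + "\r\n\r\n"   # trailing blank line ends the request

req = build_get("localhost", 3490, "/test.txt")
print(repr(req.splitlines()[0]))   # 'GET /test.txt HTTP/1.1'
```

In the C client, the equivalent string would be written to the connected socket with send(), after which the client reads the response and writes the body to the "output" file.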

$25.00

[SOLVED] ECE438 – Machine Problem 0

Abstract

1 Introduction

The purpose of this machine problem is to familiarize you with network programming in the environment to be used in the class and to acquaint you with the procedure for handing in machine problems. The problem is intended as an introductory exercise to test your background in C programming. You will obtain, compile, run, and extend a simple network program. The extensions to the code will introduce you to one method of framing data into individual messages when using a byte-stream abstraction such as TCP for communication.

2 What is expected in the MP?

Inside the release folder you will find a folder named mp0, which contains the programs client.c, server.c, talker.c, and listener.c – all from Beej's Guide to Network Programming (http://beej.us/guide/bgnet/). Beej's guide is an excellent introduction to socket programming, and very approachable.

Compile the files using gcc to create the executable files client, server, talker, and listener. We provide a Makefile that will compile all 4 (simply run make inside the directory). The real assignments will require you to submit a Makefile, so if you aren't already experienced with make, please familiarize yourself with the provided Makefile and ensure that you can adapt it to a new project.

Log in to two different machines (virtual machines), and execute client on one and server on the other. This makes a TCP connection. Next, execute talker on one machine and listener on the other. This sends a UDP packet. Note that the connection-oriented pair, server and client, use a different port than the datagram-oriented pair, listener and talker. Try using the same port for each pair, and run the pairs simultaneously. Do the pairs of programs interfere with each other?

Next, change server.c to accept a file name as a command line argument and to deliver the length and contents of the file to each client. Assume that the file contains no more than 100 bytes of data.
Send the length of the file (an integer between 0 and 100) as an 8-bit integer. Change client.c to read first the length, then that number of bytes from the TCP socket, and then print what was received. The client output should look like this:

client: connecting to <server address>
client: received <n> bytes
This is a sample file that is sent over a TCP connection.

where <server address> is the address of the server, <n> is the number of bytes received, and the rest of the output is the file contents.

That's it. Sounds simple, doesn't it? Indeed, for experienced Unix/C programmers, this MP is trivial. Others should find it a nice way to get started on network programming. You will need to have (or quickly acquire) a good knowledge of the ANSI C programming language, including the use of pointers, structures, typedef, and header files. Don't simply download the source code and compile the programs; make sure that you read and understand how the sockets are created and the connection established. Beej's guide is a very useful tool in this sense.

3 How to Set Up the VirtualBox VM Environment?

The autograder runs your code in 64-bit Ubuntu 22.04.1 LTS VMs (desktop version), running on VirtualBox. Therefore, to test your code, you will need a 64-bit Ubuntu 22.04.1 LTS VM of your own. (Even if you're already running Ubuntu 22.04.1 LTS on your personal machine, later assignments will use multiple VMs, so you might as well start using the VM now.) This tutorial is for Windows, but VirtualBox works and looks the same on all OSes.

Note: if your machine is on an ARM architecture (e.g., an M-series MacBook), you might want to use another virtual machine solution such as Docker, since VirtualBox does not support the ARM architecture. Please contact us if you have difficulties setting up another virtual environment.

The Ubuntu 22.04.1 image: https://ubuntu.com/download

After the Ubuntu install process (within the VM), you should install the ssh server. You can do sudo apt-get install openssh-server once the OS is installed.
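The length-prefixed framing described above can be sketched as follows. This is Python for clarity, though the MP itself is in C; a real client must also loop on recv() until all the promised bytes have arrived, since TCP hands you a byte stream rather than discrete messages:

```python
import struct

# Sketch of the MP's framing scheme: an 8-bit length (0-100) followed by
# that many bytes of file data.
def frame(data: bytes) -> bytes:
    assert len(data) <= 100, "the MP caps the file at 100 bytes"
    return struct.pack("!B", len(data)) + data   # 1-byte length prefix

def deframe(stream: bytes) -> bytes:
    (length,) = struct.unpack("!B", stream[:1])  # read the length first...
    return stream[1:1 + length]                  # ...then exactly that many bytes

msg = b"This is a sample file that is sent over a TCP connection."
assert deframe(frame(msg)) == msg
print(len(frame(msg)) - len(msg))   # 1: the single prefix byte
```

The same idea in the C client is: recv() one byte into an unsigned char, then keep calling recv() until that many payload bytes have been accumulated, then print them.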
Use apt-get (sudo apt-get install xyz) to install any programs you'll need, like gcc, make, gdb, and valgrind. I would suggest also getting iperf and tcpdump, which will be useful later.

Note: WSL (Windows Subsystem for Linux) might work as one of your machines, but keep in mind that some compiler behavior might differ. If you are using WSL, please be very careful about undefined coding behaviors, e.g., uninitialized variables, memory leaks, etc.

4 How to Set Up Networking Inside VMs?

VirtualBox's default network setup is a NAT (which we'll learn about later!) interface to the outside world, provided by the host computer. This allows the VM to access the Internet, but the host computer and other VMs will not be able to talk to it. We're going to replace the NAT interface with one that allows those communications. However, BEFORE YOU MAKE THIS CHANGE, you should use that Internet access to:

sudo apt-get install gcc g++ make gdb iperf tcpdump wget

5 What Tips and Tricks Will be Useful?

MP0 is ungraded, but still very important to get you started. Performing this assignment successfully will make submitting the subsequent assignments much easier.

6 Assignment Submission (MP0 does not need this step)

We use the autograder to grade your MPs; the submission process is simple. First, open the autograder web page: http://10.105.100.204. This is a ZJUI-private IP; if your device is not accessing it through the campus network, please use a VPN to get a private IP.

You will see two sections. MP Submission allows you to submit your assignment: enter your Student ID (starting with 320), select which MP you are submitting, select the file extension of your files, and upload your MP files. Note: only C/C++ files are accepted. When uploading files, add your files one by one; do not choose multiple files to add at one time. The Submission History section allows you to check your submission history and grade by entering your student ID.
Caution: the queue can only handle 200 submissions at one time, so remember to check your submission status in Submission History after you submit your assignment. During the hours leading up to the deadline, the queue could be long, so it is advisable to get your work done early.

$25.00

[SOLVED] ECE438 – Homework 4

• This assignment has a total of 100 points.
• Please write your answer in the white space to the right of the corresponding problem.

1 T/F (no need for justification) – 2 × 5 points

1. Packets flowing through a virtual circuit carry the destination host address.
2. In virtual circuits, every router on the source-dest path maintains "state" for each passing connection.
3. Consider the network layer. Routers contain state about end-to-end connections.
5. Suppose that a packet is flowing from your laptop to a server. When the packet flows through routers, say R4 to R5, the packet header contains the IP address of R5.

2 Forwarding table – 2 × 4 points

Suppose that IP addresses have 4 bits only, from 0000 to 1111. Of these 16 addresses, the first 5 should be forwarded to interface 1; the next 4 should be forwarded to interface 2; the 2 addresses after that should be forwarded to interface 3; and the last 5 to interface 4. Create the most optimal (7-row) forwarding table with 2 columns (column 1 = prefix, column 2 = interface number) that the router should use. Choose all that apply to answer the following questions.

1. What will the entry/entries in the prefix column be for Interface 1? (a) 00 (b) 01 (c) 000 (d) 001 (e) 010 (f) 0100
2. What will the entry/entries in the prefix column be for Interface 2? (a) 01 (b) 011 (c) 100 (d) 001 (e) 0101 (f) 1000
3. What will the entry/entries in the prefix column be for Interface 3? (a) 1 (b) 10 (c) 100 (d) 101 (e) 1001 (f) 1010
4. What will the entry/entries in the prefix column be for Interface 4? (a) 10 (b) 11 (c) 101 (d) 110 (e) 111 (f) 1011

3 IP addressing – 3 × 3 points

For every IP in the sub-questions, select the correct forwarding rule according to the routing table as shown in the choices:

1. 192.168.1.2 (a) 192.168.0.0/17 to port A (b) 192.168.0.0/23 to port B (c) 192.168.2.0/24 to port C (d) 10.0.0.0/0 to port D
2.
192.168.32.2 (a) 192.168.0.0/17 to port A (b) 192.168.0.0/23 to port B (c) 192.168.2.0/24 to port C (d) 10.0.0.0/0 to port D 3. 192.168.255.5 (a) 192.168.0.0/17 to port A (b) 192.168.0.0/23 to port B (c) 192.168.2.0/24 to port C (d) 10.0.0.0/0 to port D 4 Subnet – 6 points Consider a router that interconnects three subnets: Subnet1, Subnet2, Subnet3. Suppose all of the interfaces in each of these three subnets are required to have the prefix 203.1.17/24. Also suppose that Subnet 1 required to support at least 60 interfaces, Subnet 2 is to support at least 90 interfaces, Subnet3 is to support at least 12 interfaces. Provide three network addresses (of the form a.b.c.d/x) that satisfy these constraints. 5 Dijkstra’s Algorithm – 5 + 3 points Consider the network topology as shown in figure2. Sort the following protocols by the amount of state each node maintains, and give clearexplanation. • Link State • Distance Vector 6 Distance Vector Routing – 3 + 6 + 5 points Poison reverse. The idea is simple. Suppose three routers x, y, z. If z routes through y to get to destination x, then z will advertise to y that its distance to x is infinity; that is, z will advertise to y that Dz(x)= infinity (even though z knows Dz(x) in truth). z will continue telling this little white lie to y as long as it routes to x via y. Since y believes that z has no path to x, y will never attempt to route to x via z, as long as z continues to route to x via y (and lies about doing so). (Notice for a more detailed explanation, see the textbook in Distance-Vector Algorithm) Assume four routers x, y, z, and w are connected as follows, and the cost of each link is given in the picture. Suppose that poison reverse is used in the distance vector routing algorithm.1. Why is poison reverse needed? 2. When distance vector routing has stabilized (by starting with the initial costs specifiedabove), routers w,y, and z communicate to each other their distance vectors to router x (i.e., Dw(x),Dy(x),Dz(x) ). 
What are the values of these distance vectors? Fill in the following questions. ”Da(b) to c” denotes the value of router a’s distance vector to router b, which is sent to router c. If the answer is infinity, please write “inf”. (a) Dy(x) to z (b) Dy(x) to w (c) Dz(x) to y (d) Dz(x) to w (e) Dw(x) to y (f) Dw(x) to z 3. Now suppose that the link cost between x and y increase to 60. Will there be a count-to-infinity problem even if poisoned reverse is used? Why or why not? If there is a count-to-infinity problem, then how many iterations are needed for the distance-vector routing to reach a stable state again? Justify your answer. 7 AS Routing – 2 x 2 + 3 x 2 points Consider the network shown above. Suppose AS3 and AS2 are running OSPF for their intra-AS routing protocol. Suppose AS1 and AS4 are running RIP for their intra-AS routing protocol. Suppose eBGP and iBGP are used for the inter-AS routing protocol. Suppose there is no physical link between AS2 and AS4.1. Router 3c learns about prefix x from which routing protocol? (a) OSPF (b) RIP (c) eBGP (d) iBGP 2. Router 3a learns about x from which routing protocol? (a) OSPF (b) RIP (c) eBGP (d) iBGP 3. Once router 1d learns about x it will put an entry (x, I) in its forwarding table.Now suppose there is a physical link between AS2 and AS4, shown by the dotted line. Suppose router 1d learns that x is accessible via AS2 as well as via AS3. Will I be set to I1 or I2? Explain why. 4. Now suppose there is another AS, called AS5, which lies on the path between AS2 andAS4 (not shown in diagram). Suppose router 1d learns that x is accessible via AS2 AS5 AS4 as well as AS3 AS4. Will I be set to I1 or I2? Explain why. 8 Switch – 5+5+2+2 points A Slotted ALOHA network of N = 32 nodes gets separated into 4 smaller networks using a switch. Each smaller network now contains N/4 nodes. 1. 
Before the switch was installed, calculate the probability of collisions in the network.Assume that each node attempts transmission in a given slot with a probability p = 0.3. Your answer should be correct up to 4 decimal places. Please also write the equation in terms of N and p. 2. After the switch was installed, assume that sender-receiver pairs are always within asmaller network (i.e., traffic does not cross the switch). Calculate the probability of collisions in the whole network, i.e, probability that collision occurs in any of the three smaller networks. Your answer should be correct up to 4 decimal places. Please also write the equation in terms of N and p. 3. Explain the advantages of the switch in terms of collision probability and overall networkthroughput. 4. In this scenario, would it make any difference if the switch was replaced by a hub? 9 Wireless – 3 + 3 + 6 points In the diagram below, each wireless node is shown along with its transmission radius. E.g., A’s transmission radius is the circle with the dashed line.1.List all the hidden terminals in the above wireless network? 2. The network uses CSMA/CA. When B wants to transmit to A it sends an RTS and Areplies with a CTS to reserve the channel. Is this guaranteed to avoid collisions, explain why or why not? 3. Suppose the nodes F, A, B, C, D are equally spaced by a distance of d. Assume all nodes are identical and transmit at same power level on the same frequency. Also assume the signal attenuates based on free space pathloss model. A is transmitting to B while C is transmitting to D. Compute the SINR of C’s signal at D in the following cases? (a) The noise power at D is zero. (b) The noise power at D is not zero and in the absence of any interference, the SNR ofC’s signal at D is 20. 10 Short Answer Questions – 3 x 3 1. Alice and Bob wanted to share files with each other by setting up a socket connection.Both of them typed ifconfig on their Linux machine to obtain their IP addresses. 
Alice’s IP address is 130.126.255.2. Bob’s IP address is 192.168.34.102. Assume they both have access to the Internet. Can they set up the socket connection without requiring other external servers? Explain your answer 2. When they both log on to https://www.iplocation.net/ to check their IP address, willthey observe same IP address as running ifconfig? 3. Now Carol came in. She typed ifconfig and saw her IP address is 192.168.34.103. Canshe communicate directly with Bob without an external server?
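The slot-collision probability in problem 8 follows from the binomial model: a slot suffers a collision when two or more of the contending nodes transmit, i.e., P(collision) = 1 − (1−p)^n − n·p·(1−p)^(n−1). A minimal sketch under that model (the function names are illustrative, and treating the four post-switch networks as independent is one reading of the problem):

```python
def collision_prob(n: int, p: float) -> float:
    """Probability that a given slot has a collision among n slotted-ALOHA nodes.
    1 - P(no node transmits) - P(exactly one node transmits)."""
    return 1 - (1 - p) ** n - n * p * (1 - p) ** (n - 1)

# Before the switch: one network of N = 32 nodes, p = 0.3.
p_before = collision_prob(32, 0.3)

# After the switch: a collision occurs if ANY of the 4 independent
# 8-node networks has a collision in the slot.
p_one_subnet = collision_prob(8, 0.3)
p_after = 1 - (1 - p_one_subnet) ** 4
```

With N = 32 and p = 0.3 the pre-switch collision probability is very close to 1, which is why partitioning the network helps.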

$25.00

[SOLVED] Ece438 – homework 3

• This assignment has a total of 100 points. • Please write your answer in the white space to the right of the corresponding problem.

1 Choose all that Apply – 3 × 3 points
1. A TCP socket is an end-to-end connection between two ____. (a) processes (b) threads (c) hosts (d) devices
2. Suppose Host A sends one segment with sequence number 40 and 8 bytes of data over a TCP connection to Host B. In this same segment the acknowledgement number is necessarily 48. (a) True (b) False
3. Consider the Selective ACK protocol. Choose all that apply: (a) The receiver's base sequence number can be smaller than the transmitter's base sequence number (b) The receiver's base sequence number can be smaller than the transmitter's tail sequence number (c) The receiver's base sequence number can be larger than the transmitter's tail sequence number (d) The receiver's base sequence number can be larger than the transmitter's tail sequence number plus one

2 TCP examples 1 – 2 × 3 + 3 points
Assume TCP is in the slow start phase, starting from CW = 1.
1. CW at time t1 = ___
2. CW at time t2 = ___
3. CW at time t3 = ___
4. How should the TCP transmitter react after receiving A3? Please give CW head, CW tail, and Send.
NOTE: For questions that ask how TCP reacts, the following fields are defined as: • CW head: Congestion Window Head (also called Base); an integer • CW tail: Congestion Window Tail; an integer • SSthresh: Slow Start Threshold; round numerical answers to 1 decimal place • Send: the packets that need to be transmitted by the TCP transmitter, a sequence of numbers; when the transmitter has no packets to send, write [].
Example: If CW = [4,5,6,7,8], then you should answer CW head as 4 and CW tail as 8.
5. How should the TCP transmitter react after receiving A2? Please give CW head, CW tail, and Send.

3 TCP example 2 – 2 x 3 + 4 x 3 points
Assume TCP is in the slow start phase, starting from CW = 1.
1. CW at time t1 = ___
2. CW at time t2 = ___
3. CW at time t3 = ___
4. How should the TCP transmitter react after receiving packet P3's timeout? Please give CW head, CW tail, SSthresh, and Send.
5. How should the TCP transmitter react after receiving the penultimate ACK shown in the graph? Please give CW head, CW tail, SSthresh, and Send.
6. How should the TCP transmitter react after receiving the last shown ACK? Please give CW head, CW tail, SSthresh, and Send.

4 TCP example 3 – 2 x 2 + 3 points
Assume packets before P10 have already been acknowledged in the past and TCP is in slow start.
1. CW at time t1 = ___
2. CW at time t2 = ___
3. How should the TCP transmitter react after receiving A13? Please give CW head, CW tail, and Send.

5 TCP example 4 – 2 x 2 + 3 + 4 points
Assume that the first ACK that is shown to arrive at the TCP transmitter is A5 and TCP is in slow start. Also assume packets before P5 have already been acknowledged in the past.
1. CW at time t1 = ___
2. CW at time t2 = ___
3. How should the TCP transmitter react after receiving A5? Please give CW head, CW tail, and Send.
4. How should the TCP transmitter react after the timeout? Please give CW head, CW tail, SSthresh, and Send.

6 TCP example 5 – 4 × 6 points
Assume packets before P10 have already been acknowledged in the past.
1. What should the values of CW be at times t1, t2, t3, and t4? (Round to 1 decimal place.)
2. How should the TCP transmitter react upon receiving A10? Please give CW head, CW tail, SSthresh, and Send.
3. How should the TCP transmitter react upon receiving A11? Please give CW head, CW tail, SSthresh, and Send.
4. How should the TCP transmitter react upon receiving A13? Please give CW head, CW tail, SSthresh, and Send.
5. How should the TCP transmitter react upon receiving A12? Please give CW head, CW tail, SSthresh, and Send.
6. How should the TCP transmitter react upon receiving A14? Please give CW head, CW tail, SSthresh, and Send.

7 T/F Question – 5 + 5 points
Answer true or false for the following questions and briefly justify your answer:
1. With the SR protocol, it is possible for the sender to receive an ACK for a packet that falls outside of its current window.
2. With GBN, it is possible for the sender to receive an ACK for a packet that falls outside of its current window.

8 GBN Question – 6 x 2 points
1. What is the possible set of sequence numbers inside the sender's window at time t? Justify your answer.
2. What are all possible values of the ACK field in all possible messages currently propagating back to the sender at time t? Justify your answer.
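The CW questions above track the textbook slow-start / congestion-avoidance rules: below SSthresh the congestion window doubles each RTT; at or above it the window grows by one segment per RTT; and on a timeout SSthresh is set to half the current window and the window restarts from 1. A minimal per-RTT sketch of those rules (the course's grading model may differ in details such as fast retransmit, so treat this as an illustration only):

```python
def next_cw(cw: float, ssthresh: float) -> float:
    """Congestion window after one RTT (simplified textbook rules)."""
    if cw < ssthresh:
        return min(cw * 2, ssthresh)  # slow start: double per RTT, capped at SSthresh
    return cw + 1                     # congestion avoidance: +1 segment per RTT

def on_timeout(cw: float):
    """On timeout: SSthresh = half the window (floor of 2), CW restarts at 1."""
    return 1.0, max(cw / 2.0, 2.0)
```

For example, starting from CW = 1 with SSthresh = 8, successive RTTs give 2, 4, 8, 9, 10, …; a timeout at CW = 10 would set SSthresh to 5.0 and CW back to 1.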

$25.00

[SOLVED] Ece438 – homework 2

• This assignment has a total of 100 points. • Please write your answer in the white space to the right of the corresponding problem.

1 Choose all that Apply – 3 × 4 points
1. Two distinct Web pages (for example, www.intl.zju.edu.cn/students.html and www.intl.zju.edu.cn/research.html) can be sent over the same persistent connection. (a) True (b) False
2. Is it possible for an organization's Web server and mail server to have exactly the same alias for the hostname (for example, foo.com)? (a) Yes (b) No
3. Knowing the alias for the mail server, what type of RR should a DNS client query to get the canonical name for the mail server? (a) A (b) NS (c) CNAME (d) MX
4. What protocol might be used if a user wants to get email from the user's mail server to his local PC? (a) SMTP (b) POP3 (c) IMAP (d) HTTP

2 Short Answer Questions – 5 × 2 points
1. Briefly explain the advantages and disadvantages of the use of cookies.
2. Describe how Web caching can reduce the delay in receiving a requested object. Will Web caching reduce the delay for all objects requested by a user or only for some of the objects? Explain why.

3 Web Caching – 7 x 3 points
Assume a group of students in an institution want to access a private server A outside of the institution. The bottleneck link from the institution to this server supports a bitrate of 2 MB/s. Assume the average request rate from the institution is 80 requests/s and each request is 0.02 MB. Assuming there is no other traffic within or outside of the institution, answer the following questions. Assume that queueing delay dominates, so you can neglect the much smaller propagation delays, transmit times, and processing delays.
1. What is the average access time for a user in the institution to access this server? Assume the queueing delay is 1/(1−L) milliseconds, where L is the fraction of link usage. (Your answer should be in milliseconds.)
2. To improve network performance, we now increase the bitrate of this bottleneck link to 6 MB/s. Calculate the average access time again. Your unit should be milliseconds, computed up to 2 decimal places.
3. Another way to improve network performance is to add a cache server within the institution without increasing the bandwidth of the bottleneck link. The bitrate to the cache server is 10 MB/s. Assume there is a 60% cache hit rate. The queueing delay for both the cache server and server A follows the formula in Q1. Calculate the average access time in this case. (Assume the network knows the cache server, so no additional delay is needed to find it; your unit should be milliseconds, computed to 2 decimal places.)

4 Traceroute – 4 × 3 points
In the next 2 figures, you will see a series of results from running traceroute (with the -q 1 option to send one probe per hop). For each of the results, please answer the following questions:
1. Which hop(s) (if any) are transoceanic?
2. Based on the RTT to the last hop, what is the furthest away the corresponding server could possibly be located? (Note: use a packet propagation speed of 2 × 10^8 m/s.)
3. Sometimes the RTT of a subsequent hop is lower than the RTT of a previous one. Give one reason for this.

5 HTTP – 7 × 3 points
Suppose a webpage has nothing but 10 large images, each of size 10 MB. A client wants to access the webpage and load the images in his browser. The RTT between the client and the server is 40 ms and the transmission rate at the server is 500 MB/s. How long will it take to load the webpage in each of the following cases? (Note: the size of the object index is negligible.) For all answers, please answer in milliseconds and include the details.
1. Using Non-Persistent HTTP?
2. Using Persistent HTTP?
3. Using Pipelined Persistent HTTP?

6 Client-Server – 7 × 2 points
Think about spreading an F-bit file among N peers using a client-server structure. Let the server have a maximum upload capacity µs, and let each client c have a download capacity dc. Assume that the server can serve multiple clients simultaneously and can fluidly set the rate rc for each client.
1. Suppose that µs/N ≤ dmin, where dmin = min_c dc is the minimum download rate. How would you set the rates rc for each client so that the file is fully distributed to all clients in minimum time? (I.e., you are minimizing the time at which the slowest client receives the file.) What would the distribution time be?
2. Suppose now that µs/N > dmin. How would you set the rates rc to fully distribute the file to the clients in minimum time? And what would this time be?

7 DNS – 7 + 3 points
This task requires using the dig command to provide answers. To ensure accurate results, it is recommended to perform these steps from a computer located on a campus network. You can refer to the dig documentation to understand how to use it.
1. Starting from one of the root servers a–m.root-servers.net, perform an iterative lookup for the host www.eecs.mit.edu to get the IP address. For instance, you can initiate the search by using the following command: dig @h.root-servers.net www.eecs.mit.edu
At each step, report: (1) the domain name of the name server being visited; (2) the IP address of the name server that is currently being used; (3) for how long you can store the results in cache.
2. Can you explain why the DNS protocol tends to utilize UDP rather than TCP?
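The access-time parts of problem 3 all reduce to the stated rule delay = 1/(1 − L) ms, with L = (request rate × request size) / link capacity. A sketch of parts 1 and 2 under exactly that rule (the function name is illustrative; part 3 additionally needs the 60/40 hit/miss split):

```python
def queueing_delay_ms(rate_req_s: float, size_mb: float, capacity_mb_s: float) -> float:
    """Average delay = 1/(1-L) ms, where L is the fraction of link usage."""
    load = rate_req_s * size_mb / capacity_mb_s
    assert load < 1, "the 1/(1-L) formula only makes sense for L < 1"
    return 1.0 / (1.0 - load)

d1 = queueing_delay_ms(80, 0.02, 2)   # part 1: 2 MB/s bottleneck -> L = 0.8
d2 = queueing_delay_ms(80, 0.02, 6)   # part 2: upgraded 6 MB/s link
```

Part 1 gives L = (80 × 0.02)/2 = 0.8, so the delay is 1/(1 − 0.8) = 5 ms.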

$25.00

[SOLVED] Ece 385 experiment 6 test programs

For each test program, the point value and starting address (in hex) are given, followed by a list of instructions used in the test program and a description of what the program does. To begin each test, input the starting address on the switches and press "Run". There are seven tests; the first six form a hierarchical group, worth a total of 6 points, in which each test builds upon the instructions tested in the previous one. Therefore, if the second test is demonstrated without demonstrating the first test, the points for both the first and second tests will be awarded; demonstrating the last test first will yield all 6 points. The seventh test is used to verify that the CPU properly acts upon each press of the Run and Continue buttons exactly once. This functionality comprises the final 1 point of the demo. In the descriptions of the test programs, the term "Checkpoint" is used to refer to a pause instruction with a specified value. "Checkpoint 1" would be a pause instruction that displays x01 on the LEDs (excluding any I/O flags).

Basic I/O Test 1
Points: 1
Program Start: x0003
Instructions Tested: ANDi, LDR, STR, BR
This program uses ANDi to clear R0 (i.e., ANDi R0, R0, 0), which is used as a base register for memory operations (using negative values for the offset). All test programs do this to set up for I/O operations. The program then reads in the data on the switches, writes this value to the hex display, and loops back (using BRnzp as an unconditional jump) to repeat the process indefinitely. When working correctly, the hex displays should appear to always show the value of the switches.

Basic I/O Test 2
Points: 1 (cumulative total: 2.0)
Program Start: x0006
Instructions Tested: ANDi, LDR, STR, BR, PSE
The code for this program is identical to the previous test, except that it uses pause instructions to ask for input and report output. The first pause instruction (checkpoint 1) will ask for input on the switches. The second and subsequent pauses will display x02 and both ask for input and report that an output is present on the hex display. When operating correctly, each press of the Continue button will transfer the value from the switches to the hex display, but the hex display will not change until Continue is pressed.

Self-Modifying Code Test
Points: 1 (cumulative total: 3.0)
Program Start: x000B
Instructions Tested: ANDi, ADDi, LDR, STR, BR, JSR, PSE
This program is based upon the same loop as the last program, but inserts some additional operations. Before the loop begins, JSR 0 is executed. This serves to put the PC address in R7 without actually changing the value of PC. (This usage is common in the remaining tests as well.) This PC value is used to load the data for the second pause instruction into a register. The loop then operates normally, except that in each iteration, the data for the pause instruction is incremented and stored back to the proper memory location. The result is that with each iteration of the loop, the pause instruction will display a value on the LEDs one greater than the value in the previous iteration.

XOR Test
Points: 1 (cumulative total: 4.0)
Program Start: x0014
Instructions Tested: AND, ANDi, NOT, LDR, BR, PSE
This program performs the XOR operation. In sLC-3, there is no dedicated XOR instruction, so the XOR is performed by multiple simple instructions (AND and NOT). The program will ask for input values (checkpoints 1 and 2), XOR them, and display the result on the hex display (checkpoint 3). It will then loop back to the top and repeat the process from checkpoint 1.

Multiplication Test (or, "Lab 5 in Software")
Points: 1 (cumulative total: 5.0)
Program Start: x0031
Instructions Tested: AND, ANDi, ADD, ADDi, NOT, LDR, STR, BR, JSR, PSE
This program performs multiplication, using a variation on the shift-and-add algorithm (using ADD Rx, Rx, Rx as a left-shift operation). The program will ask for input values (checkpoints 1 and 2), multiply them, and display the result on the hex display (checkpoint 3). It will then loop back to the top and repeat the process from checkpoint 1.

Sort Test
Points: 1 (cumulative total: 6.0)
Program Start: x005A
Instructions Tested:
This program is organized into four parts. The first part is a menu, containing function calls to the other three parts that are executed based on input. The menu contains a single pause instruction, checkpoint –1 (displays xFF on the LEDs). Entering x0001 will call the "data entry" function, entering x0002 will call the "sort" function, and entering x0003 will call the "display" function. Any other value will simply cause the menu to loop back to the start without doing anything. The data entry function allows the user to enter new data into the list to be sorted. The list is fixed at length 16, no more, no less. The function will display the current index (starting at 0) to be written to the list, and ask for data from the switches (checkpoint 1). It will do this 16 times, displaying the current index each time, before returning control to the menu loop. The sort function will sort the values in the list using the Bubble Sort algorithm. No feedback is given to the user about the completion of the algorithm; it will return (seemingly) immediately to the menu. The display function will display, in turn, each member of the list (checkpoint 2). Note that since the hex display will be used to display the data, the index of the value is displayed on the LEDs, using the self-modifying code technique used earlier to change the pause instruction in each iteration. A properly sorted list will be displayed in ascending order. To test this program, the data entry function need not be used, since sample values are preloaded into the list along with the program itself. These values, both in their original order and sorted order, are given in Table 1. It is recommended that, for the demo, the display function be run first to verify that the sample values are present and unsorted, the sort function be run second, and the display function be run again to verify that the sort was successful.

Index  Before Sort  After Sort
0      x00ef        x0001
1      x001b        x0003
2      x0001        x0007
3      x008c        x000d
4      x00db        x001b
5      x00fa        x001f
6      x0047        x0046
7      x0046        x0047
8      x001f        x004e
9      x000d        x006b
A      x00b8        x008c
B      x0003        x00b8
C      x006b        x00db
D      x004e        x00ef
E      x00f8        x00f8
F      x0007        x00fa
Table 1: Sample list values before and after sorting

"Act Once" Test
Points: 1
Program Start: x002A
Instructions Tested: ANDi, ADDi, STR, JSR, JMP, PSE
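The XOR Test's decomposition relies on the identity x XOR y = (x AND NOT y) OR (NOT x AND y), with the OR itself rewritten via De Morgan's law since sLC-3 provides only AND and NOT. A sketch of that decomposition over 16-bit values (helper names are illustrative, not part of the lab):

```python
MASK = 0xFFFF  # sLC-3 registers are 16 bits wide

def bit_not(a: int) -> int:
    """One's complement, as the NOT instruction computes it."""
    return ~a & MASK

def xor_via_and_not(x: int, y: int) -> int:
    # x XOR y = (x AND NOT y) OR (NOT x AND y); the OR is replaced by
    # NOT(NOT(a) AND NOT(b)) because sLC-3 has no OR instruction.
    a = x & bit_not(y)
    b = bit_not(x) & y
    return bit_not(bit_not(a) & bit_not(b))
```

For example, xor_via_and_not(0b1100, 0b1010) yields 0b0110, matching the native ^ operator.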

$25.00

[SOLVED] Dsci553 foundations and applications of data mining

Assignment 5
1. Overview of the Assignment
In this assignment, you are going to implement three streaming algorithms. In the first two tasks, you will generate a simulated data stream with the Yelp dataset and implement the Bloom filter and the Flajolet-Martin algorithm. In the third task, you will do some analysis using a Fixed Size Sample (Reservoir Sampling).

2. Requirements
2.1 Programming Requirements
a. You must use Python and Spark to implement all tasks. There will be a 10% bonus for each task if you also submit a Scala implementation and both your Python and Scala implementations are correct.
b. You are not required to use Spark RDD in this assignment.
c. You can only use standard Python libraries, which are already installed on Vocareum.
2.2 Programming Environment
Python 3.6, JDK 1.8, Scala 2.12, and Spark 3.1.2. We will use the above library versions to compile and test your code. You are required to make sure your code works and runs on Vocareum; otherwise we won't be able to grade it.
2.3 Important things before starting the assignment:
1. If we cannot call myhashs(s) in task1 and task2 in your script to get the hash value list, there will be a 50% penalty.
2. We will simulate your Bloom filter in the grading program simultaneously, based on your myhashs(s) outputs. There will be no points if the reported output is largely different from our simulation.
3. Please use the integer 553 as the random seed for task3, and follow the steps mentioned below to get a random number. If you use the wrong random seed, discard any obtained random number, or produce a sequence of random numbers different from our simulation, there will be a 50% penalty.
2.4 Write your own code
Do not share code with other students!! For this assignment to be an effective learning experience, you must write your own code! We emphasize this point because you will be able to find Python implementations of some of the required functions on the web. Please do not look for or at any such code!

3. Datasets
For this assignment, you need to download users.txt as the input file. You also need a Python blackbox file to generate data from the input file. Both users.txt and blackbox.py can be found in the publicdata directory on Vocareum. We use the blackbox as a simulation of a data stream: the blackbox returns a list of user ids from the file users.txt every time we call it. Although it is very unlikely that the user ids returned from the blackbox are not unique, you are required to handle duplicates wherever required. Please call the blackbox function like the example in the following figure. If you need to ask the blackbox multiple times, you can do it with the following sample code.

4. Tasks
4.1 Task1: Bloom Filtering (2.5 pts)
In this task, you should keep a global filter bit array whose length is 69997. The hash functions used in a Bloom filter should be independent and uniformly distributed. Some possible hash functions are: f(x) = (ax + b) % m or f(x) = ((ax + b) % p) % m, where p is any prime number and m is the length of the filter bit array. You can use any combination for the parameters (a, b, p). The hash functions should stay the same once you have created them. As the user_id is a string, you need to convert the user_id to an integer and then apply the hash functions to it. The following code shows one possible way of converting a user_id string to an integer:
import binascii
int(binascii.hexlify(s.encode('utf8')), 16)
(We only treat exactly identical strings as the same user. You do not need to consider aliases.)
Execution Details
To calculate the false positive rate (FPR), you need to maintain a set of previously seen users. The size of a single data stream will be 100 (stream_size), and we will test your code more than 30 times (num_of_asks); your FPRs are only allowed to be larger than 0.5 at most once. The run time should be within 100 s for 30 data streams.
Output Results
You need to save your results in a CSV file with the header "Time,FPR". Each line stores the index of the data batch (starting from 0) and the false positive rate for that batch of data. You do not need to round your answer.
You also need to encapsulate your hash functions in a function called myhashs. The input of the myhashs function is a user_id (string) and the output is a list of hash values. For example, if you have three hash functions, the size of the output list should be three, and each element in the list corresponds to the output value of one hash function. The figure below is a template of the myhashs function. Our grading program will also import your Python script, call the myhashs function to test the performance of your hash functions, and track your implementation.
4.2 Task2: Flajolet-Martin algorithm (2.5 pts)
Execution Details
For this task, the size of the stream will be 300 (stream_size), and we will test your code more than 30 times (num_of_asks). And for your final result, 0.2
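A minimal sketch of the myhashs template described above, using the suggested f(x) = ((ax + b) % p) % m form together with the binascii conversion from the handout. The (a, b, p) triples here are arbitrary illustrations (p prime, larger than m), not required choices:

```python
import binascii

M = 69997  # length of the global filter bit array, per the task statement

# Illustrative (a, b, p) triples; any fixed choices with prime p > M work.
PARAMS = [(387, 551, 98317), (1543, 77, 196613), (6151, 901, 786433)]

def myhashs(s: str) -> list:
    """Map a user_id string to one bit-array index per hash function."""
    x = int(binascii.hexlify(s.encode('utf8')), 16)
    return [((a * x + b) % p) % M for (a, b, p) in PARAMS]
```

Because PARAMS is fixed at module level, the hash functions stay the same across calls, as the assignment requires; the grader can import the script and call myhashs directly.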

$25.00

[SOLVED] Dsci-553 foundations and applications of data mining

Assignment 4 1. Overview of the Assignment In this assignment, you will explore the spark GraphFrames library as well as implement your own Girvan-Newman algorithm using the Spark Framework to detect communities in graphs. You will use the ub_sample_data.csv dataset to find users who have a similar business taste. The goal of this assignment is to help you understand how to use the Girvan-Newman algorithm to detect communities in an efficient way within a distributed environment. 2. Requirements 2.1 Programming Requirements a. You must use Python and Spark to implement all tasks. There will be a 10% bonus for each task if you also submit a Scala implementation and both your Python and Scala implementations are correct. b. For task1, you can use the Spark DataFrame and GraphFrames library. For task2 you can ONLY use Spark RDD and standard Python or Scala libraries. 2.2 Programming Environment Python 3.6, JDK 1.8, Scala 2.12, and Spark 3.1.2 We will use these library versions to compile and test your code. There will be no point if we cannot run your code on Vocareum. On Vocareum, you can call `spark-submit` located at /opt/spark/spark-3.1.2-bin-hadoop3.2/bin/spark-submit`. (*Do not use the one at `/home/local/spark/latest/bin/spark-submit (2.4.4)) 2.3 Write your own code Do not share code with other students!! For this assignment to be an effective learning experience, you must write your own code! We emphasize this point because you will be able to find Python implementations of some of the required functions on the web. Please do not look for or at any such code! 2.4 What you need to turn in You need to submit the following files on Vocareum: a. [REQUIRED] two Python scripts, named: task1.py, task2.py b1. [OPTIONAL, REQUIRED FOR SCALA] two Scala scripts, named: task1.scala, task2.scala b2. [OPTIONAL, REQUIRED FOR SCALA] one jar package, named: hw4.jar c. [OPTIONAL] You can include other scripts called by your main program. d. 
You don’t need to include your results. We will grade your code with our testing data (data will be in the same format). 3. Datasets We have generated a sub-dataset, ub_sample_data.csv, from the Yelp review dataset containing user_id and business_id. You can find the data on Vocareum under resource/asnlib/publicdata/. 4. Tasks 4.1 Graph Construction To construct the social network graph, assume that each node is uniquely labeled, and that links are undirected and unweighted. Each node represents a user. There should be an edge between two nodes if the number of common businesses reviewed by two users is greater than or equivalent to the filter threshold. For example, suppose user1 reviewed set{business1, business2, business3} and user2 reviewed set{business2, business3, business4, business5}. If the threshold is 2, there will be an edge between user1 and user2. If the user node has no edge, we will not include that node in the graph. The filter threshold will be given as an input parameter when running your code. 4.2 Task1: Community Detection Based on GraphFrames (2 pts) 4.2.1 Execution Detail The version of the GraphFrames should be 0.6.0. (For your convenience, graphframes0.6.0 is already installed for python on Vocareum. The corresponding jar package can also be found under $ASNLIB/public folder. ) For Python (in local machine): ● [Approach 1] Run “python3.6 -m pip install graphframes” in the terminal to install the package. 
● [Approach 2] In PyCharm, you add the sentence below into your code to use the jar package os.environ[“PYSPARK_SUBMIT_ARGS”] = “–packages graphframes:graphframes:0.8.2-spark3.1-s_2.12 pyspark-shell” ● In the terminal, you need to assign the parameter “packages” of the spark-submit: –packages graphframes:graphframes:0.8.2-spark3.1-s_2.12 For Scala (in local machine): ● In Intellij IDEA, you need to add library dependencies to your project “graphframes” % “graphframes” % “0.8.2-spark3.1-s_2.12” “org.apache.spark” %% “spark-graphx” % sparkVersion ● In the terminal, you need to assign the parameter “packages” of the spark-submit: –packages graphframes:graphframes:0.8.2-spark3.1-s_2.12 For the parameter “maxIter” of the LPA method, you should set it to 5. 4.2.2 Output Result In this task, you need to save your result of communities in a txt file. Each line represents one community and the format is: ‘user_id1’, ‘user_id2’, ‘user_id3’, ‘user_id4’, … Your result should be firstly sorted by the size of communities in the ascending order and then the first user_id in the community in lexicographical order (the user_id is type of string). The user_ids in each community should also be in the lexicographical order. If there is only one node in the community, we still regard it as a valid community.Figure 1: community output file format 4.3 Task2: Community Detection Based on Girvan-Newman algorithm (5 pts) In task2, you will implement your own Girvan-Newman algorithm to detect the communities in the network graph. You can refer to the Chapter 10 from the Mining of Massive Datasets book for the algorithm details. Because your task1 and task2 code will be executed separately, you need to construct the graph again in this task following the rules in section 4.1. For task2, you can ONLY use Spark RDD and standard Python or Scala libraries. Remember to delete your code that imports graphframes. Usage of Spark DataFrame is NOT allowed in this task. 
4.3.1 Betweenness Calculation (2 pts)
In this part, you will calculate the betweenness of each edge in the original graph you constructed in 4.1. Then you need to save your result in a txt file. The format of each line is:
(‘user_id1’, ‘user_id2’), betweenness value
Your result should first be sorted by the betweenness values in descending order and then by the first user_id in the tuple in lexicographical order (user_id is of type string). The two user_ids in each tuple should also be in lexicographical order. For output, you should use the Python built-in round() function to round the betweenness value to five digits after the decimal point. (Rounding is for output only; please do not use the rounded numbers for further calculation.)
IMPORTANT: Please strictly follow the output format since your code will be graded automatically. We will not regrade because of formatting issues.
Figure 2: betweenness output file format

4.3.2 Community Detection (3 pts)
You are required to divide the graph into suitable communities, which achieve the highest global modularity. The formula of modularity is:
Q = (1 / (2m)) * Σ_{i,j} [A_ij − (k_i * k_j) / (2m)] * δ(c_i, c_j),
where the sum runs over all node pairs, k_i is the degree of node i, and δ(c_i, c_j) is 1 when nodes i and j belong to the same community and 0 otherwise. According to the Girvan-Newman algorithm, after removing one edge, you should re-compute the betweenness. The “m” in the formula represents the edge number of the original graph. The “A” in the formula is the adjacency matrix of the original graph. (Hint: in each removal step, “m”, “A”, “k_i” and “k_j” should not be changed.) In the step of removing the edges with the highest betweenness, if two or more edges have the same (highest) betweenness, you should remove all of those edges. If a community has only one user node, we still regard it as a valid community. You need to save your result in a txt file. The format is the same as the output file from task1.
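The two computations in 4.3.1 and 4.3.2 can be sketched on a single machine as follows. This is a minimal sketch assuming an adjacency-dict graph representation (the function names are illustrations); a real submission must distribute the BFS-per-source work with Spark RDDs. Betweenness uses the shortest-path credit rule from MMDS Chapter 10 (each source's credits are summed, then halved because every path is counted from both endpoints), and modularity freezes m, A, and the degrees from the original graph, per the hint:

```python
from collections import defaultdict, deque

def edge_betweenness(adj):
    """adj: dict node -> set of neighbours (undirected, unweighted)."""
    bet = defaultdict(float)
    for s in adj:
        # BFS from s, tracking shortest-path counts (sigma) and predecessors
        dist = {s: 0}
        sigma = {s: 1.0}
        preds = defaultdict(list)
        order = []
        q = deque([s])
        while q:
            v = q.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    sigma[w] = 0.0
                    q.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # accumulate edge credit bottom-up (leaves get credit 1)
        delta = defaultdict(float)
        for w in reversed(order):
            for v in preds[w]:
                credit = sigma[v] / sigma[w] * (1.0 + delta[w])
                bet[tuple(sorted((v, w)))] += credit
                delta[v] += credit
    # every shortest path was counted from both endpoints, so halve
    return {e: b / 2.0 for e, b in bet.items()}

def modularity(communities, adj, m, degree):
    """m, adj and degree are frozen from the ORIGINAL graph, per the hint."""
    q = 0.0
    for comm in communities:
        for i in comm:
            for j in comm:
                a_ij = 1.0 if j in adj[i] else 0.0
                q += a_ij - degree[i] * degree[j] / (2.0 * m)
    return q / (2.0 * m)
```

As a sanity check: putting the whole graph in one community always gives Q = 0, since Σ A_ij = 2m and Σ k_i k_j / (2m) = 2m.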
4.4 Execution Format
Execution example:
Python:
spark-submit --packages graphframes:graphframes:0.8.2-spark3.1-s_2.12 task1.py
spark-submit task2.py
Scala:
spark-submit --packages graphframes:graphframes:0.8.2-spark3.1-s_2.12 --class task1 hw4.jar
spark-submit --class task2 hw4.jar
Input parameters:
1. the filter threshold to generate edges between user nodes.
2. the path to the input file, including path, file name and extension.
3. the path to the betweenness output file, including path, file name and extension.
4. the path to the community output file, including path, file name and extension.
Execution time: The overall runtime limit of your task1 (from reading the input file to finishing writing the community output file) is 400 seconds. The overall runtime limit of your task2 (from reading the input file to finishing writing the community output file) is 400 seconds. If your runtime exceeds the above limit, there will be no points for this task.

5. About Vocareum
a. The dataset is under the directory $ASNLIB/publicdata/, and the jar package is under $ASNLIB/public/.
b. You should upload the required files under your workspace, work/, and click submit.
c. You should test your scripts on both the local machine and the Vocareum terminal before submission.
d. During the submission period, Vocareum will automatically test task1 and task2.
e. During the grading period, Vocareum will use another dataset that has the same format for testing.
f. We do not test the Scala implementation during the submission period.
g. Vocareum will automatically run both Python and Scala implementations during the grading period.
h. Please start your assignment early! You can resubmit any script on Vocareum. We will only grade your last submission.

6. Grading Criteria (% penalty = % penalty of possible points you get)
a. You can use your free 5-day extension separately or together (https://docs.google.com/forms/d/e/1FAIpQLSf6hpYzacaV2d1CJMZfrlE-xl9N6bLkJbhi7aFlAQcObGj0Xw/viewform)
b.
There will be a 10% bonus for each task if your Scala implementation is correct. The Scala bonus is calculated only when your Python results are correct. There are no partial points for Scala.
c. There will be no points if your submission cannot be executed on Vocareum.

7. Common problems causing failed submissions on Vocareum / FAQ
(If your program seems to run successfully on your local machine but fails on Vocareum, please check these.)
1. Try your program on the Vocareum terminal. Remember to set the Python version to python3.6, and use the latest Spark: /opt/spark/spark-3.1.2-bin-hadoop3.2/bin/spark-submit
2. Check the input command line format.
3. Check the output format, for example, the header, tag, typos.
4. Check the requirements for sorting the results.
5. Your program scripts should be named task1.py and task2.py.
6. Check whether your local environment fits the assignment description, e.g., versions and configuration.


[SOLVED] Dsci553 foundations and applications of data mining

Assignment 2
1. Overview of the Assignment
In this assignment, you will implement the SON Algorithm using the Spark Framework. You will develop a program to find frequent itemsets in two datasets: one simulated dataset and one real-world generated dataset. The goal of this assignment is to apply the algorithms you have learned in class on large datasets more efficiently in a distributed environment.

2. Requirements
2.1 Programming Requirements
a. You must use Python to implement all tasks. You can only use standard Python libraries (i.e., external libraries like numpy or pandas are not allowed). There will be a 10% bonus for each task if you also submit a Scala implementation and both your Python and Scala implementations are correct.
b. You are required to only use Spark RDD in order to understand Spark operations. You will not get any points if you use Spark DataFrame or DataSet.

2.2 Programming Environment
Python 3.6, JDK 1.8, Scala 2.12, and Spark 3.1.2
We will use these library versions to compile and test your code. There will be no points if we cannot run your code on Vocareum. On Vocareum, you can call `spark-submit` located at `/opt/spark/spark-3.1.2-bin-hadoop3.2/bin/spark-submit`. (Do not use the one at /usr/local/bin/spark-submit (2.3.0).) We use `--executor-memory 4G --driver-memory 4G` on Vocareum for grading.

2.3 Write your own code
Do not share code with other students!! For this assignment to be an effective learning experience, you must write your own code! We emphasize this point because you will be able to find Python implementations of some of the required functions on the web. Please do not look for or at any such code!

2.4 What you need to turn in
We will grade all submissions on Vocareum, and submissions on Blackboard will be ignored. Vocareum produces a submission report after you click the “Submit” button (it takes a while since Vocareum needs to run your code in order to generate the report).
Vocareum will only grade Python scripts during the submission phase; it will grade both Python and Scala during the grading phase.
a. Two Python scripts, named (all lowercase): task1.py, task2.py
b. [OPTIONAL] hw2.jar and two Scala scripts, named (all lowercase): hw2.jar, task1.scala, task2.scala
c. You don’t need to include your results or the datasets. We will grade your code with our testing data (data will be in the same format).

3. Datasets
Figure 1 shows the file structure of the task1 simulated CSV: the first column is user_id and the second column is business_id.
Figure 1: Input Data Format

4. Tasks
In this assignment, you will implement the SON Algorithm to solve all tasks (Task 1 and 2) on top of the Spark Framework. You need to find all the possible combinations of the frequent itemsets in any given input file within the required time. You can refer to Chapter 6 of the Mining of Massive Datasets book and concentrate on section 6.4 – Limited-Pass Algorithms. (Hint: you can choose either the A-Priori, MultiHash, or PCY algorithm to process each chunk of the data.)

4.1 Task 1: Simulated data (3 pts)
There are two CSV files (small1.csv and small2.csv) on Vocareum under ‘/resource/asnlib/publicdata’. The small1.csv is just a test file that you can use to debug your code. For task1, we will only test your code on small2.csv.
In this task, you need to build two kinds of market-basket models.
Case 1 (1.5 pts): You will calculate the combinations of frequent businesses (as singletons, pairs, triples, etc.) that are qualified as frequent given a support threshold. You need to create a basket for each user containing the business ids reviewed by this user. If a business was reviewed more than once by a reviewer, we count it only once. More specifically, the business ids within each basket are unique.
The generated baskets are similar to:
user1: [business11, business12, business13, …]
user2: [business21, business22, business23, …]
user3: [business31, business32, business33, …]
Case 2 (1.5 pts): You will calculate the combinations of frequent users (as singletons, pairs, triples, etc.) that are qualified as frequent given a support threshold. You need to create a basket for each business containing the user ids that commented on this business. Similar to case 1, the user ids within each basket are unique. The generated baskets are similar to:
business1: [user11, user12, user13, …]
business2: [user21, user22, user23, …]
business3: [user31, user32, user33, …]
Input format:
1. Case number: Integer that specifies the case. 1 for Case 1 and 2 for Case 2.
2. Support: Integer that defines the minimum count to qualify as a frequent itemset.
3. Input file path: This is the path to the input file including path, file name and extension.
4. Output file path: This is the path to the output file including path, file name and extension.
Output format:
1. Runtime: the total execution time from loading the file till finishing writing the output file. You need to print the runtime in the console with the “Duration” tag, e.g., “Duration: 100”.
2. Output file:
(1) Intermediate result: You should use “Candidates:” as the tag. For each line you should output the candidates of frequent itemsets you found after the first pass of the SON Algorithm, followed by an empty line after each combination. The printed itemsets must be sorted in lexicographical order (both user_id and business_id are of type string).
(2) Final result: You should use “Frequent Itemsets:” as the tag. For each line you should output the final frequent itemsets you found after finishing the SON Algorithm. The format is the same as the intermediate results. The printed itemsets must be sorted in lexicographical order.
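As the hint in section 4 suggests, each chunk can be processed with A-Priori. The following is a minimal in-memory sketch (the function name and list-of-lists basket shape are assumptions, not the assignment's API); in SON this logic would run inside a per-partition pass with the support threshold scaled down to the chunk size:

```python
from itertools import combinations
from collections import Counter

def apriori(baskets, support):
    """Return all frequent itemsets (as sorted tuples) with count >= support."""
    baskets = [sorted(set(b)) for b in baskets]  # items unique within a basket
    counts = Counter(item for b in baskets for item in b)
    freq = {(i,) for i, c in counts.items() if c >= support}  # frequent singletons
    all_freq = set(freq)
    k = 2
    while freq:
        prev_items = {i for t in freq for i in t}
        counts = Counter()
        for b in baskets:
            cand = [i for i in b if i in prev_items]
            for combo in combinations(cand, k):
                # monotonicity: count only if every (k-1)-subset was frequent
                if all(s in freq for s in combinations(combo, k - 1)):
                    counts[combo] += 1
        freq = {t for t, c in counts.items() if c >= support}
        all_freq |= freq
        k += 1
    return all_freq
```

For example, with baskets [[a,b,c],[a,b],[a,c],[b,c],[a,b,c]] and support 3, all singletons and all pairs are frequent, but the triple (a,b,c) appears in only 2 baskets and is pruned.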
Here is an example of the output file:
Both the intermediate results and final results should be saved in ONE output result file.
Execution example:
Python: spark-submit task1.py
Scala: spark-submit --class task1 hw2.jar
Note: Be careful when reading the csv file, as Spark can read the product id numbers with leading zeros. You can manually format Column F (PRODUCT_ID) as numbers (with zero decimal places) in the csv file before reading it using Spark.
(1) Data preprocessing
You need to save the dataset in CSV format. The figure below shows an example of the output file.
Figure: customer_product file
Do NOT submit the output file of this data preprocessing step, but your code is allowed to create this file.
(2) Apply SON Algorithm
The requirements for task 2 are similar to task 1. However, you will test your implementation with the large dataset you just generated. For this purpose, you need to report the total execution time. For this execution time, we take into account the time from reading the file till writing the results to the output file. You are asked to find the candidate and frequent itemsets (similar to the previous task) using the file you just generated. The following are the steps you need to do:
1. Read the customer_product CSV file into an RDD and then build the case 1 market-basket model;
2. Filter the market-basket model using the filter threshold;
3. Apply the SON Algorithm code to the filtered market-basket model.
Input format:
1. Filter threshold: Integer that is used to filter out qualified users.
2. Support: Integer that defines the minimum count to qualify as a frequent itemset.
3. Input file path: This is the path to the input file including path, file name and extension.
4. Output file path: This is the path to the output file including path, file name and extension.
Output format:
1. Runtime: the total execution time from loading the file till finishing writing the output file. You need to print the runtime in the console with the “Duration” tag, e.g., “Duration: 100”.
2.
Output file: The output file format is the same as task 1. Both the intermediate results and final results should be saved in ONE output result file.
Execution example:
Python: spark-submit task2.py
Scala: spark-submit --class task2 hw2.jar

6. Evaluation Metric
Task 1:
Input File | Case | Support | Runtime (sec)
small2.csv | 1    | 4       |


[SOLVED] Dsci 553 foundations and applications of data mining

Assignment 1
1. Overview of the Assignment
In assignment 1, you will work on three tasks. The goal of these tasks is to get you familiar with Spark operation types (e.g., transformations and actions) and explore a real-world dataset: the Yelp dataset (https://www.yelp.com/dataset). If you have questions about the assignment, please ask on Piazza, which will also help other students. You only need to submit on Vocareum; there is NO NEED to submit on Blackboard.

2. Requirements
2.1 Programming Requirements
a. You must use Python to implement all tasks. You can only use standard Python libraries (i.e., external libraries like numpy or pandas are not allowed). There will be a 10% bonus for each task if you also submit a Scala implementation and both your Python and Scala implementations are correct.
b. You are required to only use Spark RDD in order to understand Spark operations. You will not get any points if you use Spark DataFrame or DataSet.

2.2 Programming Environment
Python 3.6, JDK 1.8, Scala 2.12, and Spark 3.1.2
We will use these library versions to compile and test your code. There will be no points if we cannot run your code on Vocareum. On Vocareum, you can call `spark-submit` located at `/opt/spark/spark-3.1.2-bin-hadoop3.2/bin/spark-submit`. (Do not use the one at /usr/local/bin/spark-submit (2.3.0).) We use `--executor-memory 4G --driver-memory 4G` on Vocareum for grading.

2.3 Write your own code
Do not share code with other students!! For this assignment to be an effective learning experience, you must write your own code! We emphasize this point because you will be able to find Python implementations of some of the required functions on the web. Please do not look for or at any such code!

2.4 What you need to turn in
We will grade all submissions on Vocareum, and submissions on Blackboard will be ignored. Vocareum produces a submission report after you click the “Submit” button (it takes a while since Vocareum needs to run your code in order to generate the report).
Vocareum will only grade Python scripts during the submission phase; it will grade both Python and Scala during the grading phase.
a. [REQUIRED] Three Python scripts, named (all lowercase): task1.py, task2.py, task3.py
b. [OPTIONAL, REQUIRED FOR SCALA] Three Scala scripts and the output jar file, named (all lowercase): hw1.jar, task1.scala, task2.scala, task3.scala
c. You don’t need to include your results or the datasets. We will grade your code with our testing data (data will be in the same format).

3. Yelp Data
In this assignment, you will explore the Yelp dataset. You can find the data on Vocareum under resource/asnlib/publicdata/. The two files business.json and test_review.json are the files you will work on for this assignment, and they are subsets of the original Yelp Dataset. The submission report you get from Vocareum is for the subsets. For grading, we will use the files from the original Yelp dataset, which is SIGNIFICANTLY larger (e.g., review.json can be 5GB). You should make sure your code works well on large datasets as well.

4. Tasks
4.1 Task1: Data Exploration (3 points)
You will work on test_review.json, which contains the review information from users, and write a program to automatically answer the following questions:
A. The total number of reviews (0.5 point)
C. The number of distinct users who wrote reviews (0.5 point)
D. The top 10 users who wrote the largest numbers of reviews and the number of reviews they wrote (0.5 point)
E. The number of distinct businesses that have been reviewed (0.5 point)
F. The top 10 businesses that had the largest numbers of reviews and the number of reviews they had (0.5 point)
Input format: (we will use the following command to execute your code)
Python: spark-submit --executor-memory 4G --driver-memory 4G task1.py
Scala: spark-submit --class task1 --executor-memory 4G --driver-memory 4G hw1.jar
Output format:
IMPORTANT: Please strictly follow the output format since your code will be graded automatically.
a.
The output for Questions A/B/C/E will be a number. The output for Questions D/F will be a list, which is sorted by the number of reviews in descending order. If two user_ids/business_ids have the same number of reviews, please sort the user_ids/business_ids in alphabetical order.
b. You need to write the results in a JSON format file. You must use exactly the same tags (see the red boxes in Figure 2) for answering each question.
Figure 1: JSON output structure for task1

4.2 Task2: Partition (2 points)
Since processing large volumes of data requires performance decisions, properly partitioning the data for processing is imperative. In this task, you will show the number of partitions for the RDD used for Task 1 Question F and the number of items per partition. Then you need to use a customized partition function to improve the performance of the map and reduce tasks. A time duration (for executing Task 1 Question F) comparison between the default partition and the customized partition (RDD built using the partition function) should also be shown in your results.
Hint: Certain operations within Spark trigger an event known as the shuffle. The shuffle is Spark’s mechanism for redistributing data so that it’s grouped differently across partitions. This typically involves copying data across executors and machines, making the shuffle a complex and costly operation. So, designing a partition function that avoids the shuffle will improve performance a lot.
Input format: (we will use the following command to execute your code)
Python: spark-submit --executor-memory 4G --driver-memory 4G task2.py
Scala: spark-submit --class task2 --executor-memory 4G --driver-memory 4G hw1.jar
Output format:
A. The output for the number of partitions and execution time will be a number. The output for the number of items per partition will be a list of numbers.
B. You need to write the results in a JSON file.
You must use exactly the same tags.
Figure 3: JSON output structure for task2

4.3 Task3: Exploration on Multiple Datasets (2 points)
In task3, you are asked to explore two datasets together, containing review information (test_review.json) and business information (business.json), and write a program to answer the following questions:
A. What are the average stars for each city? (1 point)
1. (DO NOT use the stars information in the business file).
2. (DO NOT discard records with an empty “city” field prior to aggregation).
B. Compare the execution time of the following two methods for printing the top 10 cities with the highest average stars:
Method1: Collect all the data, sort in Python, and then print the first 10 cities
Method2: Sort in Spark, take the first 10 cities, and then print these 10 cities
1. You should store the execution times (starting from loading the file) in the json file with the tags “m1” and “m2”.
2. Additionally, add a “reason” field and provide a hard-coded explanation for the observed execution times.
Input format: (we will use the following command to execute your code)
Python: spark-submit --executor-memory 4G --driver-memory 4G task3.py
Scala: spark-submit --class task3 --executor-memory 4G --driver-memory 4G hw1.jar
Output format:
a. You need to write the results for Question A as a file. The header (first line) of the file is “city,stars”. The outputs should be sorted by the average stars in descending order. If two cities have the same stars, please sort the cities in alphabetical order (see Figure 3, left).
b. You also need to write the answer for Question B in a JSON file. You must use exactly the same tags for the task.
Figure 3: Question A output file structure (left) and JSON output structure (right) for task3

5. Grading Criteria (% penalty = % penalty of possible points you get)
1. You can use your free 5-day extension separately or together (https://forms.gle/h4t46LCahrtDk9rVA). This form will record the number of late days you use for each assignment. We will not count late days if no request is submitted.
2.
There will be a 10% bonus if you use both Scala and Python and get the expected results.
3. All submissions will be graded on Vocareum. Please strictly follow the format provided; otherwise you can’t get the points even if the answer is correct. You are encouraged to try out your code on the Vocareum terminal.
4. We will grade both the correctness and efficiency of your implementation. Efficiency is evaluated by processing time and memory usage. The maximum memory allowed is 4G, and the maximum processing time is 1800s for grading. The datasets used for grading are larger than the ones that you use for doing the assignment. You will get *% penalty if your implementation cannot generate correct outputs for large files using 4G memory within 1800s. Therefore, please make sure your implementation is efficient enough to process large files.
5. Regrading policy: We can regrade your assignments within seven days once the scores are released. Regrading requests will not be accepted after one week.
7. Only when your results from Python are correct will the bonus for using Scala be calculated. There are no partial points for Scala. See the example below:

Example situations:
Task  | Score for Python                 | Score for Scala (10% of the previous column if correct) | Total
Task1 | Correct: 3 points                | Correct: 3 * 10%                                        | 3.3
Task1 | Wrong: 0 points                  | Correct: 0 * 10%                                        | 0.0
Task1 | Partially correct: 1.5 points    | Correct: 1.5 * 10%                                      | 1.65
Task1 | Partially correct: 1.5 points    | Wrong: 0                                                | 1.5

6. Common problems causing failed submissions on Vocareum / FAQ
(If your program runs successfully on your local machine but fails on Vocareum, please check these.)
1. Try your program on the Vocareum terminal. Remember to set the Python version to python3.6, and use the latest Spark.
2. Check the input command line formats.
3. Check the output formats, for example, the headers, tags, typos.
4. Check the requirements for sorting the results.
5. Your program scripts should be named task1.py, task2.py, etc.
6.
Check whether your local environment fits the assignment description, i.e., versions and configuration.
8. You are required to only use Spark RDD in order to understand Spark operations more deeply. You will not get any points if you use Spark DataFrame or DataSet. Don’t import sparksql.
9. Do not use Vocareum for debugging purposes; please debug on your local machine. Vocareum can be very slow if you use it for debugging.
10. Vocareum is reliable for helping you check the input and output formats, but its ability to check code correctness is limited. It cannot guarantee the correctness of the code even with a full score in the submission report.

7. Running Spark on Vocareum
We’re going to use Spark 3.1.2 and Scala 2.12 for the assignments and the competition project. Here are the things that you need to do on Vocareum and your local machine to run the latest Spark and Scala:
On Vocareum:
1. Please select JDK 8 by running the command “export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-amd64”
2. Please use the spark-submit command as “/opt/spark/spark-3.1.2-bin-hadoop3.2/bin/spark-submit”
On your local machine:
1. Please download and set up spark-3.1.2-bin-hadoop3.2; the setup steps should be the same as for spark-2.4.4
2. If you use Scala, please update Scala’s version to 2.12 in IntelliJ.

8. Tutorials for Spark Installation
Here are some useful links to help you get started with the Spark installation.
Tutorial for Ubuntu: https://phoenixnap.com/kb/install-spark-on-ubuntu
Tutorial for Windows: https://medium.com/@GalarnykMichael/install-spark-on-windows-pyspark-4498a5d8d66c
Windows installation without Anaconda (recommended): https://phoenixnap.com/kb/install-spark-on-windows-10
Tutorial for Mac: https://medium.com/beeranddiapers/installing-apache-spark-on-mac-os-ce416007d79f
Tutorial for Linux systems: https://www.tutorialspoint.com/apache_spark/apache_spark_installation.htm
Tutorial for using IntelliJ: https://medium.com/@Sushil_Kumar/setting-up-spark-with-scala-development-environment-using-intellij-idea-b22644f73ef1
Tutorial for Jupyter notebook on Windows: https://bigdata-madesimple.com/guide-to-install-spark-and-use-pyspark-from-jupyter-in-windows/


[SOLVED] Data-science – nba-archetypes

Task idea
Point Guard (PG): Known for having the ball a lot, being the shortest players, getting most of the assists, scoring decent points, mostly scoring by shooting.
Shooting Guard (SG): Known for being mid-sized players scoring a lot of points, shooting the most 3-point shots, getting a decent balance of assists and rebounds.
Small Forward (SF): Known for being mid-sized players with a balance of points, assists, and rebounds, and generally excelling on defense. Mostly shooting from mid range.
Power Forward (PF): Known for being larger players scoring some points with a lot of rebounds, and excelling on defense. Mostly shooting from close range.
Center (C): Known for being the largest players on the court and for being the best defenders, getting lots of blocks and rebounds. Hardly ever taking 3-point shots, or really scoring anywhere but from very close range.
Resources:


[SOLVED] Data-science – time series project

Project Description
The Sweet Lift Taxi company has collected historical data on taxi orders at airports. To attract more drivers during peak hours, we need to predict the number of taxi orders for the next hour. Build a model for such a prediction. The RMSE metric on the test set should not be more than 48.
Project instructions
1) Download the data and resample it by one hour.
2) Analyze the data.
3) Train different models with different hyperparameters. The test sample should be 10% of the initial dataset.
4) Test the data using the test sample and provide a conclusion.
Data description
The dataset is stored in the /datasets/taxi.csv file. The number of orders is in the ‘num_orders’ column.
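The resampling, chronological train/test split, and RMSE check above can be sketched with the standard library only. This is a minimal sketch: the function names and the (timestamp, count) record shape are assumptions, and in practice the resampling step would typically be done with pandas instead:

```python
from datetime import datetime

def resample_hourly(records):
    """records: iterable of (ISO timestamp string, num_orders).
    Sums orders into hourly buckets and returns them in time order."""
    buckets = {}
    for ts, n in records:
        hour = datetime.fromisoformat(ts).replace(minute=0, second=0, microsecond=0)
        buckets[hour] = buckets.get(hour, 0) + n
    return sorted(buckets.items())

def chronological_split(series, test_frac=0.1):
    """Time series must not be shuffled: the last test_frac is the test set."""
    cut = int(len(series) * (1 - test_frac))
    return series[:cut], series[cut:]

def rmse(y_true, y_pred):
    return (sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)) ** 0.5
```

The key design point for step 3 is that the 10% test sample comes from the end of the series, never from a random shuffle, so the model is always evaluated on hours that lie after its training data.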


[SOLVED] Data-science – supervised learning project

Project Description
Beta Bank customers are leaving: little by little, chipping away every month. The bankers figured out it’s cheaper to retain existing customers than to attract new ones. We need to predict whether a customer will leave the bank soon. You have data on clients’ past behavior and termination of contracts with the bank. Build a model with the maximum possible F1 score. To pass the project, you need an F1 score of at least 0.59. Check the F1 on the test set. Additionally, measure the AUC-ROC metric and compare it with the F1.
Project Instructions
1) Download and prepare the data. Explain the procedure.
2) Examine the balance of classes. Train the model without taking the imbalance into account. Briefly describe your findings.
3) Improve the quality of the model. Make sure you use at least two approaches to fixing class imbalance. Use the training set to pick the best parameters. Train different models on the training and validation sets. Find the best one. Briefly describe your findings.
4) Perform the final testing.
Data Description
The data can be found in the /datasets/Churn.csv file.
Features
RowNumber — data string index
CustomerId — unique customer identifier
Surname — surname
CreditScore — credit score
Geography — country of residence
Gender — gender
Age — age
Tenure — period of maturation for a customer’s fixed deposit (years)
Balance — account balance
NumOfProducts — number of banking products used by the customer
HasCrCard — customer has a credit card
IsActiveMember — customer’s activeness
EstimatedSalary — estimated salary
Target
Exited — customer has left
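The target metric can be sketched with the standard library. The function name here is an illustration (in practice one would use sklearn.metrics.f1_score and roc_auc_score); it shows why F1 suits this imbalanced problem: it balances precision and recall on the minority "Exited" class, so a model that predicts "stays" for everyone scores 0 even though its accuracy looks high:

```python
def f1_score(y_true, y_pred):
    """F1 for binary labels, with the positive class = 1 (customer left)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

For instance, with y_true = [1, 1, 0, 0] and y_pred = [1, 0, 1, 0], precision and recall are both 0.5, so F1 = 0.5 — below the 0.59 passing bar.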
