Introduction The aim of this assignment is to make sure that you understand and are familiar with the concepts covered in the lectures, including service development, service registration, service deployment, service hosting, service proxy, service binding, service invocation, and application building using your own services and public services. By the end of the assignment, you should have applied these concepts in programming your services, deploying your services, and have used your own services and public services to compose your SOC applications. This project is partly an individual project (75%) and is partly a group project (25%). Each project team will consist of two or three members, based on the team building exercise. Although a team works together to complete the project in a collaborative and coordinated manner, a large part of the project will be done individually. A declaration must be given, which specify the portion of individual efforts in the group part of the project. A percentage of contribution of each member (e.g., 35%, 35%, and 30%) must be given for the joint part of work, which will be used to scale the assignment grades of the group part. Each service must have a single author associated with it. The final project must be deployed into the given Web server. When submitting the services and service test pages into the blackboard, you must submit the folder (folders) with the source code (We will read the source code from Blackboard and test the code from the server). For the server deployment, you can submit folders with the source code or with the precompiled files in the folder. Section I Practice Exercises (No submission required) No submission is required for this part of exercises. However, doing these exercises can help you better understand the concepts and thus help you in quizzes, exams, as well as the assignment questions. 1. Reading: Textbook Chapter 3 and Appendix C. 2. Answer the multiple choice questions 1.1 through 1.16 of the text section 3.10. Study the material covered in these questions can help you prepare for the class exercises, quizzes, and the exams. Answer keys to the questions can be found in the course web page. To better learn these concepts, you should do the exercises based on your understanding, and then check the answer keys. 3. Study for the questions 2 through 16 in text section 3.10. Make sure that you understand these questions and can briefly answer these questions. Study the material covered in these questions can help you prepare for tests and understand the homework assignment. 4. Questions 17 through 19 are largely covered by the assignment questions in Part II. Section II Assignment/Project Questions (Submission required) The purpose of this project is to exercise service development, service deployment, service discovery, remote binding, and application composition using your own services and external public services. Some of the services to be developed can be synthetic, e.g., banking service, while others can be realistic, e.g., en/decoding, en/decryption, and product catalog. However, you should make your services and overall application as realistic as possible. The project will be completed in following parts. Part 1 Requirement Document (group assignment question) [10 points] Part 1 Due by 11:59pm March 28th , 2015. One team member submits the requirement document in Word or PDF into Blackboard. 1 The requirement document will contain the following contents. 1.1 Description of the service-oriented computing system that your team plans to develop. [2] 1.2 A diagram showing the overall system design, its layers, components, and the connections among the services. A sample diagram is given in Figure 1. Your team must come up with your own system. You will not implement the application logic layer of system outlined here in this assignment. It will be implemented in assignment 5. The focus of this assignment is to develop and deploy the services outlined in this requirement document. Thus you must pay more attention to the services in this assignment. [2]Figure 1. A sample of a four-tier service-oriented computing system 1.3 Create a service directory (a table) listing the services that your team and each member plan to develop. Each service must be assigned to one and only one team member, as the service provider. Each service must be described in such detail that shows the feasibility of implementing the service in a reasonable amount of time, e.g., 10 hours. All the services must be related to the application that you outlined in the question above and support the composition of the application. The contents (test in italic) of the table below are the examples and you need fill the table based on your work. You can leave the TryIt column blank for now and fill it out in the final submission. Note that given descriptions are examples only. [6] This page is deployed at: To fill out the address in later submission. This project is developed by: Put your team name here. Provider name Service name, with input and output types TryIt link Service description Planned resources need to implement the service Member 1’s name Encryption and decryption: Input: String Output: String TryIt Cipher encryption and decryption Use library class and local component to implement the service Member 1’s name SolarPower Inputs: zip code and size Output: integer TryIt Output annual KW number for a given panel size at a given zip code location Retrieve information from national database at: https://graphical.weather.gov/xml/Member 1’s name findStore Input: zipcode Output: list of string TryIt Use an existing online service or API to find the locations of a given store name Use the service from Yelp site at: https://www.yelp.com/ Member 2’s name … … … Member 3’s name textToPhone inputs: string Account, string Password, string Receiver, string Subject, string Body Output: boolean TryIt Send a text to a cell phone and return if the message is sent successfully Use google gmail account and carrier services: [email protected] [email protected] [email protected] [email protected] … … … …Part 2 Required Service Development (individual assignment questions) [40 points] Part 2 Due by March 28th, 2015. Individual submissions: Each team member must submit one zip file into Blackboard. The zip file must contain all the required services and their TryIt pages developed in this part of the assignment. 2 This is an individual task within the group assignment. Each team member must independently implement and deploy their services. There are two types of services to develop: required services and elective services. The required services will be developed Part 2 and the elective services will be developed in Part 3.2.1 Required Services [30 = 15+15] A set of required services and their requirements are listed in a separate document named “List of Required Services”. Each team member must choose and implement two services from the given list. The members in the same team must choose different services from the given list. For the elective services in the next section, you will be asked to develop at least one RESTful service. 2.2 TryIt Pages of the Required Services [10] For each service and operation that you developed, you must develop a TryIt test page to allow the human user to test the service. The TryIt test page must contain the following contents: (A) A sentence to describe the functions of each service (operation); (B) The URL of the service 1. For Part 1 submission: use the Localhost URL of the WSDL file. 2. For the final submission in Part 3, the service URL of each service (or its WSDL address) must used in the TryIt page must be the URL in the server. No localhost address can be used. For the required services, you will develop WSDL/SOAP services. (C) Method names, with parameter type list and the return type for each endpoint. (D) Text boxes for entering inputs (E) Invoke buttons to call the services (F) a place (e.g., label) to display the service response (output) The two services and the TryIt page will be tested on the localhost in part 1 submission. To make sure the TA can test your services and TryIt pages on their computer, you must specify the localhost port number statically in the following steps: (1) In Solution Explorer, click the name of the application. (2) In the Properties panel, click the down-arrow beside Use dynamic ports and select False from the dropdown list. This will enable editing of the Port number property. (3) In the Properties panel, click the text box beside Port number and type in a port number. (4) Click outside of the Properties panel. This saves the property settings. Each time you run a file-system Web site within Visual Studio, the ASP.NET Development Server will listen on the specified port. You can combine the test pages of multiple services and operations into one test page. You must make sure that you have a GUI to test every service operation that you developed for credit. An example of 3.E, 3.F, and 3.G is given in the figure and in the link below: https://venus.eas.asu.edu/wsrepository/Services/FileServiceTryIt/More examples of TryIt test page are at: https://venus.eas.asu.edu/WSRepository/AjaxIn/Default.aspx https://venus.eas.asu.edu/WSRepository/Services/ImageVerifier/Tryit.aspx https://venus.eas.asu.edu/WSRepository/Services/RandomString/Tryit.aspx Part 3 Elective Services Development (individual assignment questions) [35 points] Part 3 Due by 11:59pm April 4th , 2015 in Blackboard and in the group account server. Individual submissions: Each team member must submit one zip file into Blackboard. The zip file must contain all the elective services and their TryIt pages developed in this part of the assignment. All the services (required and electives) must be deployed into the WebStrar server. 3 In this part, you will develop the elective services and their TryIt pages. 3.1 Elective Services: Each member must develop at least two elective services (or service operations). For the elective services, the team members must discuss what services are needed based on the scope defined in question 1 and who should develop what services. All team members must define and implement different elective services. The elective services should be related to the scope defined in question 1. One can choose the required services as elective services as long as no other team members are implementing them. If a member chooses to implement the services in the list of required services, this service does not have to be related to the application scope that your team plans to develop. At least one service must be converted to the RESTful service. For the rest of the services, you can choose to develop SOAP services or RESTful services. [25 points for 3.1] The difficulty level of the elective services (operations) that you developed will be rated by the instructor and the teaching assistant into one of the three difficulty levels: (1) Easy: The method (operation) in this service implements a simple math function and can be done using less than 50 lines of code, for example, Fahrenheit and Celsius temperature conversion. [5 points each] (2) Moderate: There are algorithmic issues to address and the code for each method will be at least 50 lines, for example, encryption/decryption service, efficient sorting, and equation system solving. If a service operation can be done in less than 50 lines, but you use more than 50 lines, it will be counted as an easy service. [10 points each] (3) Challenging: Services that will use states, such as creating a simulated (synthetic) banking service that allows users to sign up, create an account, deposit fund, spend fund, etc.; or services that make use of multiple available services (operations) or APIs provided by other providers, such as Microsoft services (e.g., Bing map service), Google code’s APIs, Amazon’s services, or the ASU services. These services and APIs may or may not have WSDL interface. Your services must provide WSDL interfaces. The data received from other services should be processed and combined before returning to the clients. The given required services are examples of challenging services: If you choose to implement these services, they count as challenging services. Database is not allowed in this assignment. [15 each] To obtain the full points in question 3.1, you need to develop at least two services and at most four services. It implies that you cannot write five easy services in this question. If you write easy services only, the maximum points you can obtain in this question is 20. On the other hand, you can obtain 25 points at most, even if you develop more services and more difficult services than required. From the format point of view, you can either define these services methods in one big service, or define them as separate services. Each service and methods must be commented in detail, including the functionality, parameters, types, and the return value type. 3.2 Development of TryIt Pages for Elective Services [5 points for 3.2] You must develop TryIt test page for elective services. The requirement is the same as defined in question 2.2, but all the URLs must be based on the server URLs because the services have been deployed to the server. You must also have the URL of the Main page (See next question) linked to the test pages, so that one can return to the main page from the TryIt page. 3.3 Deployment of Required Services, Elective Services, and TryIt pages [5 points for 3.3] All the services (required and elective) and their TryIt pages must be deployed into WebStrar server. You must deploy the services first. Before deploying the TryIt pages, you must change all the service references from localhost addresses to the server addresses, so that the services can be tested from the server.Part 4 Main Page and Server Deployment (group assignment question) [15 points] Part 4 Due by 11:59pm April 11th, 2015, in Blackboard and into your serve account. One team member submits the requirement document in Word or PDF into Blackboard. 4 This assignment question must be done jointly by the entire group. The main page must be an html page. The main page must be named index.html, so that the page can be accessed through the link: (your server account link). The main page must contain the following contents: 4.1 Service directory (a table) listing the services and links to the test pages (TryIt pages) and services. The schema and a list of example services are shown in the table below, which is similar to the table in the requirement document, except that the TryIt pages must be linked into the page. [8] This page is deployed at: (your group account URL) This project is developed by: Put your team name here. Provider name Service name, with input and output types TryIt link Service description Actual resources used to implement the service Member 1 name Encryption and decryption: Input: String Output: String TryIt Cipher encryption and decryption Use library class and local component to implement the service Member 1 name SolarPower Inputs: zip code and size Output: integer TryIt Output annual KW number for a given panel size at a given zip code location Retrieve information from national database at: https://graphical.weather.gov/xml/Member 1 name findStore Input: zipcode Output: list of string TryIt Use an existing online service or API to find the locations of a given store name Use the service from Yelp site at: https://www.yelp.com/ Member 2 name … … … Member 2 name textToPhone inputs: string Account, string Password, string Receiver, string Subject, string Body Output: boolean TryIt Send a text to a cell phone and return if the message is sent successfully Use google gmail account and carrier services: [email protected] [email protected] [email protected] [email protected] … … … … Make sure that you have the deployment URL in the first row of the table! 4.3 Deploy the table in html into V-Lab server, in the root directory outside the pre-created directories, so that the page can be accessed using your group account server address: [7 points]Grading We will grade each program or service following these steps: (1) We will read the code and give points based on the points allocated to each component, the readability of your code (organization of the code and comments), logic, inclusion of the required functions, and correctness of the implementations of each function. (2) Compile the code. If it does not compile, 40% of the points given in (1) will be deducted. For example, if you are given 20 points in step (1), your points will become 12 if the program fails to compile. (3) If the code passes the compilation, we will execute and test the code. This can be on local machine or on the Web, depends on the assignment specification. If, for any reason, the program gives an incorrect output or crashes for any input, 20% of the points given in (1) will be deducted. Please notice that we will not debug your program to figure out how big or how small the error is. You may lose 40% or 20% of your points for a small error such missing a comma or a space! Late submission deduction policy: For each part of the submission, • No penalty for late submissions that are received within 24 hours of the given deadline; • 1% grade deduction for every hour after the first 24 hours of the grace period!
Introduction The purpose of this project is to make sure that you understand and are familiar with the concepts covered in the lectures, including distributed computing, multithreading, thread definition, creation, management, synchronization, cooperation, event-driven programming, client and server architecture, service execution model of creating a new thread for each request, the performance of parallel computing, and the impact of multi-core processors to multithreading programs with complex coordination and cooperation. Furthermore, you are able to apply these concepts in a programming project. Section I Preparation and Practice Exercises (No submission required) No submission is required for this section of exercises. However, doing these exercises can help you better understand the concepts and thus help you in quizzes, exams, as well as the assignment questions. 1. Reading: Textbook Chapter 2. 2. Answer the multiple choice questions in text section 2.8. Studying the material covered in these questions can help you prepare for the lecture exercises, quizzes, and the exams. 3. Study for the questions 2 through 20 in text section 2.8. Make sure that you understand these questions and can briefly answer these questions. Study the material covered in these questions can help you prepare for the exams and understand the homework assignment. 4. Test the programs given in questions 24 and 25 in text section 2.8. Identify the problems in the program and give correct versions of the programs. 5. If you want solve a more challenging problem in multithreading, you can do question 26 in text section 2.8. 6. Tutorial. To help you complete the project in Section II, you may want go through the tutorial given in the textbook chapter 2, which consists of 6.1 Read the case study in text section 2.6.3. 6.2 Test the program given in the case study. The program can be used as the starting point for your project in Section II. 6.3 Extend the program based on the requirement in Section II. Section II Project (Submission required) Warning: This is a long programming project designed for a study load for three weeks of estimated 3*8 = 24 hours. It is challenging at both conceptual level and implementation level. You must distribute the load in the given three weeks. You would not have enough time to complete it if you start the project only in the last week before the project is due. Purpose of this project is to exercise the concepts learned in this chapter. It is not the purpose of this project to create realistic services and applications. We will create more realistic services and applications in the subsequent projects. In this project, you can use a console application or a simple GUI application to implement the user interface to your program. Description: Consider that you are creating a simplified hotel block booking system that involves hotel retailers (agencies) and wholesalers (hotel chains). The system consists of multiple travel agencies (clients), e.g., hotels.com, hotwire.com, and priceline.com, where they can book blocks of hotel rooms, and multiple hotel chains (servers), e.g., Days Inn, Holiday Inn, and Hilton, that supply blocks of hotel rooms to the agencies. The required architecture and the major components of the system are shown in the diagram below.Figure 1 Architecture of a hotel block booking system An Operation Scenario of the hotel block booking system is outlined below: (1) The HotelSupplier uses a pricing model to calculate the room price. If the new price is lower than the previous price, it emits a (promotional) event and calls the event handlers in the travel agencies that have subscribed to the event. (2) A TravelAgency evaluates the price, generates an OrderObject (consisting of multiple values), and sends the order to the Encoder to convert the order object into a plain string. (3) The Encoder converts the object into a string. (4) The Encoder sends the encoded string back to the caller. (5) The TravelAgency sends the encoded string to one of the free cells in the MultiCellBuffer. (6) The HotelSupplier receives the encoded string from the MultiCellBuffer and sends the string to the Decoder for decoding. (7) The Decoder sends the OrderObject to the HotelSupplier. The decoded object must contain the same values generated by the TravelAgency. (8) The HotelSupplier creates a new thread to process the order; (9) The OrderProcessingThread processes the order, e.g., checks the credit card number and calculates the amount. (10) The OrderProcessingThread sends a confirmation to the travel agency and prints the order (on screen). Components in the diagram are explained in details as follows, with their grades (points) allocation: 1. HotelSupplier1 through HotelSupplierK* are the objects of a class on the server side: Each object has method to be started as a thread by the Main method and will perform a number of functions. It uses a PricingModel to determine the room prices. It defines a price-cut event that can emit an event and call the event handlers in the TravelAgencys if there is a price-cut according to the PricingModel. It receives the orders (in a string) from the MultiCellBuffer. It calls the Decoder to convert the string into the order object. For each order, you can use the existing thread or start a new thread (resulting in multiple threads for processing multiple orders) from OrderProcessing class (or method) to process the order based on the current price. There is a counter p in the HotelSupplier. After p (e.g., p = 10) price cuts have been made, a HotelSupplier thread will terminate. [15 points] *Note 1: For this project assume that K = 2. 2. PricingModel: It can be a class or a method in HotelSupplier class. It decides the price of rooms. It can increase price or decrease the price. You must define a mathematical model (random function is fine) to determine the price based on the order received within a given time period and the number of rooms available in the HotelSupplier in the same time period. You can use a hard-coded table of the price in each week day. However, you must make sure that your model will allow the price goes up some time and goes down some other time. [5 points] 3. OrderProcessing is a class or a method in a class on the supplier’s side. Whenever an order needs to be processed, a new thread is instantiated from this class (or method) to process the order. It will check the validity of the credit card number. You can define your credit card format, for example, the credit card number from the travel agencies must be a number registered to the HotelSupplier, or a number between two given numbers (e.g., between 5000 and 7000). Each OrderProcessing thread will calculate the total amount of charge, e.g., unitPrice*NoOfRooms + Tax + LocationCharge. [10 points] 4. TravelAgency1 through TravelAgencyN, each travel agency is a thread instantiated from the same class (or the same method) in a class. The travel agency’s actions are event-driven. Each travel agency contains a call-back method (event handler) for the HotelSuppliers to call when a price-cut event occurs. The travel agency will calculate the number of rooms to order, for example, based on the need and the difference between the previous price and the current price. The thread will terminate after the HotelSupplier thread has terminated. Each order is an OrderClass object. The object is sent to the Encoder for encoding. The encoded string is sent back to the travel agency. Then, the travel agency will send the order in String format to the MultiCellBuffer. Before sending the order to the MultiCellBuffer, a time stamp must be saved. When the confirmation of order completion is received, the time of the order will be calculated and saved (or printed). You can set N = 5 in your implementation. [15 points] 5. OrderClass is a class that contains at least the following private data members: • senderId: the identity of the sender, you can use thread name or thread id; • cardNo: an integer that represents a credit card number; • receiverID: the identity of the receiver, you can use thread name or a unique name that defined for a hotel supplier; If you are doing an individual project, you do not need this field. • amount: an integer that represents the number of room to order; You must use public methods to set and get the private data members. You must decide if these methods need to be synchronized. The instances created from this class are of the OrderObject. [15 points] 6. MultiCellBuffer class is used for the communication between the travel agencies (clients) and the HotelSupplier (server): This class has n data cells, you can set n=3 for this project. The number of cells available must be less than (
The aim of this assignment is to make sure that you understood the concepts covered in the lectures and in the text, including SOA, SOC, SOD, and their applications in software development. You will also follow a tutorial to obtain hands on experience of developing a simple service and a simple application that uses services. By the end of the assignment, you should have applied these concepts in developing simple services and simple applications that uses remote Web services. Section 1 Practice Exercises (No submission required) Reading: Textbook chapter 1. No submission is required for this section of exercises. However, doing these exercises can help you better understand the concepts and thus help you in writing quizzes and exams. 1. Answer the multiple choice questions in Textbook section 1.7. Compare your answers with the answers given the course web page in the folder “Test and Exam Info”. Doing these exercises will help you prepare your weekly lecture exercises and biweekly quizzes, as scheduled in the course calendar. 2. What are SOA, SOC, SOD, SOE, SOI, and SOSE? Briefly state their definitions based on your understanding. 3. What are the main differences between requirement analyses in the OOC paradigm and in the SOC paradigm? 4. What are the major benefits of separating an application builder from the service providers? 5. What are the main techniques in SOSE (service oriented system engineering)? For each technique, write one or two sentences to describe its purpose. 6. Compare and contrast the traditional software development process and the Service-oriented software development process. For each step of the development, write a paragraph to describe the purposes, responsibilities, functions of the step. 7. What is a service registry? What is a service repository? What are their differences? 8. An electronic travel agency needs to be developed. What is your responsibility if you are: 8.1 a service provider? 8.2 a service broker? 8.3 an application builder? 9. You plan to invent a unique online game. 8.1 Describe what you must do as an application builder and what you can expect the service providers to do for you. 8.2 Describe your invention idea and list everything you must do as an application builder. 8.3 List everything that you can possibly find through service brokers. 10. List a few application areas where you believe SOC is a better fit than OOC. State your reasons and justifications. 11. What are the impacts of SOC paradigm to the IT market and to computer science graduates? Section 2 Tutorials: Using a Web Service in Your Application These tutorials help you to complete the assignment questions in Section 3. If the services are not available, use another given in text appendix C instead. Tutorial 1: Creating a simple Web service This tutorial shows how a service provider develops services for the application builders’ use. Step 1: In Visual Studio’s File menu, choose New Project…, and then choose “WCF Service Application” Template. This step will allow you to create a new Web service.Step 2: Follow textbook, Chapter 3, Section 3.2.1, to develop a simple service using Visual Studio.Tutorial 2: Using WCF Test Client Follow the text Chapter 3, Section 3.2.2 to test your service using WCF test client Tutorial 3: Creating a Windows Forms application to consume the service that you developed in tutorial 1. Follow textbook, Appendix A.1 and Chapter 3 Section 3.6.2 to develop a Windows Forms application. This tutorial shows you how an application builder makes use of remote services to create an application that provides a GUI for accessing Web services. Notice that, in order to test your application, you must have the service started first to make the object an active object! You can start the service by right-clicking the file Service.svc in the project and choose “View in Browser.” Then, you will see the service URL in the browser address bar. Use this URL when you chose “Add Service Reference…”. Tutorial 4: Creating a Web Site application to consume the service that you developed in tutorial 1. Follow textbook, Section 3.6.3 to develop a Web Site application. This tutorial shows you how an application builder makes use of remote services to create an application that provides a Web GUI for accessing Web services. Tutorial 5: Creating an image verifier using the image service Follow the tutorial given in the textbook, Appendix sections A.3 to develop a Web application that verifies if a human user is entering a Web form. An image verifier is frequently used for preventing programmed attacks to a Web site that allows self-registration. Section 3 Web Services (Submission required, 100 points) This section of the assignment is an individual assignment. Each student must perform and submit independent work. 1. Follow the Tutorial 1 given in Section 2 to develop a distance convention Web service. The service contains two operations: [10 points] double m2km(int m); // convert miles kilometers double km2m(int km); // convert kilometers to miles 2. Follow the Tutorial 3 given in Section 2 to develop a Windows Forms Application to consume (access) the distance conversion service. [10 points] 3. Follow the Tutorial 4 given in Section 2 to develop a Web Site Application to consume the distance conversion service. The service must be running on localhost. [10 points]Note: You can develop all three projects in one visual studio solution and submit.4. Follow the tutorial in text Appendix Section A.1 to create a Web browser that can take any URL and display the content of the page in the window. [10 points] In the following questions, you will add more features into the browser that you created in the previous question. Choose two questions only from the following set of questions. If you do more than two, we will grade the first two only. Each question is 30 points. 5. Add text encryption decryption function in your browser. Follow the example in text 3.6.3. However, instead of using the localhost service, you must use the remote service in the ASU Repository at: https://venus.eas.asu.edu/WSRepository/Services/EncryptionWcf/Service.svc [30 points] 6. Add the Get Stock Quote function in your browser. You can use, e.g., the Web service (https://venus.eas.asu.edu/WSRepository/Services/Stockquote/Service.svc) to build your application. For a given stock symbol, e.g., IBM, GOOG, you must display all the values returned in a readable format. [30 points] 7. Follow Tutorial 5 to add the image verifier into your Web browser. [30 points] 8. Find a Web service that convent zip code to map coordinates (latitude, longitude), and add the function into your browser. As reference, you can study these Websites: https://www.ngdc.noaa.gov/dmsp/maps.html and https://graphical.weather.gov/xml/ For example, this URI will return Phoenix weather in an XML file: [30 points] https://forecast.weather.gov/MapClick.php?lat=33.43&lon=-112.04&FcstType=dwml 9. Add currency exchange function in your browser that shows the reference rate of two currencies. You can find currency exchange services from different sites, for example: [30 points] Yahoo finance: https://finance.yahoo.com/d/quotes.csv?s=USDEUR=X&f=sl1d1t1ba&e=.csv Bank: https://www.ecb.europa.eu/stats/eurofxref/eurofxref-daily.xml The figure below shows a sample layout of your browser. Notice that the sample components in the sample is different from what is required in this assignment. You must design your own layout to best display the required information. However, all parts of the information must be displayed in a single page. If a particular service is not working, you can use another one with the same level of complexity, for example, the same number of input parameters.Grading We will grade your programs following these steps: (1) We will read your program and give points based on the points allocated to each component, the readability of your code (organization of the code and comments), logic, inclusion of the required functions, and correctness of the implementations of each function. (2) Compile the code. If it does not compile, 40% of the points given in (1) will be deducted. For example, if you are given 20 points in step (1), your points will become 12 if the program fails to compile. (3) If the code passes the compilation, we will execute and test the code. If, for any reason, the program gives an incorrect output or crashes for any input, 20% of the points given in (1) will be deducted. Please notice that we will not debug your program to figure out how big or how small the error is. You may lose 40% or 20% of your points for a small error such missing a comma or a space! Submission Requirement All submissions must be electronically submitted to the assignment folder where you downloaded the assignment paper. All files must be zipped into a single file. Late submission deduction policy: • No penalty for late submissions that are received within 24 hours of the given due date; • 1% grade deduction for every hour after the first 24 hours of the grace period!
1 Theory (50pt) 1.1 Energy Based Models Intuition (15pts) This question tests your intuitive understanding of Energy-based models and their properties. (a) (1pts) How do energy-based models allow for modeling situations where the mapping from input xi to output yi is not 1 to 1, but 1 to many? (b) (2pts) How do energy-based models differ from models that output probabilities? (c) (2pts) How can you use energy function FW (x, y) to calculate a probability p(y | x)? (d) (2pts) What are the roles of the loss function and energy function? (e) (2pts) What problems can be caused by using only positive examples for energy (pushing down energy of correct inputs only)? How can it be avoided? (f) (2pts) Briefly explain the three methods that can be used to shape the energy function. (g) (2pts) Provide an example of a loss function that uses negative examples. The format should be as follows ℓexample(x, y,W) = FW (x, y). (h) (2pts) Say we have an energy function F(x, y) with images x, classification for this image y. Write down the mathematical expression for doing inference given an input x. Now say we have a latent variable z, and our energy is G(x, y, z). What is the expression for doing inference then? 1.2 Negative log-likelihood loss (20 pts) Let’s consider an energy-based model we are training to do classification of input between n classes. FW (x, y) is the energy of input x and class y. We consider n classes: y ∈ {1,…,n}. (i) (2pts) For a given input x, write down an expression for a Gibbs distribution over labels y that this energy-based model specifies. Use β for the constant multiplier. (ii) (5pts) Let’s say for a particular data sample x, we have the label y. Give the expression for the negative log likelihood loss, i.e. negative log likelihood of the correct label (show step-by-step derivation of the loss function from the expression of the previous subproblem). For easier calculations in the following subproblem, multiply the loss by 1 β . 2 (iii) (8pts) Now, derive the gradient of that expression with respect to W (just providing the final expression is not enough). Why can it be intractable to compute it, and how can we get around the intractability? (iv) (5pts) Explain why negative log-likelihood loss pushes the energy of the correct example to negative infinity, and all others to positive infinity, no matter how close the two examples are, resulting in an energy surface with really sharp edges in case of continuous y (this is usually not an issue for discrete y because there’s no distance measure between different classes). 1.3 Comparing Contrastive Loss Functions (15pts) In this problem, we’re going to compare a few contrastive loss functions. We are going to look at the behavior of the gradients, and understand what uses each loss function has. In the following subproblems, m is a margin, m ∈ R, x is input, y is the correct label, y¯ is the incorrect label. Define the loss in the following format: ℓexample(x, y, y¯,W) = FW (x, y). (a) (3pts) Simple loss function is defined as follows: ℓsimple(x, y, y¯,W) = [FW(x, y)] + +[m− FW (x, y¯)] + Assuming we know the derivative ∂FW (x,y) ∂W for any x, y, give an expression for the partial derivative of the ℓsimple with respect to W. (b) (3pts) Log loss is defined as follows: ℓlog(x, y, y¯,W) = log³ 1+ e FW (x,y)−FW (x,y¯) ´ Assuming we know the derivative ∂FW (x,y) ∂W for any x, y, give an expression for the partial derivative of the ℓlog with respect to W. (c) (3pts) Square-Square loss is defined as follows: ℓsquare-square(x, y, y¯,W) = ¡ [FW (x, y)] + ¢2 + ¡ [m− FW (x, y¯)] + ¢2 Assuming we know the derivative ∂FW (x,y) ∂W for any x, y, give an expression for the partial derivative of the ℓsquare-square with respect to W. (d) (6pts) Comparison. (i) (2pts) Explain how NLL loss is different from the three losses above. (ii) (2pts) The hinge loss [FW(x, y)− FW (x, y¯)+ m] + has a margin parameter m, which gives 0 loss when the positive and negative examples have energy that are m apart. The log loss is sometimes called a “softhinge” loss. Why? What is the advantage of using a soft hinge loss? 3 (iii) (2pts) How are the simple loss and square-square loss different from the hinge/log loss? In what situations would you use the simple loss, and in what situations would you use the square-square loss? 2 Implementation (50pt) Please add your solutions to this notebook hw3_impl.ipynb . Plase use your NYU account to access the notebook. The notebook contains parts marked as TODO, where you should put your code or explanations. The notebook is a Google Colab notebook, you should copy it to your drive, add your solutions, and then download and submit it to NYU Classes. You’re also free to run it on any other machine, as long as the version you send us can be run on Google Colab. 4
1 Theory (50pt) 1.1 Attention (13pts) This question tests your intuitive understanding of attention and its property. (a) (1pts) Given queries Q ∈ R d×n , K ∈ R d×m and V ∈ R t×m, what is the output H of the standard dot-product attention? (You can use the softargmaxβ function directly. It is applied to the column of each matrix). 1 (b) (2pts) Explain how the scale β influence the output of the attention? And what β is conveniently to use? (c) (3pts) One advantage of the attention operation is that it is really easy to preserve a value vector v to the output h. Explain in what situation, the outputs preserves the value vectors. Also, what should the scale β be if we just want the attention operation to preserve value vectors. Which of the four types of attention we are referring to? How can this be done when using fully connected architectures? (d) (3pts) On the other hand, the attention operation can also dilute different value vectors v to generate new output h. Explain in what situation the outputs is spread version of the value vectors. Also, what should the scale β be if we just want the attention operation to diffuse as much as possible. Which of the four types of attention we are referring to? How can this be done when using fully connected architectures? (e) (2pts) If we have a small perturbation to one of the k (you could assume the perturbation is a zero-mean Gaussian with small variance, so the new kˆ = k +ϵ), how will the output of the H change? (f) (2pts) If we have a large perturbation that it elongates one key so the kˆ = αk for α > 1, how will the output of the H change? 1.2 Multi-headed Attention (3pts) This question tests your intuitive understanding of Multi-headed Attention and its property. (a) (1pts) Given queries Q ∈ R d×n , K ∈ R d×m and V ∈ R t×m, what is the output H of the standard multi-headed scaled dot-product attention? Assume we have h heads. (b) (2pts) Is there anything similar to multi-headed attention for convolutional networks? Explain why do you think they are similar. (Hint: read the conv1d document from PyTorch: link) 1.3 Self Attention (11pts) This question tests your intuitive understanding of Self Attention and its property. (a) (2pts) Given an input C ∈ R e×n , what is the queries Q, the keys K and the values V and the output H of the standard multi-headed scaled dot-product self-attention? Assume we have h heads. (You can name and define the weight matrices by yourself) 2 (b) (2pts) Explain when we need the positional encoding for self-attention and when we don’t. (You can read about it at link) (c) (2pts) Show us one situation that the self attention layer behaves like an identity layer or permutation layer. (d) (2pts) Show us one situation that the self attention layer behaves like an “running” linear layer (it applies the linear projection to each location). What is the proper name for a “running” linear layer. (e) (3pts) Show us one situation that the self attention layer behaves like an convolution layer with a kernel larger than 1. You can assume we use positional encoding. 1.4 Transformer (15pts) Read the original paper on the Transformer model: “Attention is All You Need” by Vaswani et al. (2017). (a) (3pts) Explain the primary differences between the Transformer architecture and previous sequence-to-sequence models (such as RNNs and LSTMs). (b) (3pts) Explain the concept of self-attention and its importance in the Transformer model. (c) (3pts) Describe the multi-head attention mechanism and its benefits. (d) (3pts) Explain the feed-forward neural networks used in the model and their purpose. (e) (3pts) Describe the layer normalization technique and its use in the Transformer architecture. 1.5 Vision Transformer (8pts) Read the paper on the Transformer model: “An Image is Worth 16 × 16 Words: Transformers for Image Recognition at Scale”. (a) (2pts) What is the key difference between the Vision Transformer (ViT) and traditional convolutional neural networks (CNNs) in terms of handling input images? Can you spot a convolution layer in the ViT architecture? (b) (2pts) Explain the differences between the Vision Transformer and the Transformer introduced in the original paper. (c) (2pts) What is the role of positional embeddings in the Vision Transformer model, and how do they differ from positional encodings used in the original Transformer architecture? 3 (d) (2pts) How does the Vision Transformer model generate the final classification output? Describe the process and components involved in this step. 2 Implementation (50pt) Please add your solutions to this notebook HW4-VIT-Student.ipynb . Plase use your NYU account to access the notebook. The notebook contains parts marked as TODO, where you should put your code or explanations. The notebook is a Google Colab notebook, you should copy it to your drive, add your solutions, and then download and submit it to NYU Classes. You’re also free to run it on any other machine, as long as the version you send us can be run on Google Colab. 4
This project is intended to help you understand cache coherence and performance of multi-core processors. As with previous projects, for this project you will need VirtualBox and our project virtual machine. Just like in previous projects, you will put your answers in the reddish boxes in this Word document, and then submit it in Canvas (but this time the submitted file name should be PRJ3.docx).In each answer box, you must first provide your answer to the actual question (e.g. a number). You can then use square brackets to provide any explanations that the question is not asking for but that you feel might help us grade your answer. E.g. answer 9.7102 may be entered as 9.7102 [Because 9.71+0.0002 is 9.7102].For questions that are asking “why” and/or “explain”, the correct answer is one that concisely states the cause for what the question is describing, and also states what evidence you have for that. Guesswork, even when entirely correct, will only yield 50% of the points on such questions.Additional files to upload are specified in each part of this document. Do not archive (zip, rar, or anything else) the files when you submit them, except when we explicitly ask you to submit a zip file (in Part 3H). Each file we are asking for should be uploaded separately, and its name should be as specified in this assignment. You will lose up to 20 points for not following the file submission and naming guidelines. Furthermore, if it is not VERY clear which submitted file matches which requested file, we will treat the submission as missing that file. The same is true if you submit multiple files that appear to match the same requested file (e.g. several files with the same name). In short, if there is any ambiguity about which submitted file(s) should be used for grading, the grading will be done as if those ambiguous files were not submitted at all.Most numerical answers should have at least two decimals of precision. Speedups should be computed to at least 4 decimals of precision, using the number of cycles, not the IPC (the IPC reported by report.pl is rounded to only two decimals). You lose points if you round to fewer decimals than required, or if you truncate digits instead of correctly rounding (e.g. a speedup of 3.141592 rounded to four decimals is 3.1416, not as 3.1415).This project can be done either individually or in groups of two students. If doing this project as a two-student group, you can do the simulations and programming work together, but each student is responsible for his/her own project report, and each student will be graded based solely on what that student submits. Finally, no collaboration with other students or anyone else is allowed. If you do have a partner you have to provide his/her name here None (enter None here if no partner) and his/her Canvas username here . Note that this means that you you cannot change partners once you begin working on the project, i.e. if you do any work with a partner you cannot “drop” your partner and submit the project as your own (or start working with someone else) because the collaboration you already had with your (original) partner then becomes unauthorized collaboration.In this part of Project 3 we will be using the LU benchmark. We will also be using a processor with more (sixteen) cores (cmp16-noc.conf). So, for example, to simulate 2-threaded execution you would use a command like this (note the absence of spaces between -n and 256, and between -p and 2):~/sesc/sesc.opt –fAp2-c ~/sesc/confs/cmp16-noc.conf -olu.out -elu.err lu.mipseb -n256–p2To complete this part of the project, run the lu application with 1, 2,and 4 threads. Then fill in the blanks, taking into account all the runs you were asked to do: Note: Parallel speedup is the speedup of parallel execution over the single-thread execution with the same input size. Parallel efficiency is the parallel speedup divided by the number of threads used – ideally, the speedup would be equal to the number of threads used, so the efficiency would be 1. When computing the speedup and efficiency we cannot use IPC or Cycles that are reported for each processor, because these do not account for the cycles where that core was idle (e.g. because the thread was waiting for something to happen). So we need to use the “Sim Time” we get from report.pl because it accounts for all cycles that elapse between the start and completion of the entire benchmark. With more cores, multiplethreads can be allocated to different cores to execute, the overall time for completion is reduced with more cores. Ideally, the parallel efficiency should be 1 for all cases. However, with more cores, once different cores reading and writing same data, different cores mustmaintain the coherence with coherence miss, resulting cycles cost in data sharing and data block transferring. With these overheads, the parallel efficiency can no longer be 1. Since LU factorization is based on matrix calculation, there is high spatial locality within computation. With multiple cores performing reading and writing for same data blocks,there are more data sharing overheads and memory access (considering MSI). Although there is speedup in total execution time, the unit efficiency is dragged down.IPC measures single core efficiency. With multi-core, the behavior of single core needs to be adjusted to guarantee the coherence requirement.Threads can be allocated to other cores to run. With cycles waiting sharing data from other cores, the efficiency of single core is down, leading its IPC is lower than that with uniprocessor.The initial create of different processes is completed on Core 0 and Core 0 is the main core. It is required to assign tasks, perform sys call to create new process, load configuration from system, leading more instructions executed on Core 0. In this part of Project 3, we will be focusing on the number of read misses in the DL1 (Data L1) cache of Core 0, using the same simulations that we already did for Part 1. In the report file generated by the simulator (sesc_lu.mipseb.something, not what you get from report.pl), the number of cache read misses that occur in each DL1 cache (one per processor core) is reported in lines that begin with “P(0)_DL1:readMiss=”.Your answers here should be integer numbers. Since the task is allocated to different threads and different threads now can run on different cores, which is no longer on Core 0. Thus, the decrease of readMiss is proportional to the decrease of work on one Core. With less data required in Core 0, the readMiss will decrease with increase thread number. However, with the decreasingratio of readMiss is also decreasing with more threads running on different cores. The ration of p1/p2 is 2.05, while p2/p4 is 1.62. This is due to the coherence miss od data sharing between cores. With more cores involves, the number of coherence cache miss is expected to increase. You task in this part of the project is to determine how many read misses in each core’s DL1 cache are compulsory (readCompMiss), replacement (capacity or conflict, the counter should be called readReplMiss), and coherence misses (readCoheMiss), and separately also classify write misses (writeCompMiss, writeReplMiss, and writeCoheMiss). Note that this classification is similar to the one you did for Project 2, except that you now we are counting different categories of misses separately for reads (load instructions) and writes (store instructions), that we are placing conflict and capacity misses in the same (replacement) category, and that we are adding a category for coherence misses that we didn’t have in Project 2. To simplify classification, we will not follow the exact definition of coherence misses (“those misses that would have been hits were it not for coherence actions from other cores”). Instead, we will use a definition that allows much simpler implementation: a coherence miss is a miss that finds in the cache a line whose tag matches the block it wants, but that block has a coherence state that prevents such access. In the case of read misses, this means that the line was found in an “Invalid” coherence state.Note that this identification of coherence misses may not be trivial in the SESC simulator because of the way it handles tags during invalidation. If a miss is not a coherence miss, then you can classify it as either compulsory or replacement miss by checking if the block was ever in that cache. When checking whether the miss is a compulsory miss, be careful to track the “was previously in this cache” set of blocks for each cache separately.Note: readMiss numbers here should be the same as those you had in Part 2.
In this assignment, your aim is to extend the definition of a cubic spline to interpolate data that cannot be described by a single-valued function. (a) As a first example, consider the data points listed in the table (below, left) which are sampled from an underlying smooth curve pictured in the plot (below, right): t x y 0 0.0 0.0 1 1.0 3.0 2 2.0 3.0 3 2.0 4.0 4 3.0 5.0Clearly, the curve passes through multiple y-values at x = 2, meaning that the underlying curve is multiply-defined or multi-valued – it’s not a function!! This means that the usual cubic spline approach based on trying to interpolate a function of the form y = f(x) will not work. The trick to dealing with such data is to use a parametric spline that is based on a parametric description of the curve: x = R(t) and y = S(t) for some parameter t. For this example, it’s easiest to assign parameter values t = 0, 1, 2, 3, 4 corresponding to the index of the points in the table. The particular choice of t values is actually not important as long as t is monotonically increasing‡ .Because R(t) and S(t) are now both functions of t, we can interpolate them with two separate splines.§ Based on the t-x-y data from the table, generate two cubic splines R(t) and S(t) using the Matlab spline function with its default not-a-knot end-point conditions. Provide three separate plots: of R versus t, S versus t, and S versus R (the last of which will reproduce the plot above). ‡The parameter t can also be monotonically decreasing – try that out if you’re curious!§Another multiply-defined function you will be familiar with is the unit circle, x 2 + y 2 = 1, which can be written in parametric form as x = R(t) = cos t and y = S(t) = sin t, with parameter 0 6 t 6 2π.(b) You will now use a parametric spline to interpolate the more complicated “four-leaf” curve pictured below. The table lists coordinates of 13 points (xi , yi), i = 0, 1, . . . , 12 that lie along this curve. t x y 0 2.75 -1.0 1 1.3 -0.75 2 -0.25 0.8 3 0.0 2.1 4 0.25 0.8 5 -1.3 -0.25 6 -2.5 0.0 t x y 7 -1.3 0.25 8 0.25 -1.3 9 0.0 -2.5 10 -0.25 -1.3 11 1.3 -0.25 12 2.75 -1.0Determine the parametric spline x = R(t), y = S(t) that interpolates this set of points (x, y), again using Matlab’s spline function. Plot your parametric spline curve and verify (by zooming in on your plot) that the right-most “leaf” is different from the other three leaves in that it is not smooth – the spline endpoints meet at a cusp.(c) For a periodic curve like the one in part (b), it is more appropriate to use periodic end-point conditions instead of not-a-knot conditions. That is, we should take R0 0 (t0) = R0 n−1 (tn) and R00 0 (t0) = R00 n−1 (tn), and similarly for S(t). Use the Matlab code perspline.m posted on Canvas to generate the two periodic cubic splines approximating R(t) and S(t), plot your parametric curve, and compare it to what you obtained in part (b). Verify that the cusp is indeed eliminated.(d) You can now get creative! Start by drawing a periodic parametric curve, which can be any smooth curve of your own design as long as the endpoints meet at the same location. Your curve should be defined by at least 20 spline points (but not more than 40) and there should be at least one instance where your curve crosses itself (such as the four “leaf crossings” in part (b)).To generate your list of points, you may find it helpful to use Matlab’s built-in function ginput (graphical input from mouse) which allows you to draw your parametric curve on the screen by clicking with the mouse at a sequence of locations in a plotting window, and then outputs the x and y coordinates of the “clicked” points.If you type help ginput, you can see that it reads mouse clicks within the plotting window until the “enter” key is pressed, after which it returns two vectors of coordinates. You may find the following sequence of Matlab commands helpful: figure(’position’, get(0,’screensize’)) % largest window possible axes(’position’, [0 0 1 1]) axis square % make x,y-axes equal imshow(’myimage.png’) % display your drawing on-screen [x,y] = ginput; % record mouse clicks until ’Enter’ close % get rid of huge window save mydatafile.mat x y % save x,y data points to a fileThe save command saves your data points to a file that can later on be read back in using the load command. You may then use your (x, y) data as input to perspline.m in the same way you did in part (c) to construct your parametric spline and plot your results.
Consider a horizontal flexible beam of length L that is clamped at one end but free to move vertically along the rest of its length; this is just one example of a cantilever beam that is studied extensively in engineering solid mechanics† . Referring to the picture below, let x be the horizontal coordinate and z(x) be the vertical deflection of the beam. Divide the the beam into n equal sections of length h = L n , defined by points xi = ih for i = 1, 2, . . . , n. A discrete model for the forces and solid deformations along the beam yields a system of linear equations Az = b, where the n × n matrix A has the banded form 9 −4 1 0 · · · · · · 0 −4 6 −4 1 . . . 0 1 −4 6 −4 1 . . . . . . 0 . . . . . . . . . . . . . . . 0 . . . . . . 1 −4 6 −4 1 . . . . . . 1 −4 5 −2 0 · · · · · · 0 1 −2 1 The n-vector b = [bi ] is the given load force acting along the beam (including its own weight) and z = [zi ] is a vector of the corresponding vertical deflections (away from z = 0) caused by this load force; both bi and zi are measured at location xi .Take the beam to be uniformly loaded, which means that a constant upward force F is distributed along the entire beam length, so that each component of the load vector is a constant bi = F · h 4 . For this problem, assume that the values of the physical constants are L = 1.5 m and F = 0.4 N/m. †For more explanation and lots of examples, see http://en.wikipedia.org/wiki/Cantilever.(a) Solve the linear system Az = b using the following three methods: I. The GE+PP algorithm for sparse (banded) linear systems, which is the default algorithm used by Matlab’s “” operator when the matrix (call it Asparse) is of sparse type. You may find it easiest to set up your Asparse matrix using the spdiags command.II. The GE+PP algorithm for dense linear systems, again using “”. Here, you need a dense version of A which you can obtain either with the diag command or more simply by typing Adense = full(Asparse)III. The Gauss-Seidel iterative algorithm, also using the matrix Asparse. You may make use of the gs2 code from lectures, setting the parameters tol = 1e-8 and maxiter = 1e5, and taking initial guess z0 = (1, 1, . . . , 1)T . For each method, compute the solution with n = 20, 50, 100 and 500 points. How do the three methods compare in terms of cost? (use elapsed time from Matlab’s tic / toc as a proxy for cost). How well do your three answers agree with each other? (use the vector 2-norm to compare the differences). Which result is most accurate? Can you explain what is going on with the Gauss-Seidel solution as n increases?(b) Next, verify that the matrix A has a UL factorization (yes, that’s UL and not LU) of the form A = UUT , where U is the upper triangular matrix 2 −2 1 0 · · · 0 0 1 −2 1 . . . . . . 0 . . . . . . . . . . . . 0 . . . . . . 1 −2 1 . . . . . . 1 −2 0 · · · · · · · · · 0 1 For the same values of n in part (a), solve the linear system using the UUT factorization via a sequence of two triangular solves (corresponding to forward and backward substitution). You can use the “” command for both solves, but just make sure that your matrix U is set up as a sparse matrix. Compare your answer to the corresponding solution from Method I (sparse GE+PP) in terms of both accuracy and cost.(c) Compute the condition numbers of both A and U, as well as the spectral radius ρ(T) of the GaussSeidel iteration matrix T. Use this information to help explain your results from parts (a) and (b). The condition number for a large sparse matrix can only be estimated, and so for this purpose you should use the Matlab function condest.(d) Choose what you consider to be your most accurate solution from parts (a)–(b), and use it to generate a plot of the deflection z(x) versus x. In any introductory course in engineering solid mechanics, students learn about the following formula for the maximum deflection of a uniformly load beam which occurs at the free endpoint: zmax = z(L) = F L4 6EI where EI is a material property of the beam called flexural rigidity or bending stiffness (with units N · m2 ). Use your solution for z to estimate the value of bending stiffness.Extra reading: You can find out more about models for beam deflection at http://en.wikipedia.org/wiki/Deflection_(engineering)
The n × n Hilbert matrix H(n) is a matrix with entries defined by hij = Z 1 0 x i+j−2 dx = 1 i + j − 1 for 1 ≤ i, j ≤ n. As I discussed in class, the Hilbert matrix is a “classic” example of an ill-conditioned matrix.The Matlab command hilb(n) generates such a Hilbert matrix with the dimension n passed as an input parameter† . It is easy to show that H(n) is invertible for any n, and so the linear system H(n)x = b is always guaranteed to be solvable for unknowns x = (x1, x2, . . . , xn) T . However, the solution is not easy to compute accurately in floating point arithmetic. This is what you will investigate in this computing assignment.Your report must include the following: (a) First, investigate how the norm and condition number of the Hilbert matrix grow with n. For values of n = 2, 3, 4, . . . , 30: • set up the Hilbert matrix H(n) using hilb • compute the 1-norm and 2-norm of H(n) using norm • compute the condition number of H(n) in the same two norms using condOn the same set of axes plot versus n your four results, kH(n)k1,2 and cond1,2(H(n) ). Make sure to use an appropriate axis scaling‡ . †Use type hilb to see how Matlab sets up the Hilbert matrix in just 2 lines of code. This is a great example of how Matlab’s “vectorized” operations can be used to write clean and simple code that is free of loops. ‡ In other words, think about which plotting function to use: plot, semilogx, semilogy or loglog.(b) It’s possible to show theoretically that the condition number of the Hilbert matrix grows as O(1 + √ 2)4n √ n ! as n → ∞§ (don’t try to show this, just use it). Add this function to your plot from part (a) and discuss whether or not this rate of growth seems consistent with your results. Can you now explain why your choice of axis scaling (and plotting function) from part (a) is the right one?(c) Next, you’ll investigate the performance of three different methods for solving the linear system H(n)x = b, again using values of n ∈ [2, 30]: • Generate the n×n Hilbert matrix H(n) , initialize the exact solution x = (1, 1, . . . , 1)T , and compute the corresponding right hand side vector b = H(n)x.• Compute floating-point approximations ˆx to the linear system H(n)xˆ = b using three methods: the PLU decomposition (followed by forward/backward substitution), Matlab’s backslash command, and the Jacobi iteration. For Jacobi, choose an initial guess x (0) = rand(n, 1), which is an n-vector containing random real numbers chosen from the interval [0, 1].• Compute the absolute solution error Ex = kx − xˆk2 in the 2-norm for each of your three approximations. • Plot your three errors versus the problem dimension n ∈ [2, 30]. What do you observe? Compare and contrast the performance and cost of methods.(d) Finally, recall that in lectures the condition number of a square matrix A is defined in terms of the matrix norm as condp(A) = kAkp · kA−1kp (for p = 1, 2,∞). Investigate how the inverse Hilbert matrix (H(n) ) −1 behaves for large n by doing the following.Use Matlab’s inv function to determine the inverse of H(n) for the same n ∈ [2, 30], and then compute the matrix product M = H(n) · (H(n) ) −1 . Of course, this M should be equal to the n × n identity matrix I, but is it? To measure how close/far M actually is from I, determine the maximum entry of M (in absolute value) and plot this number versus the matrix size n. Discuss your results.Partial code for this assignment can be obtained by downloading the file macm316ca4SampleCode.m from Canvas. You may also use the code jacobi2.m discussed in class to perform your Jacobi iterations. Final note: If you’re intrigued by the Hilbert matrix, you can learn more about it by reading the recent blog post by Nick Higham at It’s interesting how he argues that “the Hilbert matrix is not a good test matrix, for several reasons.” Read the post to find out the reasons why . . . §See http://en.wikipedia.org/wiki/Hilbert_matrix
In this computing assignment, you will extend Newton’s method to use a hybrid approach that combines it with the bisection method. The single initial guess is replaced with an interval that brackets the root, and the bracket is updated as needed using bisecton steps. Consider the following equation 3.06 = (1 − x)(3 + x) 1/3 x(4 − x) 1/2 (1) where Newton’s method alone can easily fail, but a hybrid Newton–bisection algorithm is more reliable – in other words, the hybrid method is more robust.Your report should address the following: (a) Express eq. (1) as a nonlinear root-finding problem of the form f(x) = 0. Plot your function f(x) on the interval [−2, 3] and describe its overall behaviour. Clearly identify the number of roots and their approximate locations.(b) Attempt to approximate the positive root of f(x) using Newton’s method with initial guess x0 = 1, and then repeat using bisection method with initial bracket [0.1, 1.0]. You can use the bisect2.m and newton.m codes from lectures. Discuss your results.(c) Create a modified version of the Newton code (say, newtonb.m) that takes a bracket interval as input instead of a single initial guess. Using the given initial bracket, take a single bisection step to determine x0, which is then used as the initial guess for the Newton iteration. Test your newtonb.m code using the initial bracket [0.1, 1.0].Note: A single initial bisection step is often not sufficient to get Newton’s method to converge, and so some form of bisection must be incorporated inside the Newton iteration as well. The remaining two parts add this feature to your code.(d) Write a utility function called newtBrack.m that takes a single step of Newton’s method and checks whether the new guess for the root lies within the current bracket. The function should have a definition resembling function [ok, xnewt] = newtBrack(a, b, x, fx, fpx) where a and b are the bracket endpoints, x is the previous Newton iterate, and fx and fpx are the values of f(x) and f 0 (x) (not the functions themselves!). The return value xnewt is the new guess for the root using a Newton step. The value of ok is a logical variable (0 = false, 1 = true) that identifies whether xnewt lies within the interval a ≤ x ≤ b. Modify your newtonb code from part (c) to replace the Newton step with a call to newtBrack. The modified newtonb function should continue with the normal Newton iterations unchanged, but print a warning message if ok is “false”.(e) Next, update the iteration in your newtonb function so that it takes a Newton step only if the new approximation xnewt from newtBrack lies inside the current bracket (that is, if ok == 1). If xnewt falls outside the bracket, then print a warning message and take a bisection step using the midpoint from bisection as your next guess instead . . . and don’t forget to update your current bracket [a, b]! Use an absolute error tolerance of 10−10 for your stopping criterion. In your report, give a list of your iterations that you obtain from the updated newtonb code, including all warning messages. Finally, add the approximate root to your plot from part (a).
Motivation: You have a part-time job stocking shelves at a local, family-owned lumber and building supply store. In an effort to replace lost sales revenue during the pandemic, the owner has discovered a market for a value-added product: custom-sized, folding picnic tables to satisfy the huge demand from local restaurants that have recently opened outdoor patios. Knowing that you are an SFU student with expert training in calculus and scientific computing, she has come to you for help in a method for determining how to cut the lumber the correct size for each customer’s special needs.Your boss presents you with the photograph in Figure 1, and a diagram showing the relevant dimensions of the legs identified in Figure 2. The overall table dimensions are determined by the width w and height Figure 1: Photograph of the picnic table design.Figure 2: Diagram of the leg assembly, defining the various geometric parameters. h of the leg assembly, whereas the choice of leg material determines the leg thickness b. The parameters h, w and b all known ahead of time because of customers’ aesthetic and practical considerations (i.e., they are given constants). The remaining measurements for a, c, d1 and d2 indicated in the diagram in Figure 2 must be determined before the legs can be manufactured.Problem set-up: Armed with the promise of a big bonus, you apply your solid grounding in trigonometry to the leg geometry from Figure 2 and derive the following equations relating the various parameters h = 2d2 sin θ, w = 2d2 cos θ + b2 and b = b2 sin θ, (1) where θ represents the angle at which the legs meet the ground. Combining these equations to eliminate d2 yields a single nonlinear equation for the unknown angle θ † : w sin θ = h cos θ + b. (2)The procedure for determining the leg dimensions required in the manufacturing process can now be described as follows. First, solve Eq. (2) for θ. Then, the leg half-length d2 can be computed from the first equation in (1). The other major leg measurement is d1 which can be found from d1 = d2 − a − c, where a and c are written in terms of tangents as a = b tan α and c = b tan θ , with α = π 2 − θ. †As a “check” for correctness, setting b = 0 (a zero-thickness leg) gives h/w = tan θ – as expected!Report specifications: After discussions with your boss, you have come up with the following list of requirements for a report that will earn you your bonus: (a) Express Eq. (2) as a nonlinear root-finding problem of the form f(θ) = 0, whose solution is the leg angle θ. Implement the function in Matlab, by first defining variables corresponding to these table specifications: w = 28, h = 25 and b = 3.5, all measured in inches‡ . Next, define f as a single-line anonymous function§ of the form f = @(theta) …Plot your function f(θ) on the interval θ ∈ [−π, π] and describe its overall behaviour, as well as indicating the number of roots, and their approximate location. Your boss has one special request: she understands that mathematical calculations for angles require working in radians, but her cutting machines are calibrated in degrees. So she asks that all plots and numeric results be reported in degrees instead.(b) Apply three root-finding algorithms to determine the smallest positive root of f(θ) (call it θ ∗ ) to within an error tolerance of 1 degree¶ : • Bisection method, for which you can use the bisect2.m code from lectures. • Fixed-point method, using fixedpt.m. You will first have to derive a suitable fixed-point iteration of the form θk+1 = g(θk) that is equivalent to finding a root of f(θ). • Newton’s method, using newton.m, which requires you to define a second function that computes the derivative f 0(θ). Luckily, you remember the oft-overlooked calculus fact that the usual differentiation formulas for trig functions only hold when the angle is measured in radians. So you need to be sure that your code performs all calculations in radians and does a conversion to degrees for output/plotting purposes. Use a very rough initial guess θ0 = 0 (or bracket [θ0, θ1] = 0, π 2) for each method based on your plot from part (a). Compare the cost of the three methods for approximating θ ∗ in terms of the number of iterations required, and explain any differences you observe.(c) Although it’s not obvious, an exact solution can be found for Eq. (2): cos θ ∗ = 1 (h 2 + w2) −bh + p b 2h 2 + (h 2 + w2)(w2 − b 2) (3) Implement this formula for the exact root θ ∗ in your code (with Matlab’s inverse cosine function, acos) and use it to compare the accuracy of your three approximations from part (b). Explain.Note: I only want you to use Eq. (3) here, not derive it. But if you’re curious to see where it comes from, try squaring both sides of Eq. (2) and applying the identity sin2 θ = 1 − cos2 θ.(d) Using your most accurate approximation from part (b), determine the corresponding estimates of the design parameters d1 and d2. Because customers paying top dollar for these tables demand high quality and workmanship, your boss’s final requirement is that both of leg measurements d1 and d2 must be accurate to within a tolerance of 0.1 inches. Is your solution accurate enough?Finis! Your bonus and a possible promotion to Assistant Manager await! ‡Why aren’t we using metric? Well, lumber is still sold inches and feet, and b = 3.5 inches is the actual dimension of a finished “2-by-4” board that’s commonly used to build picnic tables.§For help on anonymous functions see either the lecture notes, my Whirlwind Tour, or type “doc anonymous”. ¶Hint: Be very careful here. Your angle θ should be measured in degrees, but all trig functions in Matlab expect an argument that’s measured in radians.
Suppose that a Mars rover is wandering in a region which is modeled as a grid of width 12 and height 8, as shown in Fig 1. We do not know the exact location of the rover over time. Instead, we only get some noisy observations about the rover from a sensor. In this lab, we use hidden Markov model to track the rover’s movement over time.The rover’s position at time i = 0, 1, 2, . . . is modeled as a random vector (xi , yi) ∈ {0, 1, . . . , 11} × {0, 1, . . . , 7}. For example, (x2, y2) = (5, 4) means that at time step 2, the rover is in column 5, row 4 illustrated as a blue circle in Fig 1. The movement of the rover is quite predictable. At each time step, it makes one of the five actions: it stays put, goes left, goes up, goes right, or goes down. 0 1 2 3 4 5 6 7 8 9 10 11 0 1 2 3 4 5 6 7 A Figure 1: A wandering rover (blue circle) in a grid of width 12 and height 8.The action of the rover at any time i depends on its previous action as well as its current location. Given that the rover’s current location is not at the boundary of the grid, its action is simple: if the rover’s previous action was a movement (left, up, right, or down), the rover moves in the same direction with probability 0.9 and stays put with probability 0.1; if the rover’s previous action was to stay, it stays again with probability 0.2, and moves with each direction (left, up, right, or down) with probability 0.2. The rover’s action can be shown by a transition diagram in Fig. 2a.In the case that the rover is on the boundary of the grid, the rover’s behavior should adjust such that it will not go outside the region, and meanwhile the behavior should also be consistent with the non-boundary case.For example, when the rover is in location A in Fig. 1, the rover cannot go higher. Therefore, there are only four possible actions: it stays, goes left, goes right, or goes down. Based on its previous action, those probabilities may need to be re-normalized such that they sum to 1.Specifically, if the rover’s previous action was to go left, the rover moves in the same direction with probability 0.9 and stays put with probability 0.1; if the rover’s previous action was to go up, it stays at A with probability 1 (due to re-normalization); if the rover’s previous action was to stay, it stays again with probability 0.25, and moves with each direction (left, right, or down) with probability 0.25 (due to re-normalization). The resulting transition diagram is depicted in Fig. 2b.Since the rover’s behavior at any time i depends on its previous action as well as its current location, we model the rover’s hidden state zi at time i as a super variable that includes both the rover’s location (xi , yi) and its most recent action ai , i.e., zi = ((xi , yi), ai), where ai is a random variable that takes the value from 1 left right down up stay 0.2 0.2 0.2 0.2 0.2 0.1 0.1 0.9 0.1 0.1 0.9 0.9 0.9(a) Transition diagram if the rover is not on the boundary left right down up stay 1/4 1/4 1/4 1/4 1 0.9 0.1 0.1 0.9(b) Transition diagram if the rover is at A Figure 2: Transition diagrams of rover’s behavior {stay, left, up, right, down}. Pay attention that although we use subscript i in ai , it actually represents the action of the rover at time i − 1 (most recent action).(a) Hidden state of the rover, blue circle indicates the current location (xi, yi), and the arrow indicates the most recent action ai. (b) Observation (ˆxi, yˆi), whose value is taken from one of the five possible locations (green circles) with equal probability, given the true location shown in the left figure. Figure 3: Hidden state and observation.We do not directly observe the rover’s hidden state zi = ((xi , yi), ai) as shown in Fig. 3a. Instead, we have access to a noisy sensor that puts a uniform distribution on the valid grid positions within one grid cell of the rover’s current true position, as shown in Fig. 3b. Note that when the rover is on the boundary, the possible locations of the observation should adjust such that the observation is in the region. We do not directly observe the rover’s action from the sensor, either. In words, at time i the observation is represented by random variable (ˆxi , yˆi) ∈ {0, 1, . . . , 11} × {0, 1, . . . , 7}, and (ˆxi , yˆi) is uniformly distributed over the possible locations determined by zi . Lastly, we assume that the rover’s initial position (x0, y0) is equally likely to be any of the grid locations, and its initial action a0 is stay.Download hmm.zip under Files/Labs/Lab2 Part2/ on Quercus and unzip the file. File rover.py contains functions for generating the initial distribution, the transition probabilities given a current hidden state, and the observation probabilities given a current hidden state. Thus you do not need to re-write these. File test.txt contains the data for Question 1. File test missing.txt contains the data for Questions 2, 3, 4, and 5. In both test.txt and test missing.txt, the first three columns correspond to the hidden states, and the last two columns correspond to the observations.To help you understand what the code does, we provide a visualization tool in inference.py, which can be turned on by setting enable graphics to True. Note that whether you use the visualization will not affect the grading. When you run inference.py, three panes will pop up. The left pane shows the true state of the rover, including location and the most recent action, represented by an arrow.The middle pane shows the observed position of the rover, with red signifying the missing data. The right pane shows the estimated state of the rover as well as the marginal distribution by grey level. After you implement your inference algorithms, the right pane will automatically show the results.Please follow the questions below and complete the file inference.py. This is the only file you need to modify.1. (a) Write down the formulas of the forward-backward algorithm to compute the marginal distribution p(zi |(ˆx0, yˆ0), . . . ,(ˆxN−1, yˆN−1)) for i = 0, 1, . . . , N − 1. The formulas include the initializations of the forward and backward messages, the recursion relations of the messages, and the computation of the marginal distribution based on the messages.(b) Complete function forward backward in file inference.py to implement the forward-backward algorithm. Now use the data (ˆx0, yˆ0), . . . ,(ˆx99, yˆ99) in test.txt to determine the marginal distribution of zi at time i = 99, i.e., p(z99|(ˆx0, yˆ0), . . . ,(ˆx99, yˆ99)). Only include states with non-zero probability in your answer.Sanity check: The marginal distribution of the state at time i = 1 is p(z1|(ˆx0, yˆ0), . . . ,(ˆx99, yˆ99)) = ( 0.5 if z1 = ((6, 5), down), 0.5 if z1 = ((6, 5),right). (1)2. Some of the observations were lost when they were transmitted from Mars to earth. Modify function forward backward so that it can handle missing observations. In a list of observations, a missing observation is represented by None in inference.py. Now use the data in test missing.txt to determine the marginal distribution at time i = 30, i.e., p(z30|(ˆx0, yˆ0), . . . ,(ˆx99, yˆ99)) with missing observations. Only include states with non-zero probability in your answer.Sanity check: The mode of this marginal distribution p(z30|(ˆx0, yˆ0), . . . ,(ˆx99, yˆ99)) should be ((6,7),right) with probability 0.91304.3. (a) Instead of computing the marginal distributions, we now seek the most likely sequence of the states via the Viterbi algorithm. Write down the formulas of the Viterbi algorithm using zi and (ˆxi , yˆi), i = 0, . . . , N −1. The formulas include the initialization of the messages and the recursion of the messages in the Viterbi algorithm.(b) Complete the function Viterbi in file inference.py to implement the Viterbi algorithm. Your implementation should be able to handle missing observations. Use the data in test missing.txt to determine the last 10 hidden states of the most likely sequence (i.e., i = 90, 91, 92, . . . , 99) based on the MAP estimate obtained from the algorithm.Sanity check: For the MAP sequence, the last 3 states, i.e., the states at i = 97, 98, 99 are: ((8,7),left), ((7,7),left), ((6,7),left).4. Let z˜i , i = 0, 1, . . . , 99 be the estimate obtained from Question 3 by Viterbi algorithm, which corresponds to the most probable sequence given the observations: {z˜0, . . . , z˜99} = arg max z0,…,z99 p(z0, . . . , z99|(ˆx0, yˆ0), . . . ,(ˆx99, yˆ99)). (2) 3Let zˇi , i = 0, 1, . . . , 99 be the set of the states obtained from Question 2 by forward-backward algorithm, which are individually the most probable at each time step, corresponding to the maximization of the marginal distribution in Question 2, i.e., zˇi = arg max zi p(zi |(ˆx0, yˆ0), . . . ,(ˆx99, yˆ99)), i = 0, . . . , 99. (3) To compare z˜i and zˇi , we let z˙ i , i = 0, 1, . . . , 99 be the true hidden states, and define the error probabilities of z˜i and zˇi , respectively, as P˜ e = 1 − P99 i=0 I(z˜i = z˙ i) 100 , (4) Pˇ e = 1 − P99 i=0 I(zˇi = z˙ i) 100 , (5) where I(·) is the indicator function, i.e., I(X) = 1, if X is true; otherwise I(X) = 0. Please compute and compare P˜ e and Pˇ e for the data in test missing.txt.5. Although the states zˇi , i = 0, 1, . . . , 99, obtained from (3), are individually the most probable states, the sequence zˇ0, zˇ1, . . . , zˇ99 may not representant a valid sequence. By a valid sequence, we mean that the rover can behave physically as the sequence zˇ0, zˇ1, . . . , zˇ99 as described by the Markov transition diagram of Fig. 2. For example, . . . (6) zˇi = ((1,1),stay) (7) zˇi+1 = ((1,1),left) (8) . . . (9) is not a valid sequence because the transition from state ((1,1),stay) at time step i to state ((1,1),left) at time step i+ 1 is impossible according to the transition model. Please check zˇ0, zˇ1, . . . , zˇ99 in Question 4 to see whether or not it is a valid sequence. If not, please find a small segment zˇi , zˇi+1 that violates the transition model for some time step i. We thank Prof. Greg Wornell of MIT in creating this lab.
In this lab, we use Bayesian regression to fit a linear model. Consider a linear model of the form z = a1x + a0 + w, (1) where x is the scale input variable, and a = (a0, a1) T is the vector-valued parameter with unknown entries a0, a1, and w is the additive Gaussian noise: w ∼ N (0, σ2 ), (2) where σ 2 is a known parameter.Suppose that we have access to a training data set containing N samples {x1, z1}, {x2, z2}, . . . , {xN , zN }. We aim to estimate the parameter a by finding its posterior distribution. When the training finishes, we make predictions based on new inputs.We consider a Bayesian approach, which models the parameter a as a zero mean isotropic Gaussian random vector whose probability distribution is expressed as p (a) = N 0 0 , β 0 0 β , (3) where β is a known hyperparameter.Download reg.zip under Files/Labs/Lab2 Part1/ on Quercus and unzip the file. File training.txt contains the training data: the first column is the inputs; the second column is the targets. The training data is generated from z = −0.5x−0.1+w. Please answer the questions below and complete regression.py. File util.py contains a few useful functions.1. Express the posterior distribution p(a|x1, z1, . . . , xN , zN ) using σ 2 , β, x1, z1, x2, z2, . . . , xN , zN .2. Let σ 2 = 0.1 and β = 1. Based on the posterior distribution obtained in the last question, draw four contour plots corresponding to p(a), p(a|x1, z1), p(a|x1, z1, . . . , x5, z5), and p(a|x1, z1, . . . , x100, z100). In all contour plots, the x-axis represents a0, and the y-axis represents a1. The range is set as [−1, 1]× [−1, 1]. In each figure, also draw the true value of a, which corresponds to the point (−0.1, −0.5).3. Suppose that there is a new input x, for which we want to predict the target value z. Write down the distribution of the prediction z, i.e., p(z|x, x1, z1, . . . , xN , zN ).4. Let σ 2 = 0.1 and β = 1. Suppose that the set of the new inputs is {−4, −3.8, −3.6, . . . , 0, . . . , 3.6, 3.8, 4}. Plot three figures corresponding to the following three cases: (a) The predictions are based on one training sample, i.e., based on p(z|x, x1, z1). (b) The predictions are based on 5 training samples, i.e., based on p(z|x, x1, z1, . . . , x5, z5). (c) The predictions are based on 100 training samples, i.e., based on p(z|x, x1, z1, . . . , x100, z100).In all figures, the x-axis is the input, the y-axis is the target, and the range is set as [−4, 4] × [−4, 4]. Each figure should contain three components: 1) the new inputs and the predicted targets; 2) a vertical interval at each predicted target, indicating the range within one standard deviation; 3) the training sample(s) that are used for the prediction. Use plt.errorbar for 1) and 2); use plt.scatter for 3).References: C. M. Bishop, Pattern Recognition and Machine Learning, Springer New York, 2006, pp. 152– 159. & K. Murphy, Machine Learning: A Probabilistic Approach, MIT Press, 2012, pp. 231–234.
In the first part of the lab, we use a Na¨ıve Bayes Classifier to build a spam email filter based on whether and how many times each word in a fixed vocabulary occurs in the email. Suppose that we need to classify a set of N emails, and each email n is represented by {xn, yn}, n = 1, 2, . . . , N, where yn is the class label which takes the value yn = ( 1 if email n is spam, 0 if email n is non-spam (also called ham), (1) and xn is a feature vector of the email n. We use a multinomial model to construct the feature vector xn.Let W = {w1, w2, . . . , wD} be the set of the words (called the vocabulary) that appear at least once in the training set. The feature vector xn is defined as a D-dimensional vector xn = [xn1, xn2, . . . , xnD], where each entry xnd, d = 1, 2, . . . , D is the number of occurrences of word wd in email n. Thus the total number of words in email n can be expressed as ln = xn1 + xn2 + . . . + xnD.We assume that each email n of length ln is generated by a sequence of ln independent events that randomly draw words from the vocabulary W. (This is known as the na¨ıve Bayes assumption.) For each event, let p(wd | yn = 1) be the probability that word wd is picked, given that the email belongs to spam; let p(wd | yn = 0) be the probability that word wd is picked, given that the email belongs to ham.Note that p(wd | yn = 1) and p(wd | yn = 0) are different, which gives us a way to classify spam vs. ham. For example, words like “dollar”,“winner” would be more likely to occur in spam than in ham. Also, note that both p(wd | yn = 1), d = 1, 2, . . . , D and p(wd | yn = 0), d = 1, 2, . . . , D should sum to one, i.e., X D d=1 p(wd | yn = 1) = 1, (2) X D d=1 p(wd | yn = 0) = 1. (3)The probabilities p(wd | yn = 1), p(wd | yn = 0), d = 1, . . . , D should be learned from the training data. We make use of the word frequencies to model each email n probabilistically. Since each word in the email is seen as independently drawn from the vocabulary W, the distribution of the feature vector xn given label yn can be seen as a multinomial distribution as follows, p (xn | yn) = (xn1 + xn2 + . . . + xnD)! (xn1)!(xn2)! . . .(xnD)! Y D d=1 p (wd | yn) xnd . (4) We assume that the prior class distribution p(yn) is modeled as p(yn = 1) = π, (5) p(yn = 0) = 1 − π, (6) where π is a fixed parameter (e.g., 0.5).In the following, we first estimate the probabilities p(wd | yn = 1), p(wd | yn = 0), d = 1, . . . , D using the training set; we then build a classifier based on Bayes’ rule and make predictions on the testing set. Download classifier.zip under Modules/Lab1/ on Quercus and unzip the file. The spam emails for training are in the subfolder /data/spam/. The ham emails for training are in the subfolder /data/ham/. The unlabeled emails for testing are in the subfolder /data/testing/.Please answer the questions below and complete the routine classifier.py. File util.py contains a few functions/classes that will be helpful in writing the code for the classifier.1. Training. We estimate the conditional probability distribution of the D-ary random variable as specified by p(wd | yn = 1) and p(wd | yn = 0), d = 1, . . . , D, from the training data using a bag-of-words model as follows. For notational simplicity, we define pd = p(wd | yn = 1) and qd = p(wd | yn = 0). (a) We put all the words from the spam emails in the training set in a bag and simply count the number of occurrences of each word wd, d = 1, · · · , D. We do the same for ham emails. The maximum likelihood estimates of pd and qd based on these counts are not the most appropriate to use when the probabilities are very close to 0 or to 1. For example, some words that occur in one class may not occur at all in the other class. In this problem, we use the technique of “Laplace smoothing” to deal with this problem. Please write down such an estimator for pd and qd as functions of the training data {xn, yn}, n = 1, 2, . . . , N using Laplace smoothing for the D-ary random variable.(b) Complete the function learn distributions in file classifier.py. In learn distributions, you first build the vocabulary {w1, . . . , wD} by accounting for all the words that appear in the training set at least once; you then estimate pd and qd, d = 1, 2, . . . , D using your expressions in part (a). 2. Testing. We classify the unlabeled emails in /data/testing/ using the trained classifier. (a) Let {x, y} be a data point from the testing set whose class label y is unknown. Write down the maximum a posterior (MAP) rule to decide whether y = 1 or y = 0 based on the feature vector x. The d-th entry of x is denoted by xd. Please incorporate pd and qd in your expression. Please think carefully how to treat words that do not appear in either ham or spam training sets. Please assume that π = 0.5.(b) Complete the function classify new email in file classifier.py to implement the MAP rule, and run it on the testing set. There are two types of errors in classifying unlabeled emails: Type 1 error is defined as the event that a spam email is misclassified as ham; Type 2 error is defined as the event that a ham email is misclassified as spam. Write down the numerical values of these two numbers of errors made by your classifier on the testing data. To avoid numerical underflow in your code, please work with the log probability log p(y|x) in your code.(c) In practice, Type 1 error and Type 2 error lead to difference consequences (or costs). Therefore, we may wish to trade off one type of error against the other in designing the classifier. For example, we usually want to achieve a very low Type 2 error since the cost of missing a useful email can be severe, while we can tolerate a relative high Type 1 error as it merely causes inconvenience.Please provide a way to modify the decision rule in the classifier such that these two types of error can be traded off. In other words, change the decision rule in a way such that Type 2 error would decrease at a cost of Type 1 error, and vice versa. Test your method on the testing set and provide the following plot: Let the x-axis be the number of Type 1 errors and the y-axis be the number of Type 2 errors in the testing data set. Plot at least 10 points corresponding to different pairs of Type 1 and Type 2 errors, as a result of adjusting the classification rule. The two end points of the plot should be: 1) the one with zero Type 1 error; and 2) the one with zero Type 2 error. The code should be included in file classifier.py.(d) Why do we need Laplace smoothing? Briefly explain what would go wrong if we use maximum likelihood estimation in the training process, by considering a scenario in which a testing email contains both a word w1 that appears only in the ham training set (but not in the spam training set), and a word w2 that appears only in the spam training set (but not in the ham training set). How does Laplace smoothing resolve this issue? The training and test data for this problem are taken from V. Metsis, I. Androutsopoulos and G. Paliouras, “Spam Filtering with Naive Bayes – Which Naive Bayes?” Proceedings of the 3rd Conference on Email and Anti-Spam (CEAS 2006), Mountain View, CA, USA, 2006.When the feature vector is real-valued (instead of binary), a Gaussian vector model is appropriate. In this part of the lab, we use linear discriminant analysis (LDA) and quadratic discriminant analysis (QDA) for the height/weight data of people, and visualize the classification results of male and female persons based on height and weight.Suppose that the data set contains N samples. Let xn = [hn, wn] be the feature vector, where hn denotes the height and wn denotes the weight of a person indexed by n. Let yn denote the class label. Here yn = 1 is male, and yn = 2 is female. We model the class prior as p(yn = 1) = π and p(yn = 2) = 1 − π. For this problem, let π = 0.5.For the class conditional distributions, let µm be the mean of xn if class label yn is male, and let µf be the mean of xn if class label yn is female. For LDA, a common covariance matrix is shared by both classes, which is denoted by Σ; for QDA, different covariance matrices are used for male and female, which are denoted by Σm and Σf , respectively.Download ldaqda.zip under Modules/Lab1/ on Quercus and unzip the file. The data set for training is in file trainHeightWeight.txt, whereas the data set for testing is in file testHeightWeight.txt. Each file uses the same format to represent the data: the first column corresponds to the class labels, the second column corresponds to the heights, and the third column corresponds to the weights.Please answer the questions below and complete function ldaqda.py. File util.py contains a few functions/classes that will be useful in writing the code.1. Training and visualization. We estimate the parameters in LDA and QDA from the training data in trainHeightWeight.txt and visualize the LDA/QDA model. (a) Please write down the maximum likelihood estimates of the parameters µm, µf , Σ, Σm, and Σf as functions of the training data {xn, yn}, n = 1, 2, . . . , N. The indicator function I(·) may be useful in your expressions.(b) Once the above parameters are obtained, you can design a classifier to make a decision on the class label y of the new data x. The decision boundary can be written as a linear equation of x in the case of LDA, and a quadratic equation of x in the case of QDA. Please write down the expressions of these two boundaries.(c) Complete function discrimAnalysis in file ldaqda.py to visualize LDA and QDA. Please plot one figure for LDA and one figure for QDA. In both plots, the horizontal axis is the height with range [50, 80] and the vertical axis is the weight with range [80, 280]. Each figure should contain: 1) N colored data points {xn, n = 1, 2, . . . , N} with the color indicating the corresponding class labels(e.g., blue represents male and red represents female); 2) the contours of the the conditional Gaussian distribution for each class (To create a contour plot, you need first build a two-dimensional grid for the range [50, 80] × [80, 280] by using function np.meshgrid. You then compute the conditional Gaussian density at each point in the grid for each class. Finally use function plt.contour, which takes the two-dimensional grid and the conditional Gaussian density on the grid as inputs to automatically produce the contours.); 3) the decision boundary, which can also be created by using plt.contour with appropriate contour level.2. Testing. We test the obtained LDA/QDA model on the testing data in testHeightWeight.txt. Complete function misRate in file ldaqda.py to compute the misclassification rates for LDA and QDA, defined as the total percentage of the misclassified samples (both male and female) over all samples. The data for this problem are taken from: K. Murphy, Machine Learning: A Probabilistic Approach, MIT Press, 2012.
In this app, you will modify the payroll system code provided to you with this assignment on Blackboard in the zipped file named Starter Code. Using code examples in 10. Object-Oriented Programming – Polymorphism and Interfaces slides, this should not be too complicated.• Add a private instance variable birthDate to the Employee class. • Use class Date – also provided to you with this assignment on Blackboard – to represent an employee’s birthday. • Add get methods to class Date. • Assume that payroll is processed once per month.• Create an array of Employee variables to store references to the various employee objects. • In a loop, calculate the payroll for each Employee (polymorphically), and add a $100.00 bonus to the person’s payroll amount if the current month is the one in which the Employee’s birthday occurs. • Use the sample output provided to test your app.Submit your .java files on Blackboard as before. DO NOT INCLUDE THE OUTPUT, .java~ or .class files!
Objective This assignment’s objective is to write a console-based Java application with which a travel agent could present options for travel destinations to a client who wants to redeem his or her accumulated frequent flyer miles.For the sake of simplicity, we will assume all trips depart from O’Hare International Airport and that the miles represented are required for a roundtrip ticket. Depending on the distance, each destination requires a different number of frequent flyer miles to obtain a roundtrip ticket. Note that, if the client travels during the “off season”, he or she may be able to take advantage of the “off season”, or “super-saver”, mileage of a destination that requires fewer miles to obtain a roundtrip ticket to that particular destination.Input File A list of destination cities and related ticket redemption information will be read from a file (one named destinations.txt is provided for you but you may create your own). The file is formatted with one city, or destination, per line, with the first five items, or fields, separated by a semicolon ( ; ) and the last two separated by a hyphen ( – ) as shown in the example below.The six items regarding a single destination represent: The destination city name; the normal mileage needed for an economy class ticket; the “super-saver” mileage needed for an economy class ticket; the additional mileage needed for upgrading from economy to first class; the beginning month of departure during which “super-saver” mileage can be used instead of the normal mileage – the ending month of departure during which “super-saver”, mileage can be used instead of the normal mileage. Actual Input File: Berlin;17250;12750;5250;3-5 Hong Kong;30000;20250;17500;2-4 Hyderabad;30000;20000;15000;2-4 New York;10000;5000;8000;10-11 Paris;20000;15000;10000;2-4 Sidney;50000;30000;20000;5-6 Tokyo;29000;16500;5350;2-5 Vienna;17500;12500;5000;2-4 Washington, D.C.;9000;7500;2000;10-12 The above are the contents of file destinations.txt as attached to the Assignment on Blackboard.Your algorithm should 1) try to get tickets that travel the farthest, 2) use “super-saver” mileage whenever possible, 3) try to display as many different tickets as possible given the information, and, only after you have determined the destinations to which the customer can fly, 4) determine if you can use the remaining mileage for upgrade for any of the destinations to which the customer is eligible to fly (try to upgrade the longest trip first, then the next-to-longest – if there is one, etc.).The Classes Destination Write a class that encapsulates information such as the name of the destination city; normal miles required for a ticket; “super-saver” miles required to for a ticket during the “off season” months; additional miles for upgrading from economy to first class; start month of the “off season” and end month of the “off season”. Private instance variables and public accessor methods should be written for this information. MileRedeemer Write a class to encapsulate the logic for redeeming mileage.This class should have private instance variables for an array of Destination objects, and an integer to represent the remaining miles after the user’s Frequent Flyer Miles have been redeemed. Define the following methods in the MileRedeemer class: • public void readDestinations(Scanner fileScanner) For this method, we use a Scanner object as the input parameter for flexibility and reusability. For example, we could reuse the method to read files from different sources. This method should use the Scanner object to read and parse the destination data into an array of Destination objects.Before ending, the method should sort the array of Destination objects in descending order by normal mileage (see more information about sorting below). • public String[] getCityNames() This method should loop through the array of Destination objects and create an array of String objects from the city names. This array can be sorted in ascending order and returned (to be printed out by the main app) just for the display of all possible destinations. • public String[] redeemMiles(int miles, int month) For this method, miles is the total available miles, and month is the desired month of departure.To avoid writing one huge method, you can (and probably should) have the redeemMiles() method call some other methods to accomplish subtasks as part of the larger overall algorithm. This method should return an array of String objects containing descriptions of redeemed tickets to be printed out by the main app. It should also save the miles remaining after the tickets have been redeemed. • public int getRemainingMiles( ) This method should return the saved remaining miles. MileRedeemerApp This is the main app class that will have the main( ) method and will drive the entire process. First declare a Scanner object for the keyboard and prompt the travel agent for the name of the .txt file of destinations. Next, it should declare another Scanner object to read the destination records from thefile. This Scanner will be the Scanner parameter passed to the readDestinations() method of the MileRedeemer class where the array of destinations will be created. Once back from readDestinations(), use getCityNames() to get and print the list of all possible destinations. Prompt the travel agent for the client’s Frequent Flyer Miles balance and for the client’s month of departure. Using these two integers, call redeemMiles() to determine the destinations available to which the client can fly and present them to the travel agent. Note that, if redeemMiles() returns an empty array, print an appropriate message such as: *** Your client has not accumulated enough Frequent Flyer Miles *** Either way, then use getRemainingMiles() to present the Frequent Flyer Miles remaining after determining the destinations to which the client can fly. After presenting the possible destinations and remaining miles, prompt the travel agent asking if he or she would like to start over.Do not ask for a new file name, do not print the header statement and do not print the list of all possible destinations again. Let the travel agent answer with ‘y’ or ‘Y’ or ‘yes’ or ‘Yes’ and, if he or she answers in the affirmative, the next thing he or she should see is a prompt to enter the client’s Frequent Flyer Miles and departure month and the process of determining a new list of possible destinations. MileageComparator Because the cities in the input text file are almost always out of order based on normal miles, the readDestinations() method will need to sort them in descending normal miles order before any redemptions can be done. The class java.util.Arrays has a static sort() method that takes an array and an implementation of Comparator. The latter is an object of a class that defines the ordering: import java.util.Comparator; public class MileageComparator implements Comparator { @Override public int compare(Destination d1, Destination d2) { return (d2.getNormalMiles() – d1.getNormalMiles()); } } Note that this class is optional as you may be able to find a different way to implement the Java interface Comparator. For example, create an inner class. Note: The array of city name strings created and returned by getCityNames() can also be sorted using a version of Arrays.sort(). There’s no need to write a Comparator in this case, though, as the “natural ordering” of the strings will be sufficient. (continued)Programming Notes Exception Handling and Input Validation For simplicity’s sake, please do not add try catch blocks for exception handling and do not do any sort of input validation. File I/O and String Parsing The Scanner class can be used to read the lines from the file. Before reading the lines, a java.util.ArrayList should be created: ArrayList destinationList = new ArrayList(); As lines are read from the input file, the lines need to be parsed, or broken into fields, using “;” as the delimiter. Instances of Destination are created and added to the array list. There are a variety of different ways to parse a String into fields. One easy way is to use the split() method of the String class. Note that the last two fields, i.e., begin month and end month for the “Frequent Flyer” months, are separated by a hyphen.To change a String to an integer to assign to the three miles instance variables and the beginning and ending month instance variables, use wrapper class Integer’s method parseInt(). After the whole file is read, the ArrayList can be converted to a normal, fixed-length array of objects: Destination[] destinationArray = (Destination[]) destinationList.toArray(new Destination[destinationList.size()]); By doing this, we can take advantage of Java’s built-in methods for sorting an array. Reading the File So that you will not have to do any exception handling, add the following at the end of the method declaration that reads the input file (no semicolon at the end of it): throws IOExceptionIMPORTANT Please read the Java Coding and Documentation Guidelines document in Course Documents on Blackboard before submitting your final app for grading. Submitting the Assignment The assignment will be submitted on Blackboard. Please submit the three (or four) .java files on Blackboard. DO NOT ZIP THEM AND DO NOT SUBMIT THE INPUT FILE! (Exact output follows) CSCI 470/502 Assignment 4: Mile Redeemer Console App Page 5 of 6 Exact App Output Using the File destinations.txt Provided Please enter the name of the file: destinations.txt —————————————————————- WELCOME TO THE JAVA AIRLINES MILES REDEEMER APP —————————————————————- List of destination cities your client can travel to: Berlin Hong Kong Hyderabad New York Paris Sidney Tokyo Vienna Washington, D.C. —————————————————————- Please enter your client’s accumulated Frequent Flyer Miles: 49600 Please enter your client’s month of departure (1-12): 4 Your client’s Frequent Flyer Miles can be used to redeem the following tickets: * A trip to Hong Kong in Economy Class * A trip to Hyderabad in Economy Class * A trip to Washington, D.C. in Economy Class Your client’s remaining Frequent Flyer Miles: 350 —————————————————————- Do you want to continue (y/n)? y —————————————————————- Please enter your client’s accumulated Frequent Flyer Miles: 3421 Please enter your client’s month of departure (1-12): 3 *** Your client has not accumulated enough Frequent Flyer Miles *** Your client’s remaining Frequent Flyer Miles: 3421 —————————————————————- Do you want to continue (y/n)? y —————————————————————- Please enter your client’s accumulated Frequent Flyer Miles: 154351 Please enter your client’s month of departure (1-12):Your client’s Frequent Flyer Miles can be used to redeem the following tickets: * A trip to Sydney in Economy Class * A trip to Hong Kong in Economy Class * A trip to Hyderabad in Economy Class * A trip to Tokyo in First Class * A trip to New York in Economy Class Your client’s remaining Frequent Flyer Miles: 1 —————————————————————- Do you want to continue (y/n)? n ————————————————————————- THANK YOU FOR USING THE JAVA AIRLINES MILES REDEEMER APP ————————————————————————-
In this assignment, you will develop a Java console app to calculate tips. We will break the code for this assignment into two separate classes.This class encapsulates the tip calculation logic. It may be reused with a different user interface in a future assignment. Data Members The class should have the following private data members: • Bill amount (a double with the default value 0) • Tip percentage (an integer with the default value 20) • Party size (an integer with the default value 1) Methods The class should have the following methods: • A default constructor (optional) • Public getters and setters for the three data members. • getTotalBill() – computes and returns the total bill (bill amount plus tip). • getIndividualShare() – computes and returns the value of an equal share of the total bill (i.e., the total bill divided by the party size).Class TipApp This class encapsulates the user interface of the app. Data Members • A TipCalculator object. (This could be a local variable in the calculateTips() method if you prefer.) Methods • main() – creates an instance of TipApp (an object of itself!) and uses it to call calculateTips(). • calculateTips() – This method will contain the logic for interacting with the user at the keyboard and displaying the output of the app. Feel free to extract parts of this logic into other methods that are called by calculateTips(). 1. Create a Scanner object to read input from the keyboard.2. Prompt for and read the bill amount. If an invalid numeric value is entered by the user, print an error message and repeat the process until a valid value is entered. When a valid value is entered, use it to set the bill amount data member for the TipCalculator object. 3. Prompt for and read the tip percentage. If an invalid numeric value is entered by the user, print an error message and repeat the process until a valid value is entered. When a valid value is entered, use it to set the tip percentage data member for the TipCalculator object. 4. Prompt for and read the party size. If an invalid numeric value is entered by the user, print an error message and repeat the process until a valid value is entered. When a valid value is entered, use it to set the party size data member for the TipCalculator object.5. Call the various TipCalculator methods to produce the output. 6. Ask the user whether they want to continue (enter another bill amount, tip percentage and party size) and read their response. If the response is y or Y, go back to Step 2 and repeat.Notes • To make grading by the TA easier, remove all package statements in the .java files that you submit. • Invalid input should include values that are not numeric (which will result in a NumberFormatException when Java attempts to convert them to a number) as well as numeric values that should not be possible.For example, it would be impossible to have a negative bill amount or a party size of 0. However, a tip percentage of 0 should certainly be possible. • Both upper- and lower-case letters should be allowed in the user’s response to the “Another bill? (y/n): ” prompt.Submit your two .java files on Blackboard. Do NOT zip your files for submission yet. Sample Output A sample run of the app might look something like this: *** Tip Calculator *** Enter the bill amount: 105.37 Enter your desired tip percentage (20 equals 20%): 2a Please enter a valid tip percentage. Enter your desired tip percentage (20 equals 20%): 20 Enter the size of your party: -3 Please enter a valid party size. Enter the size of your party: 3 Sample output continues on the next page!*** Your Bill *** Bill Amount: $105.37 Tip Percentage: 20% Party Size: 3 Total Bill (with Tip): $126.44 Share for Each Individual: $42.15 Another bill? (y/n): y Enter the bill amount: 78.27 Enter your desired tip percentage (20 equals 20%): 23 Enter the size of your party: 2 *** Your Bill *** Bill Amount: $78.27 Tip Percentage: 23% Party Size: 2 Total Bill (with Tip): $96.27 Share for Each Individual: $48.14 Another bill? (y/n): N Goodbye!
Create a class named Invoice that a hardware store might use to represent an invoice for an item sold at the store. An Invoice should include four pieces of information as instance variables – a part number (type String), a part description (type String), a quantity of the item being purchased (type int) and a price per item (double).The class you create should has two constructors. One that initializes the four instance variables and another that takes no arguments. Provide a set and get method for each instance variable. In addition, provide a method named getInvoiceAmount that calculates the invoice amount (i.e., multiplies the quantity by the price per item), then returns the amount as a double value. If the quantity is not positive, it should be set to 0. If the price per item is not positive, it should be set to 0.0.Write a test app name InvoiceTest that demonstrates class Invoice’s capabilities. InvoiceTest should declare five different invoice instances with different hardware store items. For example, hammer, phillips head screwdriver, light switch, cordless drill and carpenter’s square.You are going to be hardcoding the values for the instance variables for each of the five objects. It is not very realistic but it is okay to do this at this point. Then, use the get methods to print each invoice, one after the other, as follows: Invoice #1 Part No.: AB-23-4312 Item Desc.: Cordless Drill Quantity: 10 Item Price: 189.00 Invoice Subtotal: $1,890.00********************************* Invoice #2 and so on.Note that, in the above, the item price for the cordless drill would have been set to 189.00. Also, note that the DecimalFormat class should be used to place a floating dollar sign and commas in the output subtotal. The code and technique to do this can be found beginning on p. 8 of lecture notes 2. The maximum subtotal will be $99,999,999.99. This amount will help you determine your edit pattern for DecimalFormat’s method format.Be sure to add the required doc box to the top of both classes and submit both .java files on Blackboard before the due date and time.