MATH70071 Applied Statistics end-of-module assignment
Submission deadline: 12:00 (noon) on Friday, 13/12/2024

Preparing your assignment

1. Use the Rmarkdown template file in the Software folder of the MSc in Statistics 2024-25 Blackboard page to write your report. Your R code should be provided in the appendix; this should be produced automatically by the template provided. Ensure your submitted file has tidy and well-documented code chunks.

2. The report should be properly structured, and should be written using complete sentences. Marks are given both for the content of the report (correctness of code, numerical answers, etc.) and the quality of the presentation (clarity of plots, explanations, etc.). Two or three sentences is sufficient for the verbal/explanatory parts of questions; longer answers are likely to be less clear.

3. At the beginning of your report you must include this statement of originality: “I, CID [YOUR CID], certify that this assessed coursework is my own work, unless otherwise acknowledged, and includes no plagiarism. I have not discussed my coursework with anyone else except when seeking clarification with the module lecturer via email or on MS Teams. I have not shared any code underlying my coursework with anyone else prior to submission.”

Submitting your assignment

1. Before the above deadline submit a single PDF report via Blackboard (with, as above, your R code included as an appendix).

2. The filename should be MScStatistics AppliedStatistics [YOUR CID].pdf so, e.g., MScStatistics AppliedStatistics 00123456.pdf.

Sociologists in Australia surveyed the public to assess the relationship between the perceived respect of different jobs and some objectively measurable attributes of these jobs. The results of the survey are in the table jobs.csv, which has these columns:
- i: job
- c: class
- b: average annual salary of people in the job
- e: education
- f: fraction of men
- r: respect of the job
CS210 Fall 2024: PS5A Virtual Memory

1. (12 points) Consider a system with 12-bit virtual addresses, 11-bit physical addresses, and a page size of 512 bytes.
(a) (2 points) How many bytes of data can virtual memory support?
(b) (2 points) How many bytes of data can physical memory support?
(c) (2 points) How many bits is the offset address?
(d) (2 points) How many virtual pages are there?
(e) (2 points) How many physical pages are there?
(f) (2 points) How many entries will the page table have?

2. (8 points) Using the system with the specifications described in Question 1 and the following table, translate the virtual addresses into their corresponding physical addresses (in hex). Write PAGE FAULT if needed.
(a) (2 points) Virtual Address: 0xC00
(b) (2 points) Virtual Address: 0x7A4
(c) (2 points) Virtual Address: 0x49C
(d) (2 points) Virtual Address: 0xA00

Using the same system from Question 1, we are going to execute 3 instructions sequentially. The original page table is given below. Update the page table to have the correct information after each instruction executes (Questions 3, 4, 5). Keep the following in mind: Is a page fault needed? Which page do I evict? (Note: in the LRU column, 1 represents the most recently used page.) Make sure to update the LRU column on every access. Do I need to write to the disk?

3. (8 points) mov [0x0f4], rax
(a) (2 points) Page Fault needed?
(b) (2 points) If so, which virtual page was evicted?
(c) (2 points) Which physical page was overwritten by the page fault?
(d) (2 points) Was a write to the disk needed?

4. (8 points) mov [0x9a4], rax
(a) (2 points) Page Fault needed?
(b) (2 points) If so, which virtual page was evicted?
(c) (2 points) Which physical page was overwritten by the page fault?
(d) (2 points) Was a write to the disk needed?

5. (8 points) mov [0xc44], rax
(a) (2 points) Page Fault needed?
(b) (2 points) If so, which virtual page was evicted?
(c) (2 points) Which physical page was overwritten by the page fault?
(d) (2 points) Was a write to the disk needed?

Caching

6. We have the following 2-way set-associative cache containing 8 sets, with a block size of 2 64-bit words. The cache uses an LRU replacement strategy. At a particular point during execution, a snapshot is taken of the cache contents, which are shown below. All values are in hex; assume that any hex digits not shown are 0. The cache uses bits from the 64-bit byte address produced by the CPU to select the appropriate set (L), as input to the tag comparisons (T), and to select the appropriate word from the data block (B). For correct and optimal performance, what are the appropriate portions of the address to use for L, T, and B? Express your answer in the form “A[N:M]” for N and M in the range 0 to 63, or write “CAN’T TELL”.
(a) (1 point) Address bits to use for B:
(b) (1 point) Address bits to use for L:
(c) (1 point) Address bits to use for T:

For the following addresses, if the contents of the specified location appear in the cache, give the location’s 64-bit contents in hex (determined by using the appropriate value from the cache). If the contents of the specified location are NOT in the cache, write “MISS”.
(a) (1 point) Contents of the location 0x128 (in hex):
(b) (1 point) Contents of the location 0xDB0 (in hex):
(c) (1 point) Contents of the location 0x3BF70 (in hex):

Pipelining

7. (10 points) Consider the following combinational logic circuit constructed from 6 modules. In the diagram below, each combinational component is marked with its propagation delay in seconds; contamination delays are zero for each component.
(a) (2 points) What is the latency of this combinational circuit?
(b) (2 points) What is the throughput of this combinational circuit?
(c) (2 points) Draw the smallest number of ideal (zero delay, zero setup/hold time) pipeline registers on the circuit diagram below so as to maximize its throughput.
Remember to place a register on the output.
(d) (2 points) What is the latency of the pipelined circuit?
(e) (2 points) What is the throughput of the pipelined circuit?
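The address arithmetic used throughout this problem set is mechanical once the address is split into fields. As an illustrative sketch (not the assignment answers), the following Python snippet splits a paged virtual address and performs a table lookup; the page-table mapping shown is hypothetical, since the actual table from Question 2 is not reproduced here.

```python
# Illustrative sketch: splitting a virtual address into VPN and offset fields.
# Parameter names are our own; the mapping below is hypothetical.

def split_virtual_address(va_bits, page_size):
    """Return (vpn_bits, offset_bits) for a paged virtual address.

    page_size must be a power of two; offset_bits = log2(page_size).
    """
    offset_bits = page_size.bit_length() - 1
    vpn_bits = va_bits - offset_bits
    return vpn_bits, offset_bits

def translate(va, page_table, offset_bits):
    """Translate a virtual address using a {vpn: ppn} dict; None means PAGE FAULT."""
    vpn = va >> offset_bits
    offset = va & ((1 << offset_bits) - 1)
    ppn = page_table.get(vpn)
    if ppn is None:
        return None                    # PAGE FAULT
    return (ppn << offset_bits) | offset

# Geometry from Question 1: 12-bit VA, 512-byte pages -> 3 VPN bits, 9 offset bits.
vpn_bits, offset_bits = split_virtual_address(12, 512)
print(vpn_bits, offset_bits)           # 3 9
# Hypothetical mapping {vpn 0x3 -> ppn 0x1}, not the table from the problem:
print(hex(translate(0x7A4, {0x3: 0x1}, offset_bits)))  # 0x3a4
```

The same bit-slicing idea underlies Question 6: the low bits select the byte within a block (B), the next bits select the set (L), and the remaining high bits form the tag (T).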
Machine Learning: Fall Exam
IEOR E4525 Spring 2020, May 8th, 2020

1. Classifier Boundaries [20 pts]

Given the following classifiers:
1. A logistic regression classifier.
2. A logistic regression classifier trained on the input features x1, x2, x1^2, x2^2.
3. A nearest neighbors classifier.
4. An SVM with linear kernel.
5. An SVM with a polynomial kernel of degree d = 2,
   K(x, x') = (1 + x^T x')^2   (1)
6. An SVM with a Gaussian kernel
   K(x, x') = e^{-γ ||x - x'||^2}   (2)
   with an intermediate (neither too large, nor too small) value of the parameter γ.
7. A linear SVM classifier trained with the input features x1, x2, x1^2, x2^2.
8. An SVM with a Gaussian kernel and a very small value of γ.
9. An LDA classifier.
10. A QDA classifier.
11. A classification tree.
12. A Gaussian Naive Bayes classifier.

Indicate which classification boundary in Figure 1 was most likely generated by each one of the classifiers above. Note that more than one of the classifiers might have generated the same boundary. If not indicated otherwise, assume the classifiers were trained on the raw input features x = (x1, x2) ∈ R^2.

Figure 1: Classification boundaries for Problem 1

2. Polynomial Kernel [20 points]

Given x = (xA, xB) ∈ R^2 and x' = (x'A, x'B) ∈ R^2, describe the feature mapping Φ(·) that generates the kernel:
K(x, x') = (1 + γ x^T x')^3 = (1 + γ (xA x'A + xB x'B))^3   (3)

3. Dense Neural Network [30 points]

We have a dense neural network with two hidden layers defined by the following graph:

Figure 2: Network Architecture for Problem 3

where
• The two hidden layers have ReLU activation.
• The connection between the input and the first hidden layer is given by W0 and b0.
• The connection between the first and second hidden layers is given by W1 and b1.
• The connection between the second hidden layer and the output unit η is given by
  W2 = (1, 2, -1)   (8)
  and
  b2 = (-3)   (9)
• The last layer has logistic activation.
• The network performance is assessed using the logistic loss
  l(y, η) = log(1 + e^η) - yη   (11)

Given one input sample (x1, x2, x3, x4) = (0, -2, 1, 1) and the network’s output label y = 1, compute:
(a) The first hidden layer activations (a1^(1), a2^(1)).
(b) The second hidden layer activations (a1^(2), a2^(2), a3^(2)).
(c) The output ŷ(x).
With that we have:
(d) Back-propagate the errors δ through the network layers.
(e) Compute the gradients ∂L/∂W2 and ∂L/∂b2 for the learnable parameters W2, b2 of the output layer.
(f) Compute the gradient with respect to the learnable parameters W1, b1 of the second hidden layer.
(g) Compute the gradient with respect to the learnable parameters W0, b0 of the first hidden layer.

4. Tweedie Distribution: Generalized Linear Model [30 points]

The Tweedie distribution is frequently used to model the distribution of insurance losses. This distribution belongs to an exponential family defined by the probability density function where the parameter μ > 0 is unknown, the parameter 1 < p < 2 is assumed to be known, and the function h(y; p) is the base measure that does not depend on μ.
(a) Write an expression for the sufficient statistic T(y) of the Tweedie distribution.
(b) Write an expression for the canonical parameter η as a function of μ and p.
(c) Find the expression for the cumulant function A(η).
(d) Derive an expression for the expected value of the random variable y in terms of the parameters p and μ.
(e) Derive an expression for the variance of y in terms of the distribution parameters p and μ.
(f) Assume we have training observations {xi, yi} for i = 1, . . .
, N, where we wish to assume that
• yi > 0 is an observed, positive-valued label.
• xi ∈ R^D is our input data.
• p(yi | xi) has a Tweedie distribution with fixed, known parameter p but unknown parameter μ that depends on x.
• The canonical parameter ηi is linear in xi:
  η(xi) = xi^T w + b   (14)
Write the loss function L(w, b; {xi, yi}) of the generalized Tweedie regression model.
(g) Write the gradient of the loss L with respect to w and b.
(h) What does the constraint μ > 0 imply for the allowed values of the canonical parameter? How would this affect the optimization problem of learning the parameters w, b? Explain briefly what you would do to resolve this issue.
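To make the forward computation in Problem 3 concrete, here is a minimal pure-Python sketch of a two-hidden-layer ReLU network with a logistic output and the logistic loss of equation (11). W2 and b2 are taken from equations (8)–(9); W0, b0, W1, b1 are placeholder values of our own choosing, since the corresponding equations are not reproduced above, so the intermediate numbers below are illustrative only.

```python
import math

def relu(v):
    return [max(0.0, x) for x in v]

def affine(W, b, v):
    # One output per row of W: W @ v + b
    return [sum(wi * xi for wi, xi in zip(row, v)) + bi
            for row, bi in zip(W, b)]

def forward(x, W0, b0, W1, b1, W2, b2):
    a1 = relu(affine(W0, b0, x))            # first hidden layer (2 units)
    a2 = relu(affine(W1, b1, a1))           # second hidden layer (3 units)
    eta = affine(W2, b2, a2)[0]             # scalar pre-activation
    y_hat = 1.0 / (1.0 + math.exp(-eta))    # logistic output
    return eta, y_hat

def logistic_loss(y, eta):
    # l(y, eta) = log(1 + e^eta) - y * eta, as in equation (11)
    return math.log1p(math.exp(eta)) - y * eta

# W2, b2 as given in the exam; W0, b0, W1, b1 are hypothetical placeholders.
W2, b2 = [[1.0, 2.0, -1.0]], [-3.0]
W0, b0 = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]], [0.0, 0.0]   # hypothetical
W1, b1 = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], [0.0, 0.0, 0.0]      # hypothetical

eta, y_hat = forward([0.0, -2.0, 1.0, 1.0], W0, b0, W1, b1, W2, b2)
loss = logistic_loss(1.0, eta)
```

With these placeholder weights the ReLU layers zero out, so η = b2 = -3 and ŷ = σ(-3); with the exam's actual W0, b0, W1, b1 the values in parts (a)–(c) will of course differ.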
IEOR E4525 Machine Learning for OR and FE
Due: November 19 2020
Final Exam

1. Feedforward Networks (25 points)

1.1. (4 pts) Suppose I have a neural network with a single hidden layer with weights W1 ∈ R^{k×d}, no bias terms, and ReLU activation functions (the input is xi ∈ R^d). Now suppose I add a second hidden layer after the first one, with no bias terms. Let’s say that W1', W2' are the weight matrices in this new network. How can I choose W1', W2' such that the network represents the same function as the one-hidden-layer network?

1.2. (4 pts) How many parameters are there in a fully connected feedforward network with l hidden layers, width k, d input features, and a linear model combining the output at the last hidden layer into a single prediction?

1.3. (9 pts) In class we saw that a neural network can learn the XOR function. Prove that a feedforward neural network with a single layer can learn any Boolean function. A Boolean function is a function f : {0, 1}^n → {0, 1}. Given such a function f, construct a neural network with a single layer that correctly outputs f.

1.4. Consider a feedforward neural network with linear activation functions: σ(z) = a · z + b, for a, b ∈ R, with a ≠ 0.

1.4.1. (4 pts) Consider a network with a single hidden layer with weight matrix W ∈ R^{k×d} and offsets b ∈ R^k. Derive an expression showing that the output of the neural network is linear in the input x ∈ R^d. This expression should not include the intermediate variables h or z in the hidden layer.

1.4.2. (4 pts) Suppose that the width k of the single hidden layer in the network is much smaller than d, the number of features. Now consider some linear regression β^T x + β0 on the original features x. Can this linear regression be expressed using this neural network? If yes, how? If no, why not?

2. SGD (10 points)

2.1. Consider minibatch SGD with a batch size of m. In minibatch SGD we normally sample without replacement. Suppose we run minibatch SGD with replacement.
Derive the mean and variance of this estimator.

3. Support Vector Machines (20 points)

3.1. (10 pts) In class we saw that a deep net can implement the XOR function. But so can an SVM! Give an SVM that computes the XOR function. For this exercise, you should assume that x ∈ {-1, 1}^2 and the output is in {-1, 1}. Written this way, the XOR dataset is ([-1, -1], -1), ([1, -1], 1), ([-1, 1], 1), ([1, 1], -1).

3.2. (10 pts) In class we saw that the SVM problem for the separable case can be written as
  min_{β0, β} ||β||_2^2  s.t. yi(β0 + β^T xi) ≥ 1, ∀i = 1, . . . , n
In the soft-margin SVM problem, we instead solve the following problem:
  min_{β0, β} (λ/2) ||β||_2^2 + Σ_{i=1}^n max(0, 1 - yi(β0 + β^T xi))
Either prove or give a complete counterexample for the following statement: There exists a single value λ such that for every set of n data points x1, . . . , xn that are separable, hard SVM and soft SVM return the same solution β, β0.

4. PCA and clustering (25 points)

Suppose that we have a clustering problem with each data point xi ∈ R^d. The K-means optimization problem is:
  min_{C1,...,CK} Σ_{k=1}^K (1/|Ck|) Σ_{i,i' ∈ Ck} ||xi - xi'||^2   (1)
Suppose we perform PCA to get k < d principal components. Let zi ∈ R^k be the representation of xi in terms of the k principal components. We will compare clustering on xi and zi.

4.1. (4 pts) We use the K-means clustering algorithm covered in class, on the original data points xi. Give an example showing that the K-means algorithm may converge to a local minimum which is not a global minimum (hint: give a one-dimensional example).

4.2. (4 pts) We use the K-means clustering algorithm covered in class, on the PCA representation zi, with K > k. Does the resulting clustering represent a local minimum of the K-means clustering optimization problem given in (1)? Here, you may take local minimum to mean that the K-means algorithm would not make any changes to the clustering if allowed to run starting from the computed clustering, but using the xi.

4.3.
(12 pts) If your answer to the previous question was yes, argue why. If your answer was no, give a counterexample.

4.4. (5 pts) Suppose that the data matrix X ∈ R^{n×d}, where each xi is a row, has rank r = k. Does this change your answer to Question 4.2? Why/why not?

5. Matrix Completion (20 points)

5.1. (10 pts) Every matrix M ∈ R^{n×m} of rank exactly r can be factorized into matrices B ∈ R^{n×r}, Y ∈ R^{r×m} such that M = BY. Under the assumption that B must be orthonormal, characterize the set of solutions B and Y to the optimization problem. Hint: the solution is not unique.

5.2. (10 pts) Consider the alternating minimization algorithm for the matrix completion problem. At iteration t, we saw in class that the update for Y is as follows:
  Y^t = argmin_Y Σ_{(i,j)} (x_{ij} - y_i^T z_j^{t-1})^2 + λ ||Y||_F^2
Derive an exact expression for Y^t.
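As an illustration of the structure behind Question 5.2 (a sketch of the general technique, not the requested closed-form derivation), once the other factor is fixed, each factor in the update decouples into a small ridge-regression problem. Specialized to rank 1 the update is a scalar formula; the variable names below are our own.

```python
# Rank-1 alternating-minimization step for matrix completion (illustrative).
# Given fixed factors z_j and observed entries x[i][j] (None = unobserved),
# each y_i separately minimizes  sum_j (x_ij - y_i * z_j)^2 + lam * y_i^2.

def update_y(x, z, lam):
    """Closed-form rank-1 ridge update: y_i = (sum_j x_ij z_j) / (sum_j z_j^2 + lam),
    with sums taken over observed entries only."""
    y = []
    for i in range(len(x)):
        num = sum(x[i][j] * z[j] for j in range(len(z)) if x[i][j] is not None)
        den = sum(z[j] ** 2 for j in range(len(z)) if x[i][j] is not None) + lam
        y.append(num / den)
    return y

# Fully observed rank-1 matrix X = u v^T: with lam = 0 the update recovers u exactly.
u, v = [1.0, -2.0, 3.0], [2.0, 1.0]
X = [[ui * vj for vj in v] for ui in u]
print(update_y(X, v, 0.0))   # [1.0, -2.0, 3.0]
```

For general rank r the same decoupling holds, with each scalar division replaced by the solution of an r × r regularized linear system; deriving that expression is the point of the question.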
N1569 BSc EXAMINATION
Financial Risk Management

There are FIVE questions and each has FOUR parts. Each part carries 5 marks. Each question should take you 24 minutes, so 6 minutes per part.

1. (a) You want to measure the market risk of a portfolio containing hundreds of cash flows. How would you select the risk factors, and how would you map the cash flows to these risk factors?
(b) Given the 15-month interest rate is 3.5% per annum and it has a volatility of 60 basis points (bps), find the present value (PV) and the present value of a basis point (PV01) of a cash flow of $3m 15 months from now. Justify your answers.
(c) Use the appropriate Excel workbook to map this cash flow to vertices at 1 and 2 years, in such a way that both PV and volatility are preserved under the mapping. You are given that the 1-year rate is 4% and has volatility 65 bps, the 2-year rate is 3% and has volatility 50 bps, and the correlation between the 1-year and 2-year rates is 0.9. Justify your answer.
(d) Use the appropriate Excel workbook to calculate the present value of a basis point (PV01) of the mapped cash flows in part (c) and comment on your results.

2. (a) Which model would you use to measure Value-at-Risk (VaR) for an equity portfolio and why?
(b) What issues would you expect to arise from this choice, if any?
(c) Calculate the 1% daily historical VaR of the S&P 500 index using daily returns between 1 January 2010 and 31 December 2023. How does this compare with the normal VaR? Give your answers as a % of the portfolio value.
(d) Scale this 1% daily VaR to a 10-day VaR under the assumption that the daily returns on the S&P 500 are independent and identically distributed. Would the scaled VaR remain unchanged if you were to assume the daily returns were positively autocorrelated? Justify your answer.

3. (a) What is Value at Risk (VaR)? Describe its parameters.
(b) Describe the purpose of the Excel spreadsheet “Rolling Normal VaR”.
(c) Change the spreadsheet so that the 30-day standard deviation is replaced by a 10-day standard deviation, leaving the VaR parameters unchanged. Describe the effect on the graph and explain why we observe this effect.
(d) Describe the purpose of the “VaR Model Comparison” spreadsheet and discuss the results therein.
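The historical-simulation calculation asked for in Questions 2(c)–(d) can be sketched as follows. This is a generic illustration with made-up returns (no S&P 500 data here), using one simple quantile convention; the module's spreadsheets may use a different convention.

```python
import math

def historical_var(returns, alpha=0.01):
    """1-day historical VaR at level alpha, as a positive fraction of portfolio value.

    Uses the simple convention of taking the floor(alpha * n)-th order statistic
    of the sorted returns; other interpolation conventions exist.
    """
    ordered = sorted(returns)            # worst returns first
    k = int(alpha * len(ordered))
    return -ordered[k]

def scale_var(var_1d, horizon_days=10):
    """Square-root-of-time scaling, valid only under i.i.d. daily returns."""
    return var_1d * math.sqrt(horizon_days)

# Made-up daily returns: three bad days among 200 quiet ones.
returns = [-0.10, -0.08, -0.05] + [0.001] * 197
var_1d = historical_var(returns, alpha=0.01)   # 0.05, i.e. 5% of portfolio value
var_10d = scale_var(var_1d, 10)
```

Note that the square-root-of-time scaling in `scale_var` relies on the i.i.d. assumption stated in part (d); under positive autocorrelation that rule no longer holds, which is what part (d) asks you to discuss.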
Module Code: BHO0171
Module Title: E-commerce
Assessment Type: Individual Report
Academic Year: 2024/25, Term 1

Assessment Task
For this assignment you need to develop a brief proposal for an E-commerce start-up business. Accordingly, the aim of your proposal is to attract the attention of potential angel investors.

Level of AI-Use permitted for this Assessment: Level 1 – Not Permitted

Level 1 – Not Permitted. The use of AI tools is not permitted in any part of this assessment.
Level 2 – Some use Permitted. Some use of AI tools is permitted in the research/early stages of this assignment, but you must ensure that the work you submit is your own. If you use AI tools, you should acknowledge or reference this in your work. Use the Text reference builder to learn how to reference AI-generated ideas. The sorts of questions to consider when using AI are: Is it accurate? Are the references genuine? Has it reproduced bias?
Level 3 – Integrated. The use of AI tools is integrated in this assessment. Further guidance is included in this assessment brief.

Duration: N/A
Word Count: 1500

Task specific guidance:
In this report you must have the following parts:

Executive summary (approximately 200 words)
This is a brief summary of your whole business plan. The executive summary can help your potential angel investors to learn about your idea without reading the full proposal. It is not an “introduction”. It summarises the report and gives a quick overview of important information about your business plan to the readers. You need to insert the URL of your website here.

1. Introduction/Description of the business (approximately 200 words)
In this section you are expected to provide motivation and rationale for selecting a particular business domain. The aim of your proposal is to attract the attention of potential angel investors, who will be interested in the specific domain of business and your rationale for selecting it.
You need to select one of the following business domains:
· book publishing and retailing,
· music publishing and retailing,
· tourism and travel,
· clothing retailing,
· or any other idea for business you may have.

2. Describe and justify your E-commerce business model (approximately 800 words)
In this section you need to provide details and justifications for your model. You also need to specify the key components of your business model in this section, including value proposition, revenue model, market opportunity, competitive environment, competitive advantage, etc.
This section is covered in teaching weeks 4/8/11.
Teaching week 4: Payment systems and E-commerce security.
Teaching week 8: E-commerce business strategy (business models).
Teaching week 11: Ethics and E-commerce.

3. Marketing strategy (approximately 300 words)
In this section you need to specify the marketing strategy for your business, following the analysis of your business model in Section 2.
This section is covered in teaching weeks 8/10.
Teaching week 8: E-commerce business strategy (business models).
Teaching week 10: Consumers online and E-commerce marketing.

4. Conclusion (approximately 200 words)
Summarise the attractive points in your proposal (why should an investor choose your business to invest in?).

5. References (not included in word count).

Other requirements:

Table of contents
You should create a table of contents via MS Word’s Automatic Table of Contents function, which can be found at the top left corner after you click the References button at the top centre of MS Word. To use this function, you need to set the styles of your section titles and contents when writing. You can find the available styles (Normal, Heading 1, Heading 2, etc.) at the top right of the MS Word window. The Table of Contents should be located after the Executive Summary and before the Introduction.
Develop your website and provide the link to the website
Construct a functioning e-commerce website for your business by using development tools like WordPress, Wix, Weebly, etc. Fulfil the requirements of the website as laid out in the tutorials:
– You need to have at least 5 pages including a Home page.
– Use a menu on the Home page to reach the other pages.
– The Home page should present the logo and a picture (banner) created by yourself.
– A Products or similar page should present your products/services.
– A table is required to display the services and prices.
– The Contact us page should contain a contact form and an interactive Google map block.
This section is covered starting from weeks 7/8 (see WordPress tutorials).
Teaching week 7: E-commerce Infrastructure and Online Presence.
Teaching week 8 tutorial: WordPress.

Required Reading:
Laudon, K. C., & Traver, C. G. (2023). E-commerce 2023-2024: Business, Technology, Society (Global Edition). Pearson Education Limited.

General study guidance:
· Cite all information used in your work which is clearly from a source. Try to ensure that all sources in your reference list are seen as citations in your work, and all names cited in the work appear in your reference list.
· Reference and cite your work in accordance with the APA 7th system – the University’s chosen referencing style. For specific advice, you can talk to your Business librarians or go to the library help desk, or you can access library guidance via the following link:
o APA 7th referencing: https://library.hud.ac.uk/pages/apareferencing/
· The University has regulations relating to academic misconduct, including plagiarism. The Learning Innovation and Development Centre can advise and help you with how to avoid ‘poor scholarship’ and potential academic misconduct. You can contact them at [email protected].
· If you have any concerns about your writing, referencing, research or presentation skills, you are welcome to consult the Learning Innovation Development Centre team at [email protected]. It is possible to arrange a 1:1 consultation with a LIDC tutor once you have planned or written a section of your work, so that they can advise you on areas to develop.

Do not exceed the word limit / time / other limit.
COMP24011 Lab 4: BM25 for Retrieval-Augmented Question Answering
Academic session: 2024-25

Introduction

In this exercise, you will develop your own implementation of the BM25 scoring algorithm, one of the most popular methods for information retrieval. Apart from the traditional uses of information retrieval methods in the context of search engines and document ranking, they have recently been employed to enhance the question answering (QA) capabilities of generative large language models (LLMs). Such models, like ChatGPT, can answer questions based on knowledge learned during their training on large amounts of textual data. However, they suffer from well-known limitations, including their tendency to hallucinate (i.e., make up answers that are factually wrong), as well as biases that they learned from the training data. A workaround to these issues is the integration of an information retrieval module into the question answering pipeline, in order to enable the LLM to access factual information stored in relevant documents that can be used by the model in producing its output. If you follow this manual all the way to the end, you will have the opportunity to observe how BM25 enables an LLM to provide more accurate answers to questions.

Your main task for this exercise, however, is to implement pre-processing techniques, compute the BM25 score of each (pre-processed) document in relation to a (pre-processed) question, and return the topmost relevant documents based on the scores.

For this exercise, you are provided with the following text files as resources:

transport_inventions.txt
The content of this file was drawn from Wikipedia’s timeline of transportation technology. We will consider this file as a corpus, i.e., a collection of documents, whereby each line corresponds to one document. Given that there are 10 lines in the file, this corpus consists of 10 documents.
music_inventions.txt
The content of this file was drawn from Wikipedia’s timeline of music technology. We will consider this file as another corpus. As in the first corpus, each line corresponds to one document. Given that there are 10 lines in the file, this corpus consists of 10 documents.

stopwords_en.txt
This file contains a stop word list taken from the Natural Language Toolkit (NLTK). This is a list of words that are commonly used in the English language and yet do not bear meaning on their own. Every line in the file is a stop word.

If you make changes to the contents of these files, this will change the expected behaviour of the lab code that you’re developing, and you won’t be able to compare its results to the examples in this manual. But you can always use git to revert these resources to their original state.

To complete this lab you will need a third-party stemming tool called PyStemmer. You can install it by issuing the following command

$ pip install pystemmer

The BM25 Retrieval System

Once you refresh the lab4 branch of your GitLab repo you will find the following Python files.

run_bm25.py
This is the command-line tool that runs each separate NLP task according to the subcommand (and the parameters) provided by the user. It contains the RunNLP class.

nlp_tasks_base.py
This module contains the NLPTasksBase “abstract” class that specifies the signatures of four methods you need to implement, and implements the interface used in RunNLP.

nlp_tasks.py
This is the module that you need to complete for this exercise. It contains the NLPTasks class that is derived from NLPTasksBase, and must implement its abstract methods in order to complete the BM25-based retrieval of documents relevant to a given question.

In order to successfully complete this lab you will need to understand both nlp_tasks_base.py and nlp_tasks.py, but you do not need to know the details of how run_bm25.py is coded.
Once you complete this exercise, the BM25 tool will be able to obtain the documents most relevant to a given question. This BM25 retrieval system provides comprehensive help messages. To get started run the command

$ ./run_bm25.py -h
usage: run_bm25.py [-h] -c CORPUS [-w STOPWORDS] [-s]
       {preprocess_question,preprocess_corpus,IDF,BM25_score,top_matches} ...

options:
  -h, --help            show this help message and exit
  -c CORPUS, --corpus CORPUS
                        path to corpus text file (option required except for
                        the preprocess_question command)
  -w STOPWORDS, --stopwords STOPWORDS
                        path to stopwords text file (option required unless
                        stopwords are located at ./stopwords_en.txt)
  -s, --stemming        enable stemming

subcommands:
  select which NLP command to run

  {preprocess_question,preprocess_corpus,IDF,BM25_score,top_matches}
    preprocess_question
                        get preprocessed question
    preprocess_corpus   get preprocessed corpus
    IDF                 calculate IDF for term in corpus
    BM25_score          calculate BM25 score for question in corpus document
    top_matches         find top scoring documents in corpus for question

Notice that for most subcommands you need to specify which corpus to work with, as you’ll have the two choices described in the Introduction: transport_inventions.txt or music_inventions.txt. On the other hand, unless you move the stopwords list to another directory, you should not need to give its location.

The tool has a boolean flag that controls whether stemming should be applied when pre-processing text. By default it is set to False, but you can set it to True using the -s (--stemming) option. This will affect the way your text pre-processing code for Task 1 below should work.

The BM25 tool supports five subcommands: preprocess_question, preprocess_corpus, IDF, BM25_score and top_matches. The first two will call your text pre-processing implementation; the others will call the corresponding functions that you’ll develop in Tasks 2 to 4 below.
Each of these subcommands has its own help message which you can access with commands like $ ./run_bm25.py top_matches -h usage: run_bm25.py top_matches [-h] question n positional arguments: question question string n number of documents to find options: -h, --help show this help message and exit The BM25 tool will load the stopwords list and corpus as required for the task. For example, running the command $ ./run_bm25.py preprocess_question "Who flew the first motor-driven airplane?" nlp params: (None, ’./stopwords_en.txt’, False) debug run: preprocess_question(’Who flew the first motor-driven airplane?’,) ret value: flew first motor driven airplane ret count: 32 will not load the corpus as text pre-processing is only applied to the given question string. Note that text pre-processing should, in general, return a different value if stemming is enabled. In fact, for the same question of the previous example you can expect $ ./run_bm25.py -s preprocess_question "Who flew the first motor-driven airplane?" nlp params: (None, ’./stopwords_en.txt’, True) debug run: preprocess_question(’Who flew the first motor-driven airplane?’,) ret value: flew first motor driven airplan ret count: 31 To pre-process the text of a whole corpus you should use the preprocess_corpus subcommand. For example, once you’ve finished Task 1 you should get $ ./run_bm25.py -s -c music_inventions.txt preprocess_corpus nlp params: (’music_inventions.txt’, ’./stopwords_en.txt’, True) debug run: preprocess_corpus() ret value: [ ’1940 karl wagner earli develop voic synthes precursor vocod’, ’1941 commerci fm broadcast begin us’, ’1948 bell laboratori reveal first transistor’, ’1958 first commerci stereo disk record produc audio fidel’, ’1959 wurlitz manufactur sideman first commerci electro mechan drum machin’, ’1963 phillip introduc compact cassett tape format’, ’1968 king tubbi pioneer dub music earli form. 
popular electron music’, ’1982 soni philip introduc compact disc’, ’1983 introduct midi unveil roland ikutaro kakehashi sequenti circuit dave smith’, ’1986 first digit consol appear’] ret count: 10 Assignment For this lab exercise, the only Python file that you need to modify is nlp_tasks.py. You will develop your own version of this script, henceforth referred to as “your solution” in this document. Before you get started with developing this script, it might be useful for you to familiarise yourself with how the NLPTasksBase “abstract” class will initialise your NLPTasks objects: • The documents in the specified corpus are loaded onto a list of strings; this list becomes the value of the field self.original_corpus • The stop words in the specified stop word list file are loaded onto a list of strings, which becomes the value of the field self.stopwords_list • If stemming is enabled, an instance of the third-party Stemmer class is created and assigned to the field self.stemmer In addition, the pre-processing of the corpus and of the question strings is done automatically in the NLPTasksBase abstract class. The pre-processed text for these become available as the fields self.preprocessed_corpus and self.preprocessed_question, respectively. Task 1: In your solution, write a function called preprocess that takes as input a list of strings and applies a number of pre-processing techniques on each of the strings. The function should return a list of already pre-processed strings. Pre-processing involves the following steps, in the order given: 1. removal of any trailing whitespace 2. lowercasing of all characters 3. removal of all punctuation 4. removal of any stop words in the list contained in the specified stop word list 5. stemming of all remaining words in the string if stemming is enabled. 
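The five steps above can be sketched as follows. This is not the required solution (which must live in nlp_tasks.py and use self.stopwords_list and self.stemmer); it is a stdlib-only illustration, with stemming stubbed out behind an optional argument and a tiny inline stop word set standing in for stopwords_en.txt.

```python
import string

def preprocess_one(text, stopwords, stemmer=None):
    """Apply the Task 1 steps to a single string, in order."""
    text = text.strip()                                    # 1. trailing whitespace
    text = text.lower()                                    # 2. lowercasing
    # 3. punctuation removal: replace with spaces so tokens are not merged,
    #    e.g. "motor-driven" -> "motor driven", "world's" -> "world s"
    text = text.translate(str.maketrans(string.punctuation,
                                        " " * len(string.punctuation)))
    tokens = [t for t in text.split() if t not in stopwords]  # 4. stop words
    if stemmer is not None:                                # 5. optional stemming
        tokens = stemmer.stemWords(tokens)                 # PyStemmer-style call
    return " ".join(tokens)

# Tiny stand-in stop word set; the real list is loaded from stopwords_en.txt.
stops = {"who", "the", "did", "s"}
print(preprocess_one("Who flew the first motor-driven airplane?", stops))
# flew first motor driven airplane
```

With this tiny stop word set the output matches the no-stemming example shown earlier in this manual; your solution should of course apply `preprocess` to a whole list of strings and use the full NLTK stop word list.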
In relation to punctuation removal, it is important to note the following:

• For a standard definition of what counts as punctuation, you can use the characters in the string.punctuation constant in Python.
• Avoid merging any tokens unnecessarily. For instance, in the examples shown on the previous page and below, the removal of the hyphen in "motor-driven" and the single quote in "world's" was done in such a way that the separation of the corresponding tokens was preserved, leading to, e.g., 'motor' 'driven' (instead of 'motordriven') and 'world' 's' (instead of 'worlds'). In the case of 'world' 's', note that 's' will subsequently be discarded by stop word removal.

As for applying the third-party stemming tool, please refer to PyStemmer's documentation to find out how to call the stemWords function of a Stemmer object.

You can verify that your function behaves correctly on the command line. In addition to the examples in the previous section, note that in some cases stemming will not change the pre-processed result. For example, you should obtain the following output:

$ ./run_bm25.py preprocess_question "When did the world's first underground railway open?"
nlp params: (None, './stopwords_en.txt', False)
debug run: preprocess_question("When did the world's first underground railway open?",)
ret value: world first underground railway open
ret count: 36

$ ./run_bm25.py -s preprocess_question "When did the world's first underground railway open?"
nlp params: (None, './stopwords_en.txt', True)
debug run: preprocess_question("When did the world's first underground railway open?",)
ret value: world first underground railway open
ret count: 36

Task 2: In your solution, write a function called calc_IDF that calculates the inverse document frequency (IDF) of a given term (i.e., a token or word) in a pre-processed corpus. The score should be returned as a float.
Since IDF is calculated based on a pre-processed corpus, this function will always be called after the preprocess function (Task 1) has been applied to the corpus. As explained, the result of this can be accessed as the field self.preprocessed_corpus.

You can verify that your function behaves correctly on the command line. For example, you should obtain the following output:

$ ./run_bm25.py -s -c transport_inventions.txt IDF airplan
nlp params: ('transport_inventions.txt', './stopwords_en.txt', True)
debug run: IDF('airplan',)
ret value: 0.8016323462331664

$ ./run_bm25.py -c transport_inventions.txt IDF first
nlp params: ('transport_inventions.txt', './stopwords_en.txt', False)
debug run: IDF('first',)
ret value: -0.531478917042255

Task 3: In your solution, write a function called calc_BM25_score that calculates the BM25 score for a pre-processed question (a string) and a pre-processed document that is specified by its index in the corpus (an integer, starting from zero). The score should be returned as a float. As explained above, the pre-processed question and corpus can be accessed as the fields self.preprocessed_question and self.preprocessed_corpus, respectively.

You can verify that your function behaves correctly on the command line. For example, you should obtain the following output:

$ ./run_bm25.py -s -c transport_inventions.txt BM25_score "flew first motor driven airplan" 4
nlp params: ('transport_inventions.txt', './stopwords_en.txt', True)
debug run: BM25_score('flew first motor driven airplan', 4)
ret value: 2.8959261945969574

$ ./run_bm25.py -s -c transport_inventions.txt BM25_score "flew first motor driven airplan" 6
nlp params: ('transport_inventions.txt', './stopwords_en.txt', True)
debug run: BM25_score('flew first motor driven airplan', 6)
ret value: -0.6030241558748664

Task 4: In your solution, write a function called find_top_matches that calculates the BM25 score for a question (a string) and every document in the corpus.
Both the question and the documents should have undergone pre-processing prior to the BM25 score calculation, taking into account whether stemming is enabled. The n top-scoring original documents should be returned in the form of a list of strings. As above, pre-processed texts will be available in fields of your NLPTasks object.

You can verify that your function behaves correctly on the command line. For example, you should obtain the following output:

$ ./run_bm25.py -s -c transport_inventions.txt top_matches "Who flew the first motor-driven airplane?" 3
nlp params: ('transport_inventions.txt', './stopwords_en.txt', True)
debug run: top_matches('Who flew the first motor-driven airplane?', 3)
ret value: [ '1903: Orville Wright and Wilbur Wright flew the first motor-driven airplane. ',
'1967: Automatic train operation introduced on London Underground. ',
'2002: Segway PT self-balancing personal transport was launched by inventor Dean Kamen. ']
ret count: 3

$ ./run_bm25.py -s -c transport_inventions.txt top_matches "When did the world's first underground railway open?" 3
nlp params: ('transport_inventions.txt', './stopwords_en.txt', True)
debug run: top_matches("When did the world's first underground railway open?", 3)
ret value: [ "1863: London's Metropolitan Railway opened to the public as the world's first underground railway. ",
'1890: The City and South London Railway (C&SLR) was the first deep-level underground "tube" railway in the world, and the first major railway to use electric traction ',
'1967: Automatic train operation introduced on London Underground. ']
ret count: 3

Extension: Question Answering Integration

This part of the lab exercise will not be marked. However, you are strongly encouraged to engage with this activity so that you can gain a full appreciation of how even a simple information retrieval module based on BM25 can dramatically improve the answers produced by a generative large language model.
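As an orientation for Tasks 2-4: the sample IDF outputs above are consistent with a log-base-10 probabilistic IDF, but the BM25 constants k1 and b below are common defaults and are assumptions on my part, since the lab's reference values are not stated here. A standalone sketch (your solution must implement these as methods of NLPTasks operating on its fields):

```python
import math

def calc_IDF(term, corpus):
    """IDF of `term` in a list of pre-processed document strings."""
    N = len(corpus)
    n = sum(1 for doc in corpus if term in doc.split())
    # log base 10; note this variant goes negative for very common terms
    return math.log10((N - n + 0.5) / (n + 0.5))

def calc_BM25_score(question, corpus, index, k1=1.5, b=0.75):
    """BM25 score of pre-processed `question` against document `index`."""
    doc = corpus[index].split()
    avgdl = sum(len(d.split()) for d in corpus) / len(corpus)
    score = 0.0
    for term in question.split():
        f = doc.count(term)  # term frequency in this document
        score += calc_IDF(term, corpus) * f * (k1 + 1) / (
            f + k1 * (1 - b + b * len(doc) / avgdl))
    return score

def find_top_matches(question, corpus, original_corpus, n):
    """Return the n original documents with the highest BM25 scores."""
    ranked = sorted(range(len(corpus)),
                    key=lambda i: calc_BM25_score(question, corpus, i),
                    reverse=True)
    return [original_corpus[i] for i in ranked[:n]]
```

Documents that share no terms with the question score 0, so rare shared terms (positive IDF) push a document towards the top of the ranking.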
Google Colab familiarisation. Generative large language models are difficult to run on local machines given their computational requirements, so we will make use of Google Colab, a cloud-based platform for developing and running Python notebooks that gives you access to computational resources (such as more RAM and GPUs). Google Colab requires a Google account; please explore it now if you have not done so before. Note that the model we will use does not require you to subscribe to any of the Google Colab paid products; it will run even on a free Google Colab account.

Obtaining a Huggingface access token. Huggingface is the biggest repository of LLMs that supports the loading of models directly from code. However, this requires an access token. To obtain one, please sign up for a Huggingface account. Once you have an account, you should be able to find your access token by clicking on your profile icon, then Settings, and finally Access Tokens. You will need this token as you use the Retrieval-augmented QA notebook (described below).

Retrieval-augmented QA notebook. Access our pre-prepared notebook. Create a copy of the notebook by clicking on the File menu and then the Save a copy in Drive option. Follow the cells in the notebook and observe the impact of the BM25 retrieval module on QA.

Submission

Please follow the README.md instructions in your COMP24011_2024 GitLab repo. Refresh the files of your lab4 branch and develop your solution to the lab exercise. The solution consists of a single file called nlp_tasks.py, which must be committed to your GitLab repo and tagged as lab4_sol. The README.md instructions that accompany the lab files include the git commands necessary to commit, tag, and then push both the commit and the tag to your COMP24011_2024 GitLab repo. Further instructions on coursework submission using GitLab can be found in Appendix L of the CS Handbook, including how to change a git tag after pushing it.
The deadline for submission is 09:00 on Monday 9th December. In addition, no work will be considered for assessment and/or feedback if submitted more than 1 week after the deadline. (Of course, these rules will be subject to any mitigating circumstances procedures that you have in place.)

The lab exercise will be auto-marked offline. The automarker program will download your submission from GitLab and test it against our reference implementation. For each task the return value of your function will be checked on a random set of valid arguments. A time limit of 10 seconds will be imposed on every function call, and exceeding this time limit will count as a runtime error. If your function does not return values of the correct type, this will also count as a runtime error.

A total of 20 marks is available in this exercise, distributed as shown in the following table.

Task  Function                      Marks
1     NLPTasks.preprocess()         5
2     NLPTasks.calc_IDF()           5
3     NLPTasks.calc_BM25_score()    5
4     NLPTasks.find_top_matches()   5

The marking scheme for all tasks is as follows:

• You obtain the first 0.5 marks if all tests complete without runtime errors.
• The proportion of tests with fully correct return values determines the remaining 4.5 marks.

During marking, your NLPTasks object will be initialised independently. This means that when functions that require text pre-processing get tested, your object will have all its fields initialised with correct values, independent of your implementation of Task 1. In addition to the two corpora provided in your repo, your solution will be tested with a hidden corpus for marking. This will only be released together with the results and feedback for the lab.

Important Clarifications

• It will be very difficult for you to circumvent time limits during testing. If you try to do this, the most likely outcome is that the automarker will fail to receive return values from your implementation, which will have the same effect as not completing the call.
In any case, an additional time limit of 300 seconds for all tests of each task will be enforced.

• This lab exercise is fully auto-marked. If you submit code which the Python interpreter does not accept, you will score 0 marks. The Python setup of the automarker is the same as the one on the department's Ubuntu image, but only a minimal set of Python modules is available. If you choose to add import statements to the sample code, it is your responsibility to ensure these are part of the default Python packages available on the lab machines.
• It doesn't matter how you organise your lab4 branch, but you should avoid having multiple files with the same name. The automarker will sort your directories alphabetically (more specifically, in ascending ASCII order) and find submission files using breadth-first search. It will mark the first nlp_tasks.py file it finds and ignore all others.
• Every file in your submission should contain only printable ASCII characters. If you include other Unicode characters, for example by copying and then pasting code from the PDF of the lab manuals, then the automarker is likely to reject your files.
Mathematics IA

Topic 1: Developing a model to predict the crime rate based on socioeconomic factors.

Research question 1: What is the correlation between crime rates and socioeconomic factors such as unemployment and poverty?

Socioeconomic factors:
· Poverty -> GDP ranking: discuss how GDP rankings can serve as an indicator of poverty levels and their potential impact on crime rates.
· Unemployment: analyze the relationship between unemployment rates and crime, and consider how higher unemployment may correlate with increased crime rates.

Personal connection:
· Interest in this topic: my motivation for studying crime rates and socioeconomic factors.
· Future aspirations: mention my plans to study crime-related courses at university and how this research contributes to my academic pathway.

Potential impact:
1. Public policy and resource allocation: guide decision-making.
· Policymakers can draw on these results to target areas with higher predicted crime rates for preventive measures.
· Example: increase job training / increase educational support -> attract investment -> offer more job opportunities. Discuss the potential benefits of job training programs and educational support aimed at reducing poverty and unemployment.
2. Academic and research contributions -> further research directions: how my findings could encourage further studies into the relationship between socioeconomic factors and crime.
DATA:

Topic: 10 countries with the highest crime rates in 2024 (https://worldpopulationreview.com/country-rankings/crime-rate-by-country)

Country               Crime index   GOCI   Criminal markets score   Criminal actors score   Resilience score
Venezuela             82.1          6.72   6.03                     7.4                     1.88
Papua New Guinea      80.4          5.72   5.33                     6.1                     3.29
Afghanistan           78.4          7.1    7                        7.2                     1.5
Haiti                 78.3          5.93   5.77                     6.1                     2.46
South Africa          75.5          7.18   6.87                     7.5                     5.63
Honduras              74.3          7.05   6                        8.1                     4.08
Trinidad and Tobago   70.8          5.2    4.8                      5.6                     5.33
Syria                 69.1          7.07   6.43                     7.7                     1.92
Guyana                68.8          5.97   5.13                     6.8                     4.04
Peru                  67.5          6.4    6.2                      6.6                     4.38

Crime index:
GOCI (Overall Criminality Score): The GOCI indicates countries' ability to deal with organised crime and their vulnerability to organised crime, and ranks each country based on these two factors.

Topics and Methodology
1. Scatter plot (SL)
2. Regression line (SL)
3. Pearson's product-moment correlation coefficient r (SL)
4. Non-linear regression (HL)
5. Hypothesis testing (Chi-square test of independence) (HL)
6. Looking at statistics from a sample of the data set (HL)
7. Poisson distribution

1. Scatter plot (SL):
· Data collection: gather data on crime rates, unemployment rates, and poverty levels (potentially using GDP rankings as a proxy for poverty).
· Visual representation: create a scatter plot to visualize the relationship between crime rates and each socioeconomic factor.
· Analysis: identify the correlation and draw a regression line. Calculate means for relevant data points. Scatter plot (use GDC) to draw graph -> find correlation and regression line -> find the mean (if correlation doesn't match -> non-linear correlation -> use residual plot (HL)).

2. Regression line (SL):
· Model development: fit a regression line to your scatter plot data for both unemployment and poverty.
· Interpretation: discuss the meaning of the slope and y-intercept in the context of crime prediction.

3. Pearson's product-moment correlation coefficient r (SL):
· Calculation: compute Pearson's r for the relationships between crime rates and both unemployment and poverty.
· Interpretation: analyze the correlation coefficients to determine the strength and direction of the relationships.

4. Non-linear regression (HL):
· Exploration of non-linearity:

5. Chi-square test of independence (HL) – hypothesis test: a test that measures how well a model's expected values compare to the actual observed data.
· Unemployment & poverty -> crime rate

Structure

1. Introduction
Context: Briefly introduce crime rates and their significance.
Research Question: Clearly state your research question: What is the correlation between crime rates and socioeconomic factors such as unemployment and poverty?
Personal Connection: Discuss your interest and motivations for selecting this topic, along with future aspirations in crime-related studies.

2. Literature Review
Socioeconomic Factors:
Poverty: Discuss GDP rankings as an indicator of poverty and their implications for crime rates.
Unemployment: Analyze literature on the relationship between unemployment rates and crime rates.

3. Methodology
Data Collection: Describe how you will gather data on crime rates, unemployment rates, and poverty levels (e.g., using GDP).
Statistical Tools: Outline the mathematical concepts and tools you will use, such as: scatter plots, regression lines, Pearson's correlation coefficient, non-linear regression, chi-square independence testing.

4. Analysis
Scatter Plot (SL): Present and analyze scatter plots to visualize relationships. Discuss correlation and regression lines; calculate means of relevant data points.
Regression Analysis (SL): Fit regression lines and interpret coefficients (slope and y-intercept) in the context of your research.
Pearson's Correlation Coefficient (SL): Calculate and interpret Pearson's r for both unemployment and poverty against crime rates.
Non-linear Regression (HL): Explore non-linear relationships and analyze residuals.
Chi-Square Test (HL): Conduct a hypothesis test to compare observed crime rates with expected rates based on socioeconomic factors.

5. Evaluation
Interpretation of Results: Discuss findings from your analysis, including implications for understanding the relationship between crime and socioeconomic factors.
Potential Impact:
Public Policy: How your findings can guide policymakers in resource allocation and preventive measures.
Academic Contributions: Suggest directions for further research based on your findings.

6. Conclusion
Summary of Findings: Recap the main results and their significance.
Reflection: Reflect on the research process and what you learned about the relationship between socioeconomic factors and crime.

7. References
List all sources used for data collection and literature review.

Additional Topics from the Syllabus
Descriptive Statistics: Discuss measures of central tendency and dispersion for your data sets.
Probability Distributions: Explore how different distributions (e.g., normal distribution) might apply to your data.
Statistical Inference: Discuss confidence intervals or hypothesis testing in the context of your findings.
Data Visualization: Consider using other forms of data visualization, such as histograms or box plots, to present your findings.
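As a concrete illustration of tool 3, Pearson's r can be computed directly from its definition. The pairing below simply reuses two columns of the data table (crime index against GOCI); it is a sketch, not part of the required IA workflow:

```python
def pearson_r(x, y):
    """Pearson's product-moment correlation coefficient (SL formula)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

# Crime index vs. GOCI for the ten countries in the data table
crime = [82.1, 80.4, 78.4, 78.3, 75.5, 74.3, 70.8, 69.1, 68.8, 67.5]
goci  = [6.72, 5.72, 7.1, 5.93, 7.18, 7.05, 5.2, 7.07, 5.97, 6.4]
print(round(pearson_r(crime, goci), 3))
```

A GDC or spreadsheet should reproduce the same value, which is a useful consistency check before interpreting strength and direction.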
FIN 4453 – PROJECT 3 (10 Questions)
Fall 2024

1. (10 Points) A bond has just been issued. The bond will mature in 8 years and has a yield to maturity of 6%. The bond's annual coupon rate is 7% and the face value of the bond is $1,000. Coupons will be paid quarterly.
a. Compute the bond's duration using the basic duration formula, i.e., the Macaulay duration formula (DO NOT use Excel's Duration function or the VBA function dduration).

2. (10 Points) A bond has just been issued. The bond is currently selling for $848. The bond will mature in 9 years. The bond's annual coupon rate is 9% and the face value of the bond is $1,000. Coupons will be paid semi-annually. The bond is callable in 6 years and the call price is $1,090.
a. Compute the bond's annual yield to call.
b. Compute the bond's current yield.
c. Compute the bond's annual yield to maturity.

3. (5 Points) A stock is selling today for $56. The stock has an annual volatility of 42 percent, and the annual nominal risk-free interest rate is 8 percent. A 14-month European call option with an exercise price of $48 is available to an investor.
a. Use Excel's data table feature to construct a Two-Way Data Table to demonstrate the impact of the exercise price and the option's duration on the price of this call option:
i. Option durations of 2 months, 4 months, 6 months, 8 months, and 10 months.
ii. Exercise prices of $50, $52, $58, $60, and $62.
b. How is the call option price impacted by varying the exercise price?
c. How is the call option price impacted by varying the duration of the option?

4. (10 Points) A bond has just been issued. The bond has an annual coupon rate of 6% and coupons are paid annually. The bond has a face value of $1,000 and will mature in 8 years. The bond's yield to maturity is 4%.
a. Calculate the actual currency change in the bond's price as the yield to maturity changes from 4% to 5%.
b. Use the bond's duration to calculate the bond's approximate currency price change as the yield to maturity changes from 4% to 5%.

5. (5 Points) A bond has just been issued. The bond will mature in 14 years. The bond's annual coupon rate is 16% and the face value of the bond is $1,000. The bond's (annual) yield to maturity is 12%.
a. Compute the bond's duration if coupons are paid quarterly:
i. Using the VBA dduration function.

6. (20 Points) The Excel file Portfolio Bond Immunization Data contains information about three bonds. Coupons are paid annually. Use this data to:
a. Compute the amount to be invested to meet the future liability noted in the data. This future liability is due in 9 years.
b. Find a combination of Bond 1 and Bond 2 having a target duration of 9 years.
c. Find a combination of Bond 1 and Bond 3 having a target duration of 9 years.
d. Perform an analysis using a data table and an accompanying graph to determine which of the following options (i.e., a portfolio consisting of Bond 1 and Bond 2, a portfolio consisting of Bond 1 and Bond 3, or a portfolio consisting of Bond 2) would be preferred to attempt to immunize this obligation.
i. Construct a data table by varying the yield to maturity that shows the value of each option at the end of 9 years. Use yield to maturity values ranging from 0% to 15% in 1% increments.
ii. Based on your data table, construct a graph that demonstrates the performance of these 3 options.
iii. Analyze each option's performance in attempting to achieve immunization.

7. (10 Points) The Excel file Portfolio Bond Weight Calculation Data contains information about three bonds. Coupons are paid annually. Use matrix algebra and this data to construct a portfolio consisting of Bond 1, Bond 2, and Bond 3 that has a duration of 9 years, subject to the condition that the proportion invested in Bond 1 is 25% larger than the proportion invested in Bond 2.

8. (10 Points) A bond has just been issued. The bond has an annual coupon rate of 8% and coupons are paid semi-annually. The bond has a face value of $1,000 and will mature in 9 years. The bond's annual yield to maturity is 9%.
a. Use Excel's Data Table feature to construct a Two-Way Data Table to demonstrate the impact of the coupon rate and the time to maturity on the bond's duration using:
i. Coupon rates of 0%, 4%, 8%, and 12%.
ii. Maturities of 4 years, 8 years, 12 years, 16 years, and 20 years.
b. What four (4) duration principles or relationships are demonstrated in this table?

9. (10 Points) Using Excel's Text Box feature, explain the principle of immunization when used with a bond portfolio.
a. What is bond portfolio immunization attempting to achieve?
b. How is bond portfolio immunization achieved?
c. Which two (2) bond risk components interact to make immunization successful?
i. Explain how these bond risk components interact to immunize a bond portfolio as interest rates change.

10. (10 Points) Using Excel's Text Box feature,
a. Explain how investors could apply Macaulay duration principles to make bond investment decisions.
b. Explain the relationship between a bond's coupon rate and yield to maturity, and the bond's price.
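For cross-checking your spreadsheet work, Questions 1, 3 and 4 can be prototyped outside Excel. The sketch below uses the textbook Macaulay duration and Black-Scholes call formulas; it is an independent illustration, not the course's VBA dduration function or its option pricer, and the function names are my own:

```python
import math

def macaulay_duration(face, coupon_rate, ytm, years, freq):
    """Macaulay duration in years for a level-coupon bond (Question 1 setup)."""
    m = int(years * freq)              # number of coupon periods
    c = face * coupon_rate / freq      # coupon per period
    y = ytm / freq                     # periodic yield
    cfs = [c] * m
    cfs[-1] += face                    # face value repaid at maturity
    price = sum(cf / (1 + y) ** t for t, cf in enumerate(cfs, 1))
    weighted = sum(t * cf / (1 + y) ** t for t, cf in enumerate(cfs, 1))
    return weighted / (price * freq)   # convert periods back to years

def duration_price_change(price, duration, ytm, dy, freq):
    """Question 4b style: approximate price change via modified duration."""
    modified = duration / (1 + ytm / freq)
    return -modified * dy * price

def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def black_scholes_call(S, K, r, sigma, T):
    """European call price under Black-Scholes (Question 3); T in years."""
    d1 = (math.log(S / K) + (r + 0.5 * sigma ** 2) * T) / (sigma * math.sqrt(T))
    d2 = d1 - sigma * math.sqrt(T)
    return S * norm_cdf(d1) - K * math.exp(-r * T) * norm_cdf(d2)
```

Two useful sanity checks: a zero-coupon bond's Macaulay duration equals its maturity, and a coupon bond's duration is always below its maturity.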
COMP4130-E4
A Level 4 Module, Autumn 2024/2025
Linear and Discrete Optimization

Instructions:
Time allowed: FOUR Weeks
Complete ALL THREE Tasks

For this coursework, you shall use the Excel and LP-Solve software to solve the Farm Management linear programming problem, which comprises three tasks. After completing the coursework, you need to submit your solutions in a ZIP package through the Moodle system, named "COMP4130-CW-Solutions-NAME-ID.zip" with "NAME" and "ID" replaced by your name and student ID respectively. The ZIP package should include the following files:

1 Three Excel files for the spreadsheet models of the three tasks. Please name the files for the three tasks respectively as "COMP4130-CW-Task1-ID.xlsx", "COMP4130-CW-Task2-ID.xlsx", and "COMP4130-CW-Task3-ID.xlsx", replacing "ID" with your own 8-digit student ID. All the Excel files should annotate the parameter data, decision variables, intermediate variables and objective variables.

2 Three LP-Solve files for the algebraic models of the three tasks. Please name the files for the three tasks respectively as "COMP4130-CW-Task1-ID.lp", "COMP4130-CW-Task2-ID.lp", and "COMP4130-CW-Task3-ID.lp", replacing "ID" with your own 8-digit student ID. All the lp files should have comments on the constraints and optimization objectives.

In addition, you need to submit a PDF report to explain your problem modelling and solving processes for the three tasks. Before the submission of your report, you need to complete the coursework submission coversheet on the Moodle page and attach a screenshot of the completion as the first page of your report. The report should use a 12pt font and must not exceed 10 pages.
In your report, for each task, you should: identify the data, decision variables, objective function and constraints; write the algebraic expressions of the objective function and constraints; explain how to implement your formulated objective function and constraints in Excel Spreadsheet and LP-Solve; show the screenshots of the optimization results produced by Excel Spreadsheet and LP-Solve; and check the consistency between the results of the two types of software. Please name the report "COMP4130-CW-Report-NAME-ID.pdf", replacing "NAME" with your name and "ID" with your 8-digit student ID.

Permitted resources: Those whose first language is not English may use a standard translation dictionary to translate between that language and English, provided that neither language is the subject of this examination. The coursework is open-book. You are free to refer to the textbook, lecture notes and workshop sample solutions for solving the linear programming problem.

Prohibited resources: ChatGPT and other AI language tools are not allowed to be used for report writing.

Appended material: NONE
Additional material: NONE
Information for Invigilators: NONE

1. Farm Management Problem. Fred Jonasson manages a family-owned farm. To supplement several food products grown on the farm, Fred also raises pigs for market. He now wishes to determine the quantities of the available types of feeds (corn, tankage, and alfalfa) that should be given to the pigs. Since pigs will eat any mix of these feed types, the objective is to determine which mix will meet certain nutritional requirements at a minimum cost.
The number of units of each type of basic nutritional ingredient contained within a kilogram of each feed type is given in the following table, along with the nutritional requirements and feed costs:

Nutritional Ingredient   Corn (per kg)   Tankage (per kg)   Alfalfa (per kg)   Minimum Requirement
Carbohydrates            90              20                 40                 200
Protein                  30              80                 60                 180
Vitamins                 10              20                 60                 150
Cost per kg              $10.50          $9.00              $7.50

(a) Develop an Excel spreadsheet model (with file name "COMP4130-CW-Task1-ID.xlsx") and an LP-Solve model (with file name "COMP4130-CW-Task1-ID.lp") to solve the basic Farm Management Problem. Document a report for this task in file "COMP4130-CW-Report-NAME-ID.pdf". (40 marks)

(b) Based on the basic model built in Task 1, by making the necessary changes to the problem formulation, develop an Excel spreadsheet model (with file name "COMP4130-CW-Task2-ID.xlsx") and an LP-Solve model (with file name "COMP4130-CW-Task2-ID.lp") to solve the Farm Management Problem with the additional condition: Fred Jonasson wants to maintain a relative balance in the consumption of the three feed types. He decides to restrict the quantity difference between any two consumed feed types to not exceed (≤) 0.5 kilogram. Document a report for this task in file "COMP4130-CW-Report-NAME-ID.pdf". (15 marks)

(c) Based on the basic model built in Task 1, by making the necessary changes to the problem formulation, develop an Excel spreadsheet model (with file name "COMP4130-CW-Task3-ID.xlsx") and an LP-Solve model (with file name "COMP4130-CW-Task3-ID.lp") to solve the Farm Management Problem under the following scenario: Before determining the best mix of the three feed types, Fred Jonasson has already packed the feeds into bags, each weighing 0.25 kilogram. To reduce the feed dispensing workload, Fred Jonasson decides to allocate the three types of feeds to the pigs in bags.
Please help Fred Jonasson determine how many bags of each of the three types of feeds should be given to the pigs at the minimum cost while satisfying the nutritional ingredient requirements. Document a report for this task in file "COMP4130-CW-Report-NAME-ID.pdf". (15 marks)
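The algebraic model for Task 1 can be written directly in LP-Solve's LP file format. The sketch below is illustrative only: the variable names x1, x2, x3 (kilograms of corn, tankage and alfalfa) are my own choice, and you must still derive, implement and document the model yourself:

```lp
/* Task 1 sketch: minimise feed cost per pig                   */
/* x1 = kg of corn, x2 = kg of tankage, x3 = kg of alfalfa     */
min: 10.5 x1 + 9 x2 + 7.5 x3;

/* nutritional requirements (units per kg times kg >= minimum) */
carbohydrates: 90 x1 + 20 x2 + 40 x3 >= 200;
protein:       30 x1 + 80 x2 + 60 x3 >= 180;
vitamins:      10 x1 + 20 x2 + 60 x3 >= 150;
```

Variables are non-negative by default in LP-Solve. For Task 3, one natural change is to let the decision variables count 0.25 kg bags (declared integer with an `int` statement) and scale the coefficients accordingly.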
Lab 4: File Recovery

Introduction

FAT has been around for nearly 50 years. Because of its simplicity, it is the most widely compatible file system. Although recent computers have adopted newer file systems, FAT32 (and its variant, exFAT) is still dominant in SD cards and USB flash drives due to its compatibility. Have you ever accidentally deleted a file? Do you know that it could be recovered? In this lab, you will build a FAT32 file recovery tool called Need You to Undelete my FILE, or nyufile for short.

Objectives

Through this lab, you will:
- Learn the internals of the FAT32 file system.
- Learn how to access and recover files from a raw disk.
- Get a better understanding of key file system concepts.
- Become a better C programmer.
- Learn how to write code that manipulates data at the byte level and understand the alignment issue.

Overview

In this lab, you will work on the data stored in the FAT32 file system directly, without the OS file system support. You will implement a tool that recovers a deleted file specified by the user. For simplicity, you can assume that the deleted file is in the root directory. Therefore, you don't need to search subdirectories.

Working with a FAT32 disk image

Before going through the details of this lab, let's first create a FAT32 disk image. Follow these steps:

Step 1: create an empty file of a certain size

On Linux, /dev/zero is a special file that provides as many zero bytes as are read from it. The dd command performs low-level copying of raw data. Therefore, you can use it to generate an arbitrary-size file full of zeros. For example, to create a 256KB empty file named fat32.disk:

[root@... cs202]# dd if=/dev/zero of=fat32.disk bs=256k count=1

Read man dd for its usage. You will use this file as the disk image.

Step 2: format the disk with FAT32

You can use the mkfs.fat command to create a FAT32 file system. The most basic usage is:

[root@... cs202]# mkfs.fat -F 32 fat32.disk

(You can ignore the warning about not having enough clusters.)
You can specify a variety of options. For example: [root@... cs202]# mkfs.fat -F 32 -f 2 -S 512 -s 1 -R 32 fat32.disk Here are the meanings of each option: -F: type of FAT (FAT12, FAT16, or FAT32). -f: number of FATs. -S: number of bytes per sector. -s: number of sectors per cluster. -R: number of reserved sectors. Step 3: verify the file system information The fsck.fat command can check and repair FAT fi le systems. You can invoke it with -v to see the FAT details. For example: [root@... cs202]# fsck.fat -v fat32.disk fsck.fat 4.1 (2017-01-24) Checking we can access the last sector of the filesystem Warning: Filesystem is FAT32 according to fat_length and fat32_length fields, but has only 472 clusters, less than the required minimum of 65525. This may lead to problems on some systems. Boot sector contents: System ID "mkfs.fat" Media byte 0xf8 (hard disk) 512 bytes per logical sector 512 bytes per cluster 32 reserved sectors First FAT starts at byte 16384 (sector 32) 2 FATs, 32 bit entries 2048 bytes per FAT (= 4 sectors) Root directory start at cluster 2 (arbitrary size) Data area starts at byte 20480 (sector 40) 472 data clusters (241664 bytes) 32 sectors/track, 64 heads 0 hidden sectors 512 sectors total Checking for unused clusters. Checking free cluster summary. fat32.disk: 0 files, 1/472 clusters You can see that there are 2 FATs, 512 bytes per sector, 512 bytes per cluster, and 32 reserved sectors. These numbers match our specifi ed options in Step 2. You can try different options yourself. Step 4: mount the file system You can use the mount command to mount a fi le system to a mount point. The mount point can be any empty directory. For example, you can create one at /mnt/disk: [root@... cs202]# mkdir /mnt/disk Then, you can mount fat32.disk at that mount point: [root@... 
cs202]# mount fat32.disk /mnt/disk Step 5: play with the file system After the fi le system is mounted, you can do whatever you like on it, such as creating fi les, editing files, or deleting fi les. In order to avoid the hassle of having long fi lenames in your directory entries, it is recommended that you use only 8.3 fi lenames, which means: The filename contains at most eight characters, followed optionally by a. and at most three more characters. The fi lename contains only uppercase letters, numbers, and the following special characters: ! # $ % & ' ( ) - @ ^ _ ` { } ~. For example, you can create a fi le named HELLO.TXT: [root@... cs202]# echo "Hello, world." > /mnt/disk/HELLO.TXT [root@... cs202]# mkdir /mnt/disk/DIR [root@... cs202]# touch /mnt/disk/EMPTY For the purpose of this lab, after you write anything to the disk, make sure to fl ush the fi le system cache using the sync command: [root@... cs202]# sync (Otherwise, if you create a fi le and immediately delete it, the fi le may not be written to the disk at all and is unrecoverable.) Step 6: unmount the file system When you fi nish playing with the fi le system, you can unmount it: [root@... cs202]# umount /mnt/disk Step 7: examine the file system You can examine the fi le system using the xxd command. You can specify a range using the -s (starting offset) and -l (length) options. For example, to examine the root directory: [root@... cs202]# xxd -s 20480 -l 96 fat32.disk 00005000: 4845 4c4c 4f20 2020 5458 5420 0000 0000 HELLO TXT .... 00005010: 6e53 6e53 0000 0000 6e53 0300 0e00 0000 nSnS....nS...... 00005020: 4449 5220 2020 2020 2020 2010 0000 0000 DIR ..... 00005030: 6e53 6e53 0000 0000 6e53 0400 0000 0000 nSnS....nS...... 00005040: 454d 5054 5920 2020 2020 2020 0000 0000 EMPTY .... 00005050: 6e53 6e53 0000 0000 6e53 0000 0000 0000 nSnS....nS...... (It’s normal that the bytes containing timestamps are different from the example above.) To examine the contents of HELLO.TXT: [root@... 
cs202]# xxd -s 20992 -l 14 fat32.disk
00005200: 4865 6c6c 6f2c 2077 6f72 6c64 2e0a  Hello, world..

Note that the offsets may vary depending on how the file system is formatted.

Your tasks

Important: before running your nyufile program, please make sure that your FAT32 disk is unmounted.

Milestone 1: validate usage

There are several ways to invoke your nyufile program. Here is its usage:

[root@... cs202]# ./nyufile
Usage: ./nyufile disk <options>
  -i                     Print the file system information.
  -l                     List the root directory.
  -r filename [-s sha1]  Recover a contiguous file.
  -R filename -s sha1    Recover a possibly non-contiguous file.

The first argument is the filename of the disk image. After that, the options can be one of the following:
-i
-l
-r filename
-r filename -s sha1
-R filename -s sha1

You need to check if the command-line arguments are valid. If not, your program should print the above usage information verbatim and exit.

Milestone 2: print the file system information

If your nyufile program is invoked with option -i, it should print the following information about the FAT32 file system: Number of FATs; Number of bytes per sector; Number of sectors per cluster; Number of reserved sectors. Your output should be in the following format:

[root@... cs202]# ./nyufile fat32.disk -i
Number of FATs = 2
Number of bytes per sector = 512
Number of sectors per cluster = 1
Number of reserved sectors = 32

For all milestones, you can assume that nyufile is invoked while the disk is unmounted.

Milestone 3: list the root directory

If your nyufile program is invoked with option -l, it should list all valid entries in the root directory with the following information:
Filename. Similar to /bin/ls -p, if the entry is a directory, you should append a / indicator.
File size if the entry is a file (not a directory).
Starting cluster if the entry is not an empty file.
You should also print the total number of entries at the end. Your output should be in the following format:

[root@...
cs202]# ./nyufile fat32.disk -l
HELLO.TXT (size = 14, starting cluster = 3)
DIR/ (starting cluster = 4)
EMPTY (size = 0)
Total number of entries = 3

Here are a few assumptions:
You should not list entries marked as deleted.
You don’t need to print the details inside subdirectories.
For all milestones, there will be no long filename (LFN) entries. (If you have accidentally created LFN entries when you test your program, don’t worry. You can just skip the LFN entries and print only the 8.3 filename entries.)
Any file or directory, including the root directory, may span more than one cluster.
There may be empty files.

Milestone 4: recover a small file

If your nyufile program is invoked with option -r filename, it should recover the deleted file with the specified name. The workflow is better illustrated through an example:

[root@... cs202]# mount fat32.disk /mnt/disk
[root@... cs202]# ls -p /mnt/disk
DIR/  EMPTY  HELLO.TXT
[root@... cs202]# cat /mnt/disk/HELLO.TXT
Hello, world.
[root@... cs202]# rm /mnt/disk/HELLO.TXT
rm: remove regular file '/mnt/disk/HELLO.TXT'? y
[root@... cs202]# ls -p /mnt/disk
DIR/  EMPTY
[root@... cs202]# umount /mnt/disk
[root@... cs202]# ./nyufile fat32.disk -l
DIR/ (starting cluster = 4)
EMPTY (size = 0)
Total number of entries = 2
[root@... cs202]# ./nyufile fat32.disk -r HELLO
HELLO: file not found
[root@... cs202]# ./nyufile fat32.disk -r HELLO.TXT
HELLO.TXT: successfully recovered
[root@... cs202]# ./nyufile fat32.disk -l
HELLO.TXT (size = 14, starting cluster = 3)
DIR/ (starting cluster = 4)
EMPTY (size = 0)
Total number of entries = 3
[root@... cs202]# mount fat32.disk /mnt/disk
[root@... cs202]# ls -p /mnt/disk
DIR/  EMPTY  HELLO.TXT
[root@... cs202]# cat /mnt/disk/HELLO.TXT
Hello, world.

For all milestones, you only need to recover regular files (including empty files, but not directory files) in the root directory.
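The fields needed for Milestones 3 and 4 live at fixed offsets inside each 32-byte 8.3 directory entry: the name occupies bytes 0–10, the attribute byte is at 11, the starting cluster is split between offsets 20 (high half) and 26 (low half), and the file size is at 28. As an illustrative sketch only (in Python, whereas your submission is in C), here is a decoder applied to the HELLO.TXT entry bytes from the xxd dump in Step 7:

```python
import struct

def parse_dir_entry(entry: bytes) -> dict:
    """Decode one 32-byte FAT32 8.3 directory entry."""
    name, attr = struct.unpack_from("<11sB", entry, 0)   # DIR_Name, DIR_Attr
    hi, = struct.unpack_from("<H", entry, 20)            # DIR_FstClusHI
    lo, = struct.unpack_from("<H", entry, 26)            # DIR_FstClusLO
    size, = struct.unpack_from("<I", entry, 28)          # DIR_FileSize
    return {
        "name": name.decode("ascii"),
        "attr": attr,
        "deleted": entry[0] == 0xE5,                     # first byte marks deletion
        "cluster": (hi << 16) | lo,
        "size": size,
    }

# The HELLO.TXT entry bytes shown in the xxd output of Step 7.
hello = bytes.fromhex(
    "48454c4c4f2020205458542000000000"
    "6e536e5300000000"
    "6e5303000e000000"
)
e = parse_dir_entry(hello)
print(e["name"], e["size"], e["cluster"])  # HELLO   TXT 14 3
```

The decoded size (14) and starting cluster (3) match the -l output above. Recovery in Milestone 4 is then largely the reverse: restore the first byte of the name and (typically) mark the file's clusters as allocated again in every FAT.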
When the file is successfully recovered, your program should print filename: successfully recovered (replace filename with the actual file name). For all milestones, you can assume that no other files or directories are created or modified since the deletion of the target file. However, multiple files may be deleted. Besides, for all milestones, you don’t need to update the FSINFO structure because most operating systems don’t care about it.

Here are a few assumptions specifically for Milestone 4:
The size of the deleted file is no more than the size of a cluster.
At most one deleted directory entry matches the given filename. If no such entry exists, your program should print filename: file not found (replace filename with the actual file name).

Milestone 5: recover a large contiguously-allocated file

Now, you will recover a file that is larger than one cluster. Nevertheless, for Milestone 5, you can assume that such a file is allocated contiguously. You can continue to assume that at most one deleted directory entry matches the given filename. If no such entry exists, your program should print filename: file not found (replace filename with the actual file name).

Milestone 6: detect ambiguous file recovery requests

In Milestones 4 and 5, you assumed that at most one deleted directory entry matches the given filename. However, multiple files whose names differ only in the first character would end up having the same name when deleted. Therefore, you may encounter more than one deleted directory entry matching the given filename. When that happens, your program should print filename: multiple candidates found (replace filename with the actual file name) and abort. This scenario is illustrated in the following example:

[root@... cs202]# mount fat32.disk /mnt/disk
[root@... cs202]# echo "My last name is Tang." > /mnt/disk/TANG.TXT
[root@... cs202]# echo "My first name is Yang." > /mnt/disk/YANG.TXT
[root@... cs202]# sync
[root@...
cs202]# rm /mnt/disk/TANG.TXT /mnt/disk/YANG.TXT
rm: remove regular file '/mnt/disk/TANG.TXT'? y
rm: remove regular file '/mnt/disk/YANG.TXT'? y
[root@... cs202]# umount /mnt/disk
[root@... cs202]# ./nyufile fat32.disk -r TANG.TXT
TANG.TXT: multiple candidates found

Milestone 7: recover a contiguously-allocated file with SHA-1 hash

To solve the aforementioned ambiguity, the user can provide a SHA-1 hash via command-line option -s sha1 to help identify which deleted directory entry should be the target file. In short, a SHA-1 hash is a 160-bit fingerprint of a file, often represented as 40 hexadecimal digits. For the purpose of this lab, you can assume that identical files always have the same SHA-1 hash, and different files always have vastly different SHA-1 hashes. Therefore, even if multiple candidates are found during recovery, at most one will match the given SHA-1 hash. This scenario is illustrated in the following example:

[root@... cs202]# ./nyufile fat32.disk -r TANG.TXT -s c91761a2cc1562d36585614c8c680ecf5712e875
TANG.TXT: successfully recovered with SHA-1
[root@... cs202]# ./nyufile fat32.disk -l
HELLO.TXT (size = 14, starting cluster = 3)
DIR/ (starting cluster = 4)
EMPTY (size = 0)
TANG.TXT (size = 22, starting cluster = 5)
Total number of entries = 4

When the file is successfully recovered with SHA-1, your program should print filename: successfully recovered with SHA-1 (replace filename with the actual file name). Note that you can use the sha1sum command to compute the SHA-1 hash of a file:

[root@... cs202]# sha1sum /mnt/disk/TANG.TXT
c91761a2cc1562d36585614c8c680ecf5712e875  /mnt/disk/TANG.TXT

Also note that it is possible that the file is empty or occupies only one cluster. The SHA-1 hash for an empty file is da39a3ee5e6b4b0d3255bfef95601890afd80709. If no such file matches the given SHA-1 hash, your program should print filename: file not found (replace filename with the actual file name). For example:

[root@...
cs202]# ./nyufile fat32.disk -r TANG.TXT -s 0123456789abcdef0123456789abcdef01234567
TANG.TXT: file not found

The OpenSSL library provides a function SHA1(), which computes the SHA-1 hash of d[0...n-1] and stores the result in md[0...SHA_DIGEST_LENGTH-1]:

#include <openssl/sha.h>
#define SHA_DIGEST_LENGTH 20
unsigned char *SHA1(const unsigned char *d, size_t n, unsigned char *md);

You need to add the linker option -lcrypto to link with the OpenSSL library.

Milestone 8: recover a non-contiguously allocated file

Finally, the clusters of a file are no longer assumed to be contiguous. You have to try every permutation of unallocated clusters on the file system in order to find the one that matches the SHA-1 hash. The command-line option is -R filename -s sha1. The SHA-1 hash must be given. Note that it is possible that the file is empty or occupies only one cluster. If so, -R behaves the same as -r, as described in Milestone 7. For Milestone 8, you can assume that the entire file is within the first 20 clusters, and the file content occupies no more than 5 clusters, so a brute-force search is feasible. If you cannot find a file that matches the given SHA-1 hash, your program should print filename: file not found (replace filename with the actual file name).

FAT32 data structures

For your convenience, here are some data structures that you can copy and paste. Please refer to the lecture slides and FAT: General Overview of On-Disk Format for details on the FAT32 file system layout.
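To make the Milestone 8 search concrete, here is a toy model of the brute-force idea in Python (your submission uses C and OpenSSL's SHA1()): for each chain length, try every ordering of candidate unallocated clusters, concatenate their contents truncated to the recorded file size, and compare the SHA-1 hash. The cluster contents below are made up purely for illustration.

```python
import hashlib
from itertools import permutations

def find_cluster_chain(clusters, size, sha1_hex, max_len=5):
    """Return the ordering of candidate clusters whose concatenated
    contents (truncated to `size` bytes) match the SHA-1 hash, or None."""
    for n in range(1, max_len + 1):                      # file spans <= max_len clusters
        for perm in permutations(range(len(clusters)), n):
            data = b"".join(clusters[i] for i in perm)[:size]
            if hashlib.sha1(data).hexdigest() == sha1_hex:
                return perm
    return None

# Toy example: a 6-byte file stored out of order across two 4-byte clusters.
content = b"abcdef"
clusters = [b"ef\x00\x00", b"abcd", b"zzzz"]             # chain is cluster 1 then cluster 0
target = hashlib.sha1(content).hexdigest()
print(find_cluster_chain(clusters, len(content), target))  # (1, 0)
```

Note how truncating to the directory entry's file size handles the zero padding in the last cluster; the same truncation is why hashlib.sha1(b"").hexdigest() reproduces the empty-file constant da39a3ee5e6b4b0d3255bfef95601890afd80709 quoted above.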
Coursework Assessment Pro-forma

Key Information
Module Code: CM2104
Module Title: Computational Mathematics
Assessment Title: Plotting Circles
Assessment Number: 1
Assessment Weighting: 50% of a 10-credit level 2 module
Assessment Limits: A MATLAB program with GUI plus a report (1–2 pages of text; plus diagrams, screenshots etc.).

The Assessment Calendar can be found under ‘Assessment & Feedback’ in the COMSC-ORG-SCHOOL organisation on Learning Central. This is the single point of truth for (a) the hand out date and time, (b) the hand in date and time, and (c) the feedback return date for all assessments.

Learning Outcomes
The learning outcomes for this assessment are as follows:
• Show a clear understanding of the basic MATLAB programming environment and data structures
• Understand the practical implementation of some general mathematical techniques via MATLAB
• Demonstrate an awareness of basic Linear Algebra and its application to Computational Geometry concepts with MATLAB

Submission Instructions
The coversheet can be found under ‘Assessment & Feedback’ in the COMSC-ORG-SCHOOL organisation on Learning Central. Files should be submitted via Learning Central. The submission page can be found under ‘Assessment & Feedback’ in the CM2104 module on Learning Central. Your submission should consist of multiple files (description, type, name):
Cover sheet (compulsory): one PDF (.pdf) file, named [Student number].pdf
Report (compulsory): one PDF (.pdf) file, named Report [Student number].pdf
Source code (compulsory): MATLAB files for the GUI (.mlapp) and modified function stubs (.m) packaged as a single zip file, named Code [Student number].zip

Any deviation from the submission instructions above (including the number and types of files submitted) may result in a reduction in marks for the assessment.
Any code submitted will be run on a system equivalent to those available in the Linux laboratory, should be able to run on the School’s MATLAB installation (without requiring the use of any MATLAB toolboxes), and must be submitted as stipulated in the instructions above. If you are unable to submit your work due to technical difficulties, please submit your work via e-mail to [email protected] and notify the module leader.

Assessment Description

Develop a graphical user interface (GUI) implemented in MATLAB using App Designer that allows the user to specify and draw several circles according to some geometric construction rules. To enable me to carry out unit testing outside of your GUI, for some of the tasks below a function stub is provided (download stubs.zip, which contains them) which you should modify to include the appropriate functionality. For your convenience, when you integrate these functions into your GUI you may modify them. Another requirement is that you should use homogeneous coordinates to represent all the data.

Please note: You are allowed to use any of John Burkardt’s Geometric Processing toolbox functions (a link was provided in the slides) and third-party libraries, as long as the third-party add-ons run on their own without any initial set-up. You must clearly reference the sources in your report.

Task 1 (5% weight): Write a MATLAB GUI that allows the user to enter points, which are then displayed in the window. Note: all points input by the user should be entered by clicking locations in the window using the mouse (rather than entering coordinates in a text box). The GUI should also provide buttons to allow the user to hide or reveal the various geometric elements that will be drawn during the coursework (i.e. hide/reveal all points, hide/reveal all lines, hide/reveal all intermediate constructions).
Extend the above GUI so that when the “run construction” button is pressed by the user the following steps for Tasks 1–6 are carried out in sequence (i.e. draw the geometric elements as soon as they are fully specified):

Task 2 (5% weight): Construct and draw the first circle
The user enters a point, which is the centre of the circle. The user enters another point, which will lie on the circumference of the circle. Draw the points and the circle.

Task 3 (5% weight): Construct and draw the second circle
The user enters a point, which is the centre of the circle. The second circle should have tangential contact with the first circle. Draw the point and the circle. Modify this function stub to compute the radius of the second circle: radiusTangentialCircle

Task 4 (30% weight): Construct and draw the third circle
Enter 2 points and for each find the closest point on the closest circle (i.e. “snap” to the circle). These two closest points will lie on the circumference of the third circle. In more detail:
• The user enters the first point; determine which circle it is closest to, and find the closest point on its circumference.
• The user enters another point; repeat the process to find the closest point. Report an error if both user points are closest to the same circle.
• Modify this function stub to find which circle is closest and also the closest point: closestPointsOnCircle

Select another point on the third circle
• Construct and draw the perpendicular bisector of the line segment between the centres of circles 1 and 2.
• Modify this function stub to determine the parameters of the perpendicular bisector: centreBisector
• The user enters a point. Determine the closest point on the perpendicular bisector to this point (i.e. “snap” to the line); that closest point will be another point on the circumference of the third circle.
Modify this function stub: closestPointBisector

Determine the parameters of the third circle
• Construct the two perpendicular bisectors through the line segments between the snapped points 1 & 2 and 1 & 3.
• The circle centre is at the intersection of the perpendicular bisectors.
• Modify this function stub to determine the circle centre and radius: circleSolve

Plot the initial user selected points, the calculated closest points, the perpendicular bisectors, and the third circle.

Task 5 (5% weight): Circle intersections
Find the intersections of the third circle with the first two circles, and plot them. Modify this function stub to determine the circle intersections: intersections2circles

Task 6 (10% weight): Geometric transformation
Consider the translation that will place the centre of the third circle at the origin. Consider the rotation about the origin that will align the perpendicular bisector of the line segment between the centres of circles 1 and 2 with the X-axis, with the constraint that the rotation angle should be in the range [0°, 180°]. Calculate the translation and rotation matrices and an overall combined transformation matrix (all using homogeneous coordinates). Apply the transformation to all the geometric elements in the window (i.e. translate and rotate the circles, points, etc.). Modify this function stub to determine the matrices: transformationMatrices

Task 7 (10% weight): Save and replay option
The GUI should also allow the user to save all the points inputted by the user in a file. Use the following data format for the saved coordinates: a text file containing the ordered, space-separated x and y coordinates, one pair per line. They should be ordered in the file in the same order as they are entered by the user. The GUI should allow the user to load a set of points from such a file and then directly construct the circles and the additional geometric elements and display them.
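For reference, the centre computation behind a circleSolve-style function reduces to a 2x2 linear system: the centre is equidistant from the three points, so each perpendicular bisector contributes one linear equation. Here is a hypothetical Python sketch of that calculation (the coursework itself must be written in MATLAB, using homogeneous coordinates):

```python
import math

def circle_from_3_points(p1, p2, p3):
    """Centre and radius of the circle through three non-collinear points.
    Each perpendicular bisector gives one linear equation in (cx, cy):
    2*(xj - xi)*cx + 2*(yj - yi)*cy = (xj^2 + yj^2) - (xi^2 + yi^2)."""
    (x1, y1), (x2, y2), (x3, y3) = p1, p2, p3
    a1, b1 = 2 * (x2 - x1), 2 * (y2 - y1)
    c1 = x2**2 + y2**2 - x1**2 - y1**2
    a2, b2 = 2 * (x3 - x2), 2 * (y3 - y2)
    c2 = x3**2 + y3**2 - x2**2 - y2**2
    det = a1 * b2 - a2 * b1          # zero iff the points are collinear
    cx = (c1 * b2 - c2 * b1) / det   # Cramer's rule
    cy = (a1 * c2 - a2 * c1) / det
    return (cx, cy), math.hypot(x1 - cx, y1 - cy)

# Three points chosen on the circle of centre (2, -1) and radius 5.
centre, r = circle_from_3_points((7, -1), (2, 4), (-3, -1))
print(centre, r)  # (2.0, -1.0) 5.0
```

In the GUI, p1–p3 would be the snapped points from Task 4, and the same bisector equations give the centreBisector line as a special case.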
Task 8 (20% weight): In order to gain higher marks you need to add some novel extensions or additional features. You need only provide two different novel extensions (such as those suggested below). There are endless possibilities and you are encouraged to think of your own extensions. Here are a few suggestions:
• Allow the user to select and modify elements in the window, after which the remaining elements are automatically updated.
• The third circle is replaced by a sphere, for which the user has some geometric means of specifying its parameters. The intersections of the sphere with the first two circles are found and plotted.
• The GUI includes some form of animation of the geometric elements in the window.

Task 9 (10% weight): You must supply a report on your submission which provides a short written description (1–2 pages of text; plus diagrams, screenshots etc.) conveying all the appropriate information to demonstrate its operation and explaining your extension of the basic algorithm. Include your student number in the report.

Assessment Criteria
Credit will be awarded according to the correct functioning of the following components of the code.
1. 5% – Basic MATLAB GUI (Task 1)
2. 5% – Construct and draw the first circle (Task 2)
3. 5% – Construct and draw the second circle (Task 3)
4. 30% – Construct and draw the third circle (Task 4)
5. 5% – Circle intersections (Task 5)
6. 10% – Geometric transformation (Task 6)
7. 10% – Save and replay option (Task 7)
8. 20% – Design and implementation of novel extensions or additional features (Task 8)
9. 10% – Report describing the operation of your program and your extension of the basic algorithm (Task 9)
MATH70093 - Computational Statistics
Assessed coursework, Autumn 2024

Preparing your Coursework
1. Hand-out date: December 2nd 2024, 9am.
2. Hand-in date: December 13th 2024, noon.
3. Please use the Rmarkdown template file on the Statistics MSc 2024-2025 Blackboard page to write your report. Code should be provided in the appendix (which will be done automatically by the template provided). Ensure your submitted file has tidy and well documented code chunks.
4. The report should be properly structured, and should be written using complete sentences. Marks are given both for the content of the report (correctness of code, numerical answers, etc.) and the quality of the presentation (clarity of plots, explanations, etc.).
5. At the beginning of your report you must include the following statement: “I, [insert CID], certify that this assessed coursework is my own work, unless otherwise acknowledged, and includes no plagiarism. I have not discussed my coursework with anyone else except when seeking clarification with the module lecturer via email or on MS Teams. I have not shared any code underlying my coursework with anyone else prior to submission.”
6. All coding should be done in Rmarkdown. Please ensure your submitted file has tidy and well documented code chunks in the appendix.

Submitting your coursework
1. Submit via Blackboard before the deadline a pdf version of your report.
2. The name of your submitted file should begin with CW followed by your CID, e.g. if I were to submit a coursework, I would submit: CW 00830053.pdf.

1. (50% of total marks) Consider the undirected graph in Figure 1 modelling part of the tube network, where the random variables X1, . . . , X5 are independent and uniformly distributed with Xi ~ U[0, ai] for a1 = 1, a2 = 2, a3 = 3, a4 = 1, a5 = 2, and represent the (random) travel time along each line.
Figure 1: Tube Network from A (Earl’s Court) to B (South Kensington)

(a) The shortest path from A to B and its associated travel time T will be random, depending on the specific values of the random variables X1, . . . , X5. Let I = E[T] be the expected minimum travel time from A to B. Show that this can be written as I = E[H(U)], with
H(U) = min{a1 U1 + a4 U4, a1 U1 + a3 U3 + a5 U5, a2 U2 + a3 U3 + a4 U4, a2 U2 + a5 U5},
where U = (U1, . . . , U5) with U1, . . . , U5 ~ U[0, 1] independent random variables.

(b) Using a standard Monte Carlo estimator, approximate the expected length I. Implement this estimator in R. For different values of n, compute the estimated mean Î and the estimated 95% confidence intervals, and plot them. Compare with the exact value I = .

(c) We shall attempt to reduce the variance of this estimator using importance sampling, by choosing as importance sampling distributions U1, U4 ~ Beta(v, 1), and U2, U3, U5 ~ U[0, 1] independent, and then introducing appropriate importance sampling weights. Develop such a scheme and implement it in R. Using numerical experiments, find a value of v in the range (1, 1.5) which provides a significant reduction in variance. Plot the estimated mean Î, along with the estimated 95% confidence intervals, as a function of the number of samples n.

(d) Let G(U) = min{a1 U1 + a4 U4, a2 U2 + a5 U5}. Consider the control variate estimator H̃(U) = H(U) + Q C(U), for some C(U) based on G(U), and write down expressions which approximate the optimal value Q = Q* giving minimal variance. Implement this estimator in R. Plot the estimated mean Î, along with the estimated 95% confidence intervals, as a function of the number of samples n.

(e) Compare the performance of the three approaches in terms of variance and discuss the advantages and disadvantages of each approach.

2. (50% of total marks) Download your individual dataset D = {xn} corresponding to your CID, available on Blackboard.
Please state the last two digits of your CID at the beginning of this question. Consider a mixture model of Poisson distributions, i.e., a model of the form
P(X = x | {λk}, {wk}) = Σ_{k=1}^{K} wk ψ(x; λk) for any x ∈ N,
where ψ(·; λ) is the probability mass function of a Poisson distribution with parameter λ, and Σ_{k=1}^{K} wk = 1. Assume the following prior distribution on the model parameters: for a fixed K, (w1, . . . , wK) follows a Dirichlet distribution with concentration parameter (1/K, . . . , 1/K), and λ1, . . . , λK are i.i.d. following a uniform distribution between 0 and 20. The aim of this question is to perform Bayesian inference of the model parameters (λ1, . . . , λK, w1, . . . , wK) given the dataset D and to identify whether a model with two (K = 2) or three (K = 3) components best explains the dataset.

(a) Consider a mixture model of Poisson distributions with K ∈ {2, 3} components. Devise and implement a Metropolis-within-Gibbs algorithm that takes K as an input and produces samples from the posterior distribution p(λ1, λ2, w1, w2 | D, K = 2) if K = 2 or p(λ1, λ2, λ3, w1, w2, w3 | D, K = 3) if K = 3. Precisely describe the algorithm with equations so that one could implement it without looking at your code. For each value of K ∈ {2, 3}, run the algorithm multiple times with different starting values and verify graphically that the MCMC algorithm has reached convergence.

(b) Using the outputs of the Metropolis-within-Gibbs algorithm implemented in question (a), estimate the following posterior predictive distributions for K = 2 and K = 3:
P(X = x* | D, K) = ∫ · · · ∫ P(X = x* | {λk}, {wk}) p({λk}, {wk} | D, K) dλ1 . . . dλK dw1 . . . dwK
for all x* ∈ {0, 1, 2, . . . , 30}. Write down the equation of your estimates. Produce a plot featuring P(X = x* | D, K = 2) and P(X = x* | D, K = 3) for x* ∈ {0, 1, 2, . . . , 30} along with the distribution of the data in D. Comment on the relative suitability of each model to explain your dataset D.
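When implementing and debugging the likelihood, it helps to check that the mixture pmf above behaves as a probability mass function. A small sketch (in Python for illustration; the coursework itself must be in R) with hypothetical parameters, not values from any dataset:

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    """psi(x; lambda): Poisson probability mass function."""
    return exp(-lam) * lam**x / factorial(x)

def mixture_pmf(x, lams, ws):
    """P(X = x | {lambda_k}, {w_k}) = sum_k w_k * psi(x; lambda_k)."""
    return sum(w * poisson_pmf(x, lam) for lam, w in zip(lams, ws))

# Hypothetical K = 2 parameters, purely to illustrate the shape.
lams, ws = (3.0, 10.0), (0.4, 0.6)
total = sum(mixture_pmf(x, lams, ws) for x in range(0, 61))
print(round(total, 6))  # 1.0 (the mass beyond x = 60 is negligible here)
```

The same function evaluated at the posterior samples of ({λk}, {wk}), averaged over the MCMC output, is the Monte Carlo estimate of the posterior predictive in (b).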
(c) Devise and implement a Reversible Jump MCMC algorithm to jointly infer the number of components K and the model parameters given your dataset. You only need to consider a model with two components and one with three components, i.e., K ∈ {2, 3}. Precisely describe the proposed algorithm. Produce graphs to assess the convergence of the MCMC algorithm. From the output of the algorithm, report the posterior probability of the model with two components and of the model with three components, i.e. P(K = 2 | D) and P(K = 3 | D). How does this compare to your answer in question (b)?

Tips: Many between-model moves can be considered, but some might lead to better mixing than others. In the article attached on Blackboard, Richardson and Green proposed between-model moves in the context of a mixture of Gaussian distributions. Here is a suggestion of between-model moves inspired by this article and adapted to the mixture of Poisson distributions:
• The combine move starts with a model with 3 components and proposes to move to a model with 2 components. This is done by choosing a pair of components (j1, j2) at random that are adjacent in terms of the current values of their means (i.e. λj1 < λj2 and the third component is not in [λj1, λj2]). These two components are merged into one component labelled j* such that:
wj* = wj1 + wj2 and λj* = (wj1 λj1 + wj2 λj2)/wj*. (1)
• The reverse split proposal (from a model with 2 components to a model with 3 components) consists in randomly selecting one component j* and splitting it into two, labelled j1 and j2, with weights and means that satisfy (1). This can be done, for example, by sampling u1 ~ Beta(2, 2) and u2 ~ N(0, 1) and setting wj1 = u1 wj*, wj2 = (1 − u1) wj*, λj1 = λj* − u2 √(wj2/wj1) and λj2 = λj* + u2 √(wj1/wj2). If the adjacency condition is not satisfied, the move should be rejected.
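A quick sanity check for dimension-matching moves like these is that combine is the exact inverse of split, i.e. that the split proposal preserves equation (1). A sketch in Python with arbitrary illustrative values (your implementation must be in R):

```python
import math

def split(w_star, lam_star, u1, u2):
    """Split one component (w*, lambda*) into two, as in the tip above."""
    w1, w2 = u1 * w_star, (1 - u1) * w_star
    lam1 = lam_star - u2 * math.sqrt(w2 / w1)
    lam2 = lam_star + u2 * math.sqrt(w1 / w2)
    return (w1, lam1), (w2, lam2)

def combine(c1, c2):
    """Combine two components: sum of weights, weighted mean, as in (1)."""
    (w1, lam1), (w2, lam2) = c1, c2
    w_star = w1 + w2
    return w_star, (w1 * lam1 + w2 * lam2) / w_star

# Splitting then combining recovers the original component exactly,
# because w1*lam1 + w2*lam2 = w*lam* + u2*(sqrt(w1*w2) - sqrt(w1*w2)).
c1, c2 = split(0.5, 4.0, u1=0.3, u2=0.8)
w_star, lam_star = combine(c1, c2)
print(round(w_star, 10), round(lam_star, 10))  # 0.5 4.0
```

A check like this catches sign and indexing mistakes in the move before you add the (more delicate) acceptance-ratio Jacobian.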
Mathematics for Social Science 2024-25

Instructions
· You must include a cover sheet at the start of the assessment. Do not write your name or student number anywhere on the assessment.
· All questions are compulsory, so skipping a question will mean you receive a mark of zero for that question.
· The paper is marked out of a total of 100 percentage points. This exam is worth 85% of your final grade.
· Submit your exam on LEARN by noon (12:00) on December 4th 2024. Exams submitted from 12:00:01 onwards will have points deducted for late submission. Regular assessment penalties will apply for late submissions: 5 percentage points per calendar day, up to a maximum of 35 percentage points for 7 calendar days; zero points will be given for an assessment handed in more than 7 days late.
· When in doubt, check your lecture slides and notes. The assessment requires only syntax and skills covered in class and in your class-related material (e.g. statistics exercises, syntax files, lecture slides).

1. Principal Component Analysis [30%]
Please run a Principal Component Analysis with the following variables from the 2015 British Election Survey (SPSS dataset “BES_2015_spssdata.sav”). You need to download this specific dataset from LEARN, and not use any other BES dataset you may have used in the course. The module of questions you are looking at is all about asking respondents whether they think more or less public money should be spent on different things. Answers are on a 5-point Likert scale and range from much more than now (coded as 1) to much less than now (coded as 5). For more information see the data codebook uploaded on LEARN. Responses coded as “Don’t know” and “Not stated” have been set as missing data, and are not taken into account in any PCA. Use SPSS for this exercise. Carry out a Principal Component Analysis (PCA) of these variables using Varimax rotation.
Please explain the PCA results using up to 3 of the results outputs/tables produced by SPSS. Explore whether these survey questions seem suitable to be analysed using PCA, and explain what diagnostic statistics you use to determine this, showing that you understand how these diagnostic tests work. Please provide an interpretation of the rotated results: what concepts the factors are capturing and how the components should be interpreted. Your total answer for the above should be up to 600 words long.

pubex_a Thinking about public expenditure on HEALTH, should there be much more than now, somewhat more than now, the same as now, somewhat less than now, or much less than now?
pubex_b Thinking about public expenditure on EDUCATION, should there be much more than now, somewhat more than now, the same as now, somewhat less than now, or much less than now?
pubex_c Thinking about public expenditure on UNEMPLOYMENT BENEFITS, should there be much more than now, somewhat more than now, the same as now, somewhat less than now, or much less than now?
pubex_d Thinking about public expenditure on DEFENCE, should there be much more than now, somewhat more than now, the same as now, somewhat less than now, or much less than now?
pubex_e Thinking about public expenditure on OLD-AGE PENSIONS, should there be much more than now, somewhat more than now, the same as now, somewhat less than now, or much less than now?
pubex_f Thinking about public expenditure on BUSINESS AND INDUSTRY, should there be much more than now, somewhat more than now, the same as now, somewhat less than now, or much less than now?
pubex_g Thinking about public expenditure on POLICE AND LAW ENFORCEMENT, should there be much more than now, somewhat more than now, the same as now, somewhat less than now, or much less than now?
pubex_h Thinking about public expenditure on WELFARE BENEFITS, should there be much more than now, somewhat more than now, the same as now, somewhat less than now, or much less than now?

2.
Binary Logistic Regression [40%]

2a. Please observe the output below, from a logit model which has been run on WDI data, and fill in the missing components (denoted by a space like this: “_____”) of the interpretations in the sentences below.
For every 1 percentage point increase in adult literacy within the population, the odds of a country having a functioning democracy (with voting and party alternation) rise by ____ to 1.
For every 10 percentage point increase in adult literacy within the population, the odds of a country having a functioning democracy (with voting and party alternation) rise by ______% (fractional odds).

2b. For this question you will need to use Stata and the Stata version of the 2017 British Election Survey, “BES_2017_data.dta”. Please make sure you download and use the correct dataset, and not other versions of the BES survey. (Max 400 words for section 2 – be concise!) You want to investigate the relationship between people’s age, the political party they voted for in the 2017 election, and their self-reported level of trust in politicians. You will need the following variables to complete this task: y10_banded; trustpol; votefor. Firstly, run some bivariate analysis and summarise succinctly the relationship between age and trust in politicians.

2c. Now you will run two logit models, one after the other. In the first model, predicting trust in politicians, your only independent variable will be respondents’ age. In the second model you will add a second independent variable: votefor, i.e. the party that participants voted for in the most recent election. For the second logit model, for the “votefor” variable, please make response category 4 of this variable the reference category. For both models, for the age variable, make the youngest age band the reference category. The models should be run so that Odds Ratios (not logits) are reported.
Please paste the output (and syntax) for both models below (using a screen grab/snapshot, not copy-pasted text, so that the results can be easily read).

2d. Discuss the results of the second model overall in simple language, using odds ratios in some way in your interpretation.

2e. Look at how the Odds Ratio for the age 85+ category changes from model 1 to model 2. Why do you think this may be happening, and what does this mean substantively?

2f. Explain, using your understanding of the Integration of the Normal Curve, what you make of the Odds Ratios for those who voted Labour, given the p-value for this specific result.

3. Interaction Effects [20%]
This graph has been extracted from a research study using the Millennium Cohort Study, which follows a nationally representative cohort of children born at the turn of the century in the UK. It explores interaction effects between different characteristics of the mothers who responded in the survey, and looks at different predicted probabilities of a cohort member (child) having experienced smacking at age seven for different groups of mothers. These results are from a model (not shown) with a sample size of N=7745 which controls for other socio-economic characteristics of the mother. The two variables used to test for interactions in the graph below are ethnicity and child abuse at age three.
· Ethnicity is a binary variable: either White or Other Ethnic Group.
· The child abuse variable is categorical with three categories referring to children’s experience of abuse at age three: (non) referring to no abuse, (moderate) referring to moderate abuse, and (high) referring to high levels of abuse.

3a. Please interpret the results you see in simple language, using interpretive skills you learnt during the course.

3b.
What do you think is a likely explanation for the fact that the confidence interval for the Other Ethnic Group / high child abuse exposure category is so much wider than the confidence interval for the White / no child abuse exposure category?

4. Probability, Polls and Confidence Intervals [10%]

The https://whatukthinks.org/eu/ website aggregates data on a number of polls run on UK populations. On 17-18th July 2020, YouGov ran an opinion poll asking respondents: "The EU's Erasmus programme helps Britons to get work experience and study in another European country, and Europeans to get work experience and study in Britain. Do you think the UK should stay a member or stop being a member of Erasmus at the end of 2020?". The characteristics of this poll were: sample size = 1658; percentage who answered that they want to stay a member of Erasmus = 80% (after excluding those who responded "don't know"; 24% of the sample responded this way).

Using the above information, and using the appropriate formula the way it should be used (and not the way polling companies often use it), calculate and answer the following questions (showing your workings):

- Q1. Provide the 95% confidence interval for the estimate of the proportion responding that they wanted to stay a member of Erasmus.
- Q2. How many standard errors plus/minus the mean of 80% would you need to be 100% certain of a result falling within a given confidence interval?
Syllabus - POSC 104 - American People and Politics - Fall 2024

Political Science 104.02 - American People and Politics, Fall 2024

Brief Course Overview

The aim of this course is for students to gain a deeper understanding of how democracy really works in the American political system. We will begin with an overview of American politics and government, with particular emphasis on the constitution. Next we will shift to the major institutions within American government. After that we will study politics and elections, with particular focus on recent political trends in the U.S., which we will examine from the perspective of elected representatives as well as the public. Finally, we will touch on political psychology and assess the role of news and popular media within American democracy. All along the way, we will pay close attention to key events in American history that shaped our current politics and institutions, such as the history of slavery and Jim Crow, debates around immigration, the broader struggle for equal rights, and the role of state and local governments (with particular focus on California). Because this is an introductory course covering a wide range of topics, it will be difficult to go into great depth on any given subject, but if all goes according to plan, you should leave this course with a very solid foundation for further study within American politics and with the tools to be an engaged and informed participant in the American political system.

Course Learning Objectives

This course fulfills part of the GE American Experience requirement and aligns with the following General Education Learning Outcomes:

- Students identify and explain significant political and historical developments that have shaped America's democracy and its diverse society in the context of the discipline of political science.
- Students apply their knowledge by developing a personal vision regarding diversity.
Additional Learning Objectives

- Develop a better understanding of how American government operates and how the various branches interact
- Become more familiar with key historical events that shaped our current politics and institutions
- Improve critical thinking skills generally, develop the skill of thinking quantitatively and assessing evidence, and learn to apply these to contemporary political questions
- Leave the course with a deeper understanding of the key forces driving American politics and with the desire to become a more active and engaged participant in American democracy

Participation, Attendance and Classroom Expectations

My preference is to design courses to be highly interactive and participatory, with dialogue between students and instructor as well as student-to-student (as opposed to a top-down classroom model where the instructor spends most of the time lecturing to the class). While lectures and slide shows will be a key part of the course, I absolutely want you to have a chance to discuss the topics we cover and to ask questions, so please come to class prepared to contribute to discussions. Please refrain from using cell phones, laptops and other portable electronic devices during class. A growing body of research shows that taking notes by hand is more effective and promotes recall better than typing on a laptop. And as we have probably all experienced, internet-connected devices can be pretty distracting, both for you and for the people around you. Attendance will not be taken in every class session, but it will be taken at least 10 times throughout the semester, and your attendance/participation grade will make up 10% of your overall grade. Beyond the grade for attendance/participation, you will certainly get more from the course, and you will earn a higher grade on exams, if you attend class in person as much as you can.
You can have two absences without it negatively affecting your grade, but more than two absences will result in a lower attendance/participation grade. If something comes up where you know you will need to miss class for an extended period, please let me know in advance to be excused. Exams in the class will be open-notes, but only hand-written notes will be allowed; this will hopefully give you a strong incentive to attend class regularly, engage with the lecture material, and take notes. Lastly, I am committed to facilitating an open and inclusive classroom environment, so please be respectful to anyone who is speaking in class.

Needed for the Course

You will need a reliable internet connection and access to a functioning laptop or desktop computer for this course. The only book you will need to buy (or borrow/rent) is Why We're Polarized by Ezra Klein. You can currently buy the paperback version for about $12 on Amazon and other online retailers. We will also read some chapters from the open-access textbook American Government 3e, which is available for free through OpenStax. I will post or link all other readings on the course website.

Graded Items

I will introduce and review all assignments/exams individually at least one week before they are due. Individual graded items, their value toward your overall grade and their placement in the course schedule are listed below.

- Attendance/Participation: 10% - All Semester
- Homework assignments: 10% - Weeks 3, 8 and 13
- First Exam: 20% - Sept 23rd
- Second Exam: 30% - Oct 28th
- Final Exam: 30% - Dec 4th

Homework assignments will generally require you to write a response to an argument or put forward your own proposed solution to a contemporary challenge in American politics. They will be brief and shouldn't take more than a few hours to complete. Due to how large this class is, exams will primarily feature multiple-choice and short fill-in-the-blank questions.
Technically, the exams are cumulative (i.e. they could cover anything from the whole course to that point) but the vast majority of the content covered on an exam will come from material we cover after the previous exam. For example, the second exam could cover material from the first five weeks of the class, but the vast majority will come from material covered in weeks 6 - 10 (i.e. after the first exam). As mentioned above, exams will be open notes, but only your own hand-written notes are permitted. This means that coming to class, paying attention, and taking good notes will be critical for succeeding in the class.
CS202: Lab 5: File system

Introduction

In this lab, you will implement (pieces of) a simple disk-based file system. There is not a lot of code to write; instead, a lot of the work of the lab is understanding the system that you've been given. By the end of the lab, you will be able to run your lab 2 ls against this lab's file system.

Getting Started

You'll be working in the Docker container as usual. We assume that you have set up the upstream as described in the lab setup. Then run the following on your local machine from outside of Docker:

$ cd ~/cs202
$ git fetch upstream
$ git merge upstream/main

This lab's files are located in the lab5 subdirectory. If you have any "conflicts" from lab 4, resolve them before continuing. Run git push to save your work back to your personal repository. The rest of these instructions presume that you are in the Docker environment. We omit the cs202-user@172b6e333e91:~/cs202-labs part of the prompt.

FUSE

The file system that we will build is implemented as a user-level process. This file system's storage will be a file (in the example given below, we call it testfs.img) that lives in the normal file system of your Docker container. Much of your code will treat this file as if it were a disk. This entire arrangement (a file system implemented in user space, with an arbitrary choice of storage) is due to software called FUSE (Filesystem in Userspace). In order to really understand what FUSE is doing, we need to take a brief detour to describe VFS. Linux (like Unix) has a layer of kernel software called VFS; conceptually, every file system sits below this layer and exports a uniform interface to VFS. (You can think of any potential file system as being a "driver" for VFS; VFS asks the software below it to do things like "read", "write", etc.; that software fulfills these requests by interacting with a disk driver and interpreting the contents of disk blocks.)
The purpose of this architecture is to make it relatively easy to plug a new file system into the kernel: the file system writer simply implements the interface that VFS is expecting, and the rest of the OS uses the interface that VFS exports to it. In this way, we obtain the usual benefits of pluggability and modularity. FUSE is just another "VFS driver", but it comes with a twist. Instead of FUSE implementing a disk-based file system (the usual picture), it responds to VFS's requests by asking a user-level process (called a "FUSE driver") to respond to them. So the FUSE kernel module is an adapter that speaks "FUSE" to a user-level process (and you will be writing your code in this user-level process) and "VFS" to the rest of the kernel. Meanwhile, a FUSE driver can use whatever implementation it wants. It could store its data in memory, across the network, on Jupiter, whatever. In the setup in this lab, the FUSE driver will interact with a traditional Linux file (as noted above) and pretend that this file is a sector-addressable disk. The FUSE driver registers a set of callbacks with the FUSE system (via libfuse and ultimately the FUSE kernel module); these callbacks are things like read, write, etc. A FUSE driver is associated with a particular directory, or mount point. The concept of mounting was explained in OSTEP 39 (see 39.17). Any I/O operations requested on files and directories under this mount point are dispatched by the kernel (via VFS, the FUSE kernel module, and libfuse) to the callbacks registered by the FUSE driver.

To recap all of the above, the file system user interacts with the file system roughly in this fashion:

1. When the file system user, Process A, makes a request to the system, such as listing all files in a directory via ls, the ls process issues one or more system calls (stat(), read(), etc.).
2. The kernel hands the system call to VFS.
3.
VFS finds that the system call is referencing a file or directory that is managed by FUSE.
4. VFS then dispatches the request to FUSE, which dispatches it to the corresponding FUSE driver (which is where you will write your code).
5. The FUSE driver handles the request by interacting with the "disk", which is implemented as an ordinary file.

The FUSE driver then responds, and the responses go back through the chain. Here's an example from the staff solution to show what this looks like, where testfs.img is a disk image with only the root directory and the file hello on its file system:

# Create a directory to serve as a mount point.
# Note: the / is important, because the directory
# should live only in docker's filesystem
$ mkdir /lab5mnt
# create symlink to local directory mnt
$ ln -s /lab5mnt mnt
# see what file system mnt is associated with
$ df mnt
Filesystem     1K-blocks    Used Available Use% Mounted on
overlay         61202244 8831452  49229468  16% /
# notice, 'mnt' is empty
$ ls mnt
# mount testfs.img at mnt:
$ build/fsdriver testfs.img mnt
# below, note that mnt's file system is now different
$ df mnt
Filesystem         1K-blocks Used Available Use% Mounted on
CS202fs#testfs.img      8192   24      8168   1% /lab5mnt
# and there's the hello file...
$ ls mnt
hello
# ...which we can read with any program
$ cat mnt/hello
Hello, world!
# now unmount mnt
$ fusermount -u mnt
# and its associated file system is back to normal
$ df mnt
Filesystem 1K-blocks    Used Available Use% Mounted on
/dev/sda1    7092728 4536616   2172780  68% /
# and hello is gone, but still lives in testfs.img
$ ls mnt

Note that in the above example, after we run fsdriver, the kernel is actually dispatching all the open(), read(), readdir(), etc. calls that ls and cat make to our FUSE driver. The FUSE driver takes care of searching for a file when open() is called, reading file data when read() is called, and so on.
When fusermount is run, our file system is unmounted from mnt, and all I/O operations under mnt return to being serviced normally by the kernel.

Our File System

Below, we give an overview of the features that our file system will support; along the way, we review some of the file system concepts that we have studied in class and the reading.

On-Disk File System Structure

Most UNIX file systems divide available disk space into two main types of regions: inode regions and data regions. UNIX file systems assign one inode to each file in the file system; a file's inode holds the file's meta-data (pointers to data blocks, etc.). The data regions are divided into much larger (typically 4KB or more) data blocks, within which the file system stores file data and directory data. Directory entries (the "data" in a directory) contain file names and inode numbers; a file is said to be hard-linked if multiple directory entries in the file system refer to that file's inode. Both files and directories logically consist of a series of data blocks; these blocks can be scattered throughout the disk, much as the pages of a process's virtual address space can be scattered throughout physical memory. Unlike most UNIX file systems, we make a simplification in the layout of the file system: there is only one region on the disk, in which both inode blocks and data blocks reside. Furthermore, each inode is allocated its own disk block instead of being packed alongside other inodes in a single disk block.

Sectors and Blocks

Disks perform reads and writes in units of sectors, which are typically 512 bytes. However, file systems allocate and use disk storage in units of blocks (for example, 4KB, or 8 sectors). Notice the distinction between the two terms: sector size is a property of the disk hardware, whereas block size is a creation of the file system that uses the disk. A file system's block size must be a multiple of the sector size of the underlying disk.
As explained in class, there are advantages to making the block size larger than the sector size. Our file system will use a block size of 4096 bytes.

Superblocks

File systems typically place important meta-data at reserved, well-known disk blocks (such as the very start of the disk). This meta-data describes properties of the entire file system (block size, disk size, meta-data required to find the root directory, the time the file system was last mounted, the time the file system was last checked for errors, and so on). These special blocks are called superblocks. Many "real" file systems maintain multiple replicas of superblocks, placing them far apart on the disk; that way, if one of them is corrupted or the disk develops a media error in that region, the other replicas remain accessible. Our file system will have a single superblock, which will always be at block 0 on the disk. Its layout is defined by struct superblock in fs_types.h. Block 0 is typically reserved to hold boot loaders and partition tables, so file systems generally do not use the very first disk block. Since our file system is not meant to be used on a real disk, we use block 0 to store the superblock for simplicity.

The superblock in our file system contains a reference to a block containing the "root" inode (the s_root field in struct superblock). The "root" inode is the inode for the file system's root directory. This inode stores pointers to blocks; these blocks, together, contain a sequence of dirent structures. Each structure includes a file name and an inode number (one can think of this as assigning a given "name" to a given inode); the collection of these structures forms the content of the file system's root directory. These contents can include further directories, etc.
The Block Bitmap: Managing Free Disk Blocks

In the same way that the kernel must manage the system's physical memory to ensure that a given physical page is used for only one purpose at a time, a file system must manage the blocks of storage on a disk to ensure that a given disk block is used for only one purpose at a time. In WeensyOS, you kept the physical_pageinfo structures for all physical pages in an array, pageinfo, to keep track of the free physical pages in kernel.c. In file systems it is common to keep track of free disk blocks using a bitmap (essentially, an array of bits, one for each resource being tracked). A given bit in the bitmap is set if the corresponding block is free, and clear if the corresponding block is in use. The bitmap in our file system always starts at disk block 1, immediately after the superblock. For simplicity we will reserve enough bitmap blocks to hold one bit for each block in the entire disk, including the blocks containing the superblock and the bitmap itself. We will simply make sure that the bitmap bits corresponding to these special, "reserved" areas of the disk are always clear (marked in-use). Note that, since our file system uses 4096-byte blocks, each bitmap block contains 4096*8=32768 bits, or enough bits to track 32768 disk blocks.

File Metadata

The layout of the meta-data describing a file in our file system is described by struct inode in fs_types.h. This meta-data includes the file's size, type (regular file, directory, symbolic link, etc.), time stamps, permission information, and pointers to the data blocks of the file. Because our file system supports hard links, one inode may be referred to by more than one name -- which is why the inode itself does not store the file "name". Instead, directory entries give names to inodes (as noted earlier).
The i_direct array in struct inode contains space to store the block numbers of the first 10 (N_DIRECT) blocks of the file, which we will call the file's direct blocks. For small files, up to 10*4096 = 40KB in size, this means that the block numbers of all of the file's blocks will fit directly within the inode structure itself. For larger files, however, we need a place to hold the rest of the file's block numbers. For any file greater than 40KB, an additional disk block, called the file's indirect block, holds up to 4096/4 = 1024 additional block numbers, pushing the maximum file size up to (10 + 1024)*4096 = 4136KB, or a little over 4MB. The file system also supports double-indirect blocks. A double-indirect block (i_double in the inode structure) stores 4096/4 = 1024 additional indirect block numbers, which themselves each store 1024 additional direct block numbers. This affords an additional 1024*1024*4096 bytes = 4GB worth of data, bringing the maximum file size to a little over 4GB, in theory. To support even larger files, real-world file systems typically support triple-indirect blocks (and sometimes beyond).

Other Features

Our file system supports all the traditional UNIX notions of file ownership, permissions, hard and symbolic links, time stamps, and special device files. Perhaps surprisingly, much of this functionality will come for free (or at very low cost) after writing just a small number of core file system operations. Some of the ease in supporting these traditional UNIX file system notions comes from FUSE.

Goal

You will implement some components of the FUSE driver (and hence the file system): allocating disk blocks, mapping file offsets to disk blocks, and freeing disk blocks allocated in inodes. In order to do this, you will have to familiarize yourself with the provided code and the various file system interfaces.

Source files

· bitmap.c, bitmap.h: operations for manipulating the free disk block bitmap.
· dir.c, dir.h: operations for manipulating directories, including adding entries to directories and walking the directory structure on-disk to access a file.
· fs_types.h: contains structure and macro definitions relevant to the layout of the file system.
· inode.c, inode.h: operations for reading and writing data to inodes on-disk.
· disk_map.c, disk_map.h: contains the flush_block() and diskblock2memaddr() functions, both of which are vital for the functions that you will write.
· fsdriver.c: the main source file for the file system driver.
· fsformat.c: the main source file for the file system formatting utility.

The main file system code that we've provided for you resides in fsdriver.c. This file contains all the FUSE callbacks to handle I/O syscalls, as well as the main function. Once the path to the disk image and the path to the mount point (testfs.img and mnt respectively in the supplied example) have been read from the command line arguments, our FUSE driver uses mmap() to map the specified disk image into memory. This happens in the map_disk_image function (defined in disk_map.c), which itself initializes some file system metadata. Then, fsdriver.c calls fuse_main, which handles kernel dispatches to our registered callbacks. These callbacks will invoke the functions that you write in the coming exercises.

As stated in class, mmap reserves a portion of the running process's virtual address space to provide read and write access to a file as if that file were an array in memory. For example, if file is a pointer to the first byte of a memory-mapped file, writing ((char *)file)[5] = 3 is approximately equivalent to seeking to offset 5 with lseek(fd, 5, SEEK_SET) and then writing the single byte 3 with write(). To flush any in-memory changes you've made to a file onto the disk, you would use the msync function. As always, you can check the man pages for more information on these syscalls.
For this lab, you will not need to be intimately familiar with their operation, but you should have a high-level understanding of what they do.

Heads up: some key tips for the lab.

· It's virtually impossible to do this lab correctly without reading the supplied code and specs carefully. This includes comments, programming idioms in the supplied code, specs of functions that you are supposed to call, specs of functions that you are supposed to write, and more.
· Expanding on the prior point: if you're not sure how to do something, look around in the same file for how other functions are implemented in the supplied code; this will give you programming hints.
· As usual, bugs in earlier exercises may show up only in later exercises, or in later grading tests.
· In computing, it's generally impossible to validate something completely with a fixed set of tests; here too, it is possible to pass the tests with wrong logic that is later penalized by hand grading. Thus, you will want to make sure not only that you are passing the tests but also that your logic is sensible.
· You will very likely need to use the debugger (gdb). We include instructions below on running the driver under gdb; for gdb-specific commands, such as breakpoint-setting, please see lab 1.

The work

Exercise 1. Before coding in the driver, cd to the lab5 directory and run ./chmod-walk. This will correctly set up permissions in the directories leading up to your lab directory so that you will be able to run the driver successfully. (If you do not run this script, FUSE will be unable to mount or unmount file systems in your lab directory.)

When you run the FUSE driver ($ build/fsdriver testfs.img mnt), the function map_disk_image will set the bitmap pointer. After this, we can treat bitmap as a packed array of bits, one for each block on the disk. See, for example, diskblock_is_free, which simply checks whether a given block is marked free in the bitmap.

Exercise 2.
Implement alloc_diskblock in bitmap.c. It should find a free disk block in the bitmap, mark it used, and return the number of that block. When you allocate a block, you should immediately flush the changed bitmap block to disk with flush_block, to help file system consistency. Under the hood, flush_block calls msync to schedule the disk block in memory to get flushed to the actual disk image. Note: you should use free_diskblock as a model. Use make grade to test your code; run it as sudo, like this:

$ sudo make grade

Your code should now pass the "alloc_diskblock" test.

Debugging

The output of our make grade will usually not provide you with enough information to debug a problem. Here are some debugging guidelines. Debugging is crucial in this lab. Or, put differently: it would be very surprising if make grade gave you all of the points on your first attempt. And when it does not, you will need to debug. This section tells you how.

First, when the grading script fails, look at the scripts in test/ to see what is actually happening. Some of the tests are in the fs_test() function in fsdriver.c. Others are invoked by separate C programs. You will have to identify which test or test program is causing the failure. This is a prior step to using gdb and requires doing some detective work on the grading script and the supplied code. (This sort of detective work is a necessary skill when creating or contributing to any non-trivial software project.)

Second, note that "Transport endpoint is not connected" and "Software caused connection abort" are errors that user programs see when the file system driver panics or otherwise crashes and is no longer handling system calls (open(), read(), etc.) for that mount point. So, when a program starts producing these errors (for example, from a grading script), you will probably need to run the driver in gdb to see where it's panicking.
The rest of this section gives building blocks and a detailed howto for gdb. The howto assumes that you have read and absorbed the building blocks.

Basic approach and building blocks

· The approach to debugging has three high-level steps: (1) set up the file system (the grading scripts do this for you, but when debugging you need to do it semi-manually), (2) run the file system driver (either standalone or in gdb) that invokes your file system, and (3) run a program that actually interacts with the file system (again, either standalone or in gdb). Each of these steps requires specific commands from you; we delve into these commands now.
· For step (1), setting up the file system semi-manually, a relatively straightforward way to do this is to run:

$ test/testbasic.bash

This script creates a disk image, testfs.img, which is set up properly for further internal tests in the driver. It also runs some basic tests on the driver (specifically: $ build/fsdriver testfs.img mnt --test-ops, which you might find useful to run on its own). You can also delve into the bash scripts and try to reproduce the kinds of commands that are in, say, test/teststress.bash. Be aware that these shell scripts call shell library functions that are defined in test/libtest.bash.
· For steps (2) and (3), you will typically need to run these side-by-side, literally. To do this, you will want to use the tmux program. Therefore, if you haven't already done so for lab 4, take 10 minutes right now to do the tmux tutorial.
· For step (2), running the file system driver, you will want to run using the -d flag. The standalone version is as follows (later in this description we will show it for gdb):

$ build/fsdriver -d testfs.img mnt

or you may need to preface this command with sudo:

$ sudo build/fsdriver -d testfs.img mnt

This command runs the driver in debugging mode and mounts testfs.img at mnt. In debugging mode, the driver does not exit until you press Ctrl-C.
This means that you cannot interact with the file system via the terminal containing the command above; instead, you will need to interact with the file system from another terminal (again, using tmux). While in debugging mode, fsdriver will print to the console a trace of the syscalls dispatched to the driver; any printfs that you insert in the driver will also be displayed. Once you run the command above, you should see something like:

FUSE library version: 2.9.2
nullpath_ok: 0
nopath: 0
utime_omit_ok: 0
unique: 1, opcode: INIT (26), nodeid: 0, insize: 56, pid: 0
INIT: 7.23
flags=0x0003f7fb
max_readahead=0x00020000
INIT: 7.19
flags=0x00000011
max_readahead=0x00020000
max_write=0x00020000
max_background=0
congestion_threshold=0
unique: 1, success, outsize: 40
...

· For step (3), you will use another terminal (again, using tmux). Something like:

$ ls mnt

will cause the original terminal to print germane output.
· To actually debug, you can use printf or gdb. Any printfs that you add will be displayed in the terminal with fsdriver, as stated above. However, gdb is likely to be more effective. See below for instructions.

Running the driver in gdb:

· We assume that you have already done step (1) above. This section is about how to run steps (2) and (3) under the debugger.
· If you haven't already done so for lab 4, take 10 minutes right now to do the tmux tutorial.
· Then look over how we use gdb with tmux in lab 4.
· In this case, do the following, recalling that C-b % means "type Ctrl-b together, let go, and then type the % key":

$ tmux
C-b %

· At this point, you should have two side-by-side panes, with the active one being the one on the right. Go back to the one on the left with C-b ← (which means "type Ctrl-b together, let go, and then type the left-arrow key"), and type the following.
You might again need to preface the command with sudo:

$ gdb build/fsdriver
(gdb) run -d testfs.img mnt

In this way, you will be running build/fsdriver -d testfs.img mnt (or sudo build/fsdriver -d testfs.img mnt) but in the debugger.
· Then go back to the right-hand pane with C-b →. In that terminal:

# manually run anything driven from the test/ directory,
# possibly under gdb. For example:
$ build/posixio
# or
$ gdb build/stressfs
(gdb) run

Cleaning up

During the course of testing your FUSE driver, various combinations of operations may cause your FUSE driver to stop functioning or enter a non-clean state. It may be helpful to search Campuswire for information about specific error messages. More generally, to get your system to a clean starting state, you can do the following (you may need to preface some of these commands with sudo):

$ fusermount -u mnt # unmount the driver

If that does not work, then force it with:

$ sudo umount -f mnt

and possibly:

$ sudo umount -f /lab5mnt

Then continue:

$ echo 'sample' > build/msg # create a message
$ test/makeimage.bash # make a clean testfs.img
$ rm -f mnt # remove the symlink
$ sudo rm -rf /lab5mnt # remove the mounting directory and its residents
$ sudo mkdir /lab5mnt # recreate the mounting directory
$ ln -s /lab5mnt mnt # recreate the symlink

You may wish to encapsulate these actions in a shell script. Once you have executed the above lines, you can start the driver again with the following, possibly prepended with sudo:

$ build/fsdriver -d testfs.img mnt

File Operations

We have provided various functions in dir.c and inode.c to implement the basic facilities you will need to interpret and manage inode structures, scan and manage the entries of directories, and walk the file system from the root to resolve an absolute pathname. Read through all of the code in these files and make sure you understand what each function does before proceeding.

Exercise 3.
Implement inode_block_walk and inode_get_block in inode.c. These are the workhorses of the file system. For example, inode_read and inode_write aren’t much more than the bookkeeping atop inode_get_block necessary to copy bytes between scattered blocks and a sequential buffer. Their signatures are:
int inode_block_walk(struct inode *ino, uint32_t filebno, uint32_t **ppdiskbno, bool alloc);
int inode_get_block(struct inode *ino, uint32_t filebno, char **blk);
inode_block_walk has similar logic to the virtual memory lookup function in lab4. It finds the disk block number slot for the 'filebno'th block in inode 'ino', and sets '*ppdiskbno' to point to that slot. inode_get_block goes one step further and sets *blk to the start of the block, such that by using *blk, we can access the contents of the block. It also allocates a new block if necessary.
The pointers-to-pointers may be confusing. It’s best to draw pictures, look at other code in inode.c, or think about how to call the function. Also, as a reminder: in C, when we want to return multiple values from a function (or return one value from a function whose return value is an int status code), we set up the function to take a pointer (address) as a parameter, and we put the return value in the supplied address (by dereferencing the pointer). As an example, if we have a function f(int* p), the implementation of f can return an int to the caller by storing into (dereferencing) the supplied address, with a line like *p = 5. This identical pattern holds when the return value is itself a pointer. In our example, the return value is a uint32_t* (an address whose contents will be a 32-bit block number). So, the caller passes storage for that uint32_t*: a uint32_t**.
Use make grade to test your code (again, by invoking: $ sudo make grade). Your code should pass the "inode_open" and "inode_get_block" tests.
After Exercise 3, you should be able to read and write to the file system.
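To make the out-parameter pattern concrete, here is a minimal, self-contained sketch. The names (slot_walk, block_slots) are hypothetical stand-ins, not part of the lab's code: the caller supplies a uint32_t** so the function can hand back the address of a slot, while the int return value carries a status code.

```c
#include <stdint.h>

/* Hypothetical stand-in for an inode's table of block-number slots. */
static uint32_t block_slots[4] = {10, 20, 30, 40};

/* Find the slot for the 'filebno'th block and "return" its address
 * through the supplied uint32_t**; the int return is a status code. */
int slot_walk(uint32_t filebno, uint32_t **ppdiskbno)
{
    if (filebno >= 4)
        return -1;                      /* no such block: report failure */
    *ppdiskbno = &block_slots[filebno]; /* store the slot's address */
    return 0;
}
```

A caller writes `uint32_t *slot; slot_walk(2, &slot);` and can then read or update the block number via `*slot`. That is exactly why inode_block_walk wants a pointer-to-pointer: the caller can later write a newly allocated block number into the slot it was handed.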
Try something like:
$ echo "hello" > "mnt/world"; cat "mnt/world"
Exercise 4.
Implement inode_truncate_blocks in inode.c. inode_truncate_blocks frees data and metadata blocks that an inode allocated but no longer needs. This function is used, for instance, when an inode is deleted; the space reserved by the inode must be freed so that other files can be created on the system.
Use make grade to test your code ($ sudo make grade). Your code should pass the "inode_flush/inode_truncate/file rewrite" tests.
Exercise 5.
Implement inode_link and inode_unlink in inode.c. inode_link links an inode referenced by one path to another location, and inode_unlink removes a reference to an inode at a specified path. Make sure that you properly increment the link count in an inode when linking and decrement the link count when unlinking. Don't forget to free an inode when its link count reaches zero!
inode_link and inode_unlink allow us to exploit the level of indirection provided by using inodes in our file system (as opposed to storing all file meta-data inside of directories, for instance) and manage referencing inodes with multiple names. The inode_unlink operation is particularly important, as it allows us to release the space reserved for an inode, acting as a "remove" operation when an inode's link count is one.
Use make grade to test your code ($ sudo make grade). Your code should pass the "inode_link/inode_unlink" tests.
After Exercise 5, you should be able to make hard links. Try something like:
$ echo "hello" > "mnt/world"; ln "mnt/world" "mnt/hello"; rm "mnt/world"; cat "mnt/hello"
The tests after "inode_link/inode_unlink" are all effectively stress tests, in some way or another, for the driver. Each of them relies on the core functionality that you implemented; some can fail if you didn't handle certain edge cases correctly. If you fail one of these tests, go back and check the logic in your code to make sure you didn't miss anything.
Exercise 6.
Run your ls from lab 2 against the file system in /lab5mnt. Paste the output of
$ /path/to/your/ls -alR /lab5mnt
into answers.txt. If you have a non-working lab 2, then just note that in the answers.txt file. You can have fun – not graded – using the fact that you are both writing ls and implementing the file system. For example, you could consider having your file system stuff coded messages into extraneous dirents, and then interpret/decode them in ls.
Extra credit questions
Do either of the following for extra credit (you can do both, but extra credit is given for only one). As in lab4, the points given will not be commensurate with the effort required.
Exercise 7.
The file system is likely to be corrupted if it gets interrupted in the middle of an operation (for example, by a crash or a reboot). Implement soft updates or journalling to make the file system crash-resilient, and demonstrate some situation where the old file system would get corrupted but yours doesn't.
Exercise 8.
Currently, our file system allocates one block (4096 bytes) per inode. However, each struct inode only takes up 98 bytes. If we were clever with file system design, we could store 4096/98 = 41 inodes in every block. Modify the file system so that inodes are stored more compactly on disk. You may want to make the file system more like a traditional UNIX file system by splitting up the disk into inode and data regions, so that it is easier to reference inodes by an index (generally called an "inum", for "inode number") into the inode region.
Further questions
Answer the following questions in answers.txt.
1. How long approximately did it take you to do this lab?
2. Do you feel like you gained an understanding of how to build a file system in this lab? Please suggest improvements.
Submission
Handing in consists of three steps:
1. Executing this checklist:
o Make sure your code builds, with no compiler warnings.
o Make sure you’ve used git add to add any files that you’ve created.
o Fill out the top of the answers.txt file, including your name and NYU ID.
o Make sure you’ve answered every question in answers.txt.
o Make sure you have answered all code exercises in the files.
o Create a file called slack.txt noting how many slack days you have used for this assignment. (This is to help us agree on the number that you have used.) Include this file even if you didn’t use any slack days.
o git add and commit the slack.txt file.
2. Push your code to GitHub, so we have it (from outside the container or, if on Mac, this will also work from within the container):
$ cd ~/cs202/lab5
$ make clean
$ git commit -am "hand in lab5"
$ git push origin
Counting objects: ...
....
To git@github.com:nyu-cs202/labs-24fa-.git
7337116..ceed758 main -> main
3. Actually submit, by timestamping and identifying your pushed code:
o Decide which git commit you want us to grade, and copy its id (you will paste it in the next sub-step). A commit id is a 40-character hexadecimal string. Usually the commit id that you want will be the one that you created last. The easiest way to obtain the commit id for the last commit is by running the command git log -1 --format=oneline. This prints both the commit id and the initial line of the commit message. If you want to submit a previous commit, there are multiple ways to get the commit id for an earlier commit. One way is to use the tool gitk. Another is git log -p, as explained here, or git show.
o Now go to NYU Brightspace; there will be an entry for this lab. Paste only the commit id that you just copied.
o You can submit as many times as you want; we will grade the last commit id submitted to Brightspace.
NOTE: Ground truth is what and when you submitted to Brightspace. Thus, a non-existent commit id in Brightspace means that you have not submitted the lab, regardless of what you have pushed to GitHub.
And the time of your submission for the purposes of tracking lateness is the time when you upload the id to Brightspace, not the time when you executed git commit.
This completes the lab.
Acknowledgements
The diagram explaining FUSE is adapted from the diagram displayed on FUSE's homepage. This lab is an edited version of a lab written by Isami Romanowski. (He in turn adapted code from MIT's JOS, porting it to the FUSE and Linux environment, adding inodes and more.)
ITP-216 Applied Python Final Project
Definition/Description
In the era of Big Data, data analysis and visualization are among the best approaches for extracting useful information and making decisions. For your Final Project, you are tasked with creating a web app which manipulates and visualizes a Big Data dataset.
Requirements
General
Your web app shall be written in Python using Flask, pandas, scikit-learn, and matplotlib. Your web app shall allow clients to choose what subsets of the data they would like to see (via text input, radio buttons, et al.), and the app will serve them a visualization of that data.
Big Data
You may use any dataset of your choice for your Big Data; follow your heart! It must, however, meet the following criteria:
1. It must contain at least 1000 datapoints.
2. At least half of the data must be numeric.
Your Big Data shall be stored on your server. It may be stored as a csv. For extra credit, your Big Data shall be stored in a database which your web app accesses.
Endpoints
Your web app should implement a number of endpoints.
1. At least 2 endpoints used for GET requests, i.e. directly accessible by browsers (e.g. '/').
a. At least 1 of these should be a dynamic endpoint created from a client POST request (e.g. for a web app which makes predictions on amounts of insects in a given area, '/projection/butterflies').
2. At least 2 endpoints used for POST requests, i.e. not directly accessible by browsers (e.g. '/login' from the Web App Homework).
Scientific Computation
Your web app shall do some sort of computation with the Big Data dataset. This could be as straightforward as aggregating attributes, but it needs to compute something meaningful.
Machine Learning
Your web app shall make predictions based on the given Big Data dataset. As long as your web app is using ML to make predictions, you're good to go. These could be:
1. Predictions of how a particular property will change over time.
2. Label classifications of data with unknown labels.
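The endpoint requirements above can be sketched minimally with Flask. This is only one possible shape, not a required structure; the route names and the "region" form field are hypothetical placeholders.

```python
from flask import Flask, redirect, request, url_for

app = Flask(__name__)

@app.route("/")                            # GET endpoint: directly browsable
def home():
    # In a real app this would render a template holding the input section
    # (text fields, radio buttons, etc.) that POSTs to /submit below.
    return "<form method='post' action='/submit'><input name='region'></form>"

@app.route("/submit", methods=["POST"])    # POST endpoint: not browsed directly
def submit():
    region = request.form.get("region", "all")
    # Redirect to a dynamic GET endpoint built from the client's POST data.
    return redirect(url_for("projection", region=region))

@app.route("/projection/<region>")         # dynamic GET endpoint
def projection(region):
    return f"Prediction page for {region}"  # would serve a plot in the real app
```

The POST view deliberately does no work beyond reading the form and redirecting; querying the dataset and building the figure would live in separate helper functions, in line with the General Code criteria below.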
Data Visualization
Your web app shall visualize the Big Data dataset in some meaningful way. At least two types of plots should be accessible:
1. At least 1 plot should visualize the data without any ML processing.
2. At least 1 plot should visualize ML-processed predictions.
These plots do not all have to appear on the same webpage.
Provided Files/Data
Example project
An example of the Final Project from last semester can be found here: http://pohlner.pythonanywhere.com/
This project visualizes COVID-19 confirmed and recovered cases, and also visualizes ML predictions of future confirmed and recovered cases. It uses datasets from The Johns Hopkins University (JHU) Center for Systems Science and Engineering (CSSE), but only uses data up to mid-October.
*Example Project Usage Notes:
· Make sure you’re using Chrome or Firefox (not Safari or IE)
· For Locale try “Mexico” (case-sensitive)
· For Date try “02/02/23” (format matters)
Reference material
1. Flask documentation: https://flask.palletsprojects.com/en/1.1.x/
2. Jinja documentation: https://jinja.palletsprojects.com/en/2.11.x/
3. matplotlib documentation: https://matplotlib.org/contents.html
4. Scikit-learn documentation: https://scikit-learn.org/stable/modules/classes.html
5. Description of categories of machine learning models and different algorithms: https://towardsdatascience.com/all-machine-learning-models-explained-in-6-minutes-9fe30ff6776a
6. Matplotlib with Flask: https://stackoverflow.com/questions/20107414/passing-a-matplotlib-figure-to-html-flask
7. https://stackoverflow.com/questions/65068073/error-while-showing-matplotlib-figure-in-flask
a. https://stackoverflow.com/a/65068732
Big Data sources
You may use any dataset you'd prefer, as long as it meets the criteria for the Final Project. Below is a short, non-exhaustive list of Big Data dataset options:
1. Start here: Kaggle Open Source Data Sets for ML: https://www.kaggle.com/datasets
2.
80+ Free Data Sets: https://www.interviewquery.com/p/free-datasets
3. Forbes list: https://www.forbes.com/sites/bernardmarr/2018/02/26/big-data-and-ai-30-amazing-and-free-public-data-sources-for-2018/?sh=5f4a369f5f8a
4. Springboard list: https://www.springboard.com/blog/free-public-data-sets-data-science-project/
5. Data from the City of Austin, TX: https://data.austintexas.gov/
Deliverables
All Python files, in the same directory, and compressed in a zip file. The zip file should be named: ITP_216_FP_YourLastName_YourFirstName.zip
Grading
Section Points (Total: 30)
Functionality and User Interface
1. The root endpoint shall display an input section for client-supplied data.
2. Clients shall be able to select a query of existing data, or a prediction based on existing data.
3. Clients shall be able to submit query information, which will generate a POST request.
4. Clients shall be able to see queried information on returned pages.
4 (1 point each)
General Code
1. The code shall contain no global objects other than those provided.
2. View functions shall only contain code related to the view function itself; anything else (e.g. querying the database, constructing a pandas object, et al.) shall be separated and held in its own function.
2 (1 point each)
Web App
1. Flask shall be used to create the web framework routing using endpoints and associated view functions.
2. The web app shall query and manipulate the dataset.
3. The web app shall contain at least 2 GET endpoints. (2 points)
a. At least 1 of these shall be a dynamic endpoint created by a client POST request.
4. The web app shall contain at least 2 POST endpoints. (2 points)
6 (1 point each)
Scientific Computation
1. Pandas shall be used for manipulation of the data.
2. The app shall calculate some sort of meaningful aggregation.
2 (1 point each)
Machine Learning
1. Scikit-learn shall be used for the Machine-Learning aspects.
2. A projection shall be created of at least one of the features of the dataset.
2 (1 point each)
Data Visualization
1. Matplotlib shall be used to visualize the data once it has been loaded, prepared, and manipulated.
2. Figures shall have a title.
3. Plot axes shall be labelled.
4. Plots shall contain a legend, and datasets plotted shall be named in the legend.
5. Plot axes shall have values clearly visible (numbers, names, et al.).
6. A distinction shall be made for any plot as to whether it represents existing data or predictive data.
6 (1 point each)
Documentation and Formatting
1. Concise and useful commenting in your codebase is a must. You will need a header with your name, the semester, the section of the course you are in, and the assignment number.
2. You need descriptions of any major sections in your code (functions, classes, methods, et al.).
3. Your code must be generally clear and readable.
3 (1 point each)
Error Handling
1. The web app shall run with no errors.
2. The web app shall reroute appropriately when given a nonsensical request (e.g. an endpoint that a client isn't meant to request directly, a POST with the wrong data, et al.).
2 (1 point each)
Extra points for free!
3
Extra Credit (database)
1. All data is stored in the database accurately (i.e. appropriate tables, key relationships [if any], attributes, and constraints on attributes).
2. The database is queried correctly given the client input, and returns appropriate data.
5 (2.5 points each)
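For the extra-credit database option, one possible shape is to load the dataset's rows into SQLite using Python's built-in sqlite3 module and keep the queries in small helper functions that view functions can call. The table name, columns, and helper names here are illustrative assumptions, not requirements.

```python
import sqlite3

def build_db(rows, path=":memory:"):
    """Load (category, value) datapoints into a SQLite table; return the connection."""
    con = sqlite3.connect(path)
    con.execute("CREATE TABLE datapoints (category TEXT, value REAL)")
    # Parameterized inserts: the driver handles quoting/escaping.
    con.executemany("INSERT INTO datapoints VALUES (?, ?)", rows)
    con.commit()
    return con

def mean_value(con, category):
    """Aggregate one client-chosen subset; kept out of the view function."""
    (avg,) = con.execute(
        "SELECT AVG(value) FROM datapoints WHERE category = ?", (category,)
    ).fetchone()
    return avg
```

A POST view could then call something like mean_value(con, request.form["category"]) and hand the result to the plotting code, satisfying both the database criterion (appropriate table and correct querying of client input) and the General Code rule that database access lives outside the view function.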
Assessment (non-exam) Brief
Module code/name: INST0007/Web Technologies
Academic year: 2024/25
Term: 1
Assessment title: Coursework: critical report of the developed website (with pre-requisite participation elements to qualify for self-assessment).
Individual/group assessment: Individual
Submission deadlines: Students should submit all work by the published deadline date and time. Students experiencing sudden or unexpected events beyond their control which impact their ability to complete assessed work by the set deadlines may request mitigation via the extenuating circumstances procedure. Students with disabilities or ongoing, long-term conditions should explore a Summary of Reasonable Adjustments. Students may use the delayed assessment scheme for pre-determined mitigation on a limited number of assessments in a year. Check the Delayed Assessment Scheme area on Portico to see if this assessment is eligible.
Return and status of marked assessments: Students should expect to receive feedback within 20 working days of the submission deadline, as per UCL guidelines. The module team will update you if there are delays through unforeseen circumstances (e.g. ill health). All results when first published are provisional until confirmed by the Examination Board.
Copyright: Note to students: Copyright of this assessment brief is with UCL and the module leader(s) named above. If this brief draws upon work by third parties (e.g. Case Study publishers), such third parties also hold copyright. It must not be copied, reproduced, transferred, distributed, leased, licensed or shared with any other individual(s) and/or organisations, including web-based organisations, without permission of the copyright holder(s) at any point in time.
Referencing: You must reference and provide full citation for ALL sources used, including AI sources, articles, text books, lecture slides and module materials. This includes any direct quotes and paraphrased text. If in doubt, reference it.
If you need further guidance on referencing, please see UCL’s referencing tutorial for students. Failure to cite references correctly may result in your work being referred to the Academic Misconduct Panel.
Use of Artificial Intelligence (AI) Tools in your Assessment: Your module leader will explain to you if and how AI tools can be used to support your assessment. In some assessments, the use of generative AI is not permitted at all. In others, AI may be used in an assistive role, which means students are permitted to use AI tools to support the development of specific skills required for the assessment as specified by the module leader. In others, the use of AI tools may be an integral component of the assessment; in these cases the assessment will provide an opportunity to demonstrate effective and responsible use of AI. See page 3 of this brief to check which category use of AI falls into for this assessment. Students should refer to the UCL guidance on acknowledging use of AI and referencing AI. Failure to correctly acknowledge use of AI in assessments may result in students being reported via the Academic Misconduct procedure. Refer to the section of the UCL Assessment success guide on Engaging with AI in your education and assessment.
Content of this assessment brief:
Section A: Core information
Section B: Coursework brief and requirements
Section C: Module learning outcomes covered in this assessment
Section D: Groupwork instructions (if applicable)
Section E: How your work is assessed
Section F: Additional information
- Appendix 1
- Appendix 2
Section B: Assessment Brief and Requirements
Task 1. Designing and Developing the Website
As part of this assessment, you are required to design, develop, and deploy a small website (or a cohesive portfolio of web pages) to showcase the knowledge and skills you have acquired over the course of this module. Your website will be part of an exhibition available to all students enrolled on the module to view and evaluate.
You are strongly advised to develop a web-based CV as your portfolio. However, you may choose to develop a website showcasing personal interests, voluntary work, social/political work, or notes around academic research or scholarship. You must design and develop the website by taking into consideration user experience, accessibility, responsiveness, and relevant performance metrics (e.g. PageSpeed Insights). You may want to apply other user/automated testing methods (e.g. WAVE from WebAIM) to further improve your work prior to submission. User-Centred Design and User Experience should remain focal for the developed website. You should demonstrate the stages of the design process, through wireframes and references to best practice. Responsive design and accessibility considerations should also be evident in the final submission.
You are allowed to use open-source templates, or your own code developed as part of your Tutorial Sheet work; however, you should change the code and acknowledge the original author (on a separate References HTML page and in the accompanying report). Changing third-party code should not be limited to content, but should include HTML and CSS changes. This submission should demonstrate your design and development skills.
You have the freedom to choose the scope and structure of your personal website. There are, however, a set of required elements your website should contain. These are:
· have three or more HTML pages,
· have one of the pages dedicated to referencing external code used in the website,
· have a navigation element consistently displayed throughout the website (i.e. all pages),
· have HTML form elements,
· have consistency in look and feel across pages,
· have the website published on the UCL Personal Webpages server.
NOTE: Please refer to the assessment criteria prior to starting the work on your website.
You are advised to keep a record of your learning achievements (related to technical skills or broader understanding) throughout the duration of your work, to help you remember the details at the time of reflection and self-assessment.
This task is considered completed when you:
· Submit the code of your final website on Moodle by the given deadline.
Task 2. Hosting the website
Prior to starting the work, you should familiarise yourself with the web hosting available to UCL students, commonly referred to as UCL Personal Webpages (https://www.ucl.ac.uk/isd/services/websites-apps/personal-webpages), and the teaching material provided on Moodle. An example of a website available on the hosting is available here: https://www.ucl.ac.uk/~uczckst/inst0007/week03/exercise03/wk03-exercise03.html
You will only be able to upload your website to the UCL Personal Webpages from UCL-managed machines (on campus), via Desktop@UCL Anywhere, or using UCL’s Citrix Workspace.
NOTE: You should remember that the UCL Personal Webpages are available to everyone on the web. You should remember not to add any content that may not be suitable for publishing widely and, if necessary, remove your website once the assessment mark is confirmed.
This task is considered completed when you:
· Provide the URL to your deployed website to be included for an exhibition.
Task 3: Self-assessment report
Critically assess your own website against the given criteria and a good example of a peer’s website submitted for the exhibition. Your critical assessment must be submitted as a report using the form provided (see Appendix 1). Your report should highlight the strengths and weaknesses of your website, including the design, development process, and evaluation as seen in the assessment criteria. Your assessment should also include a reflective assessment of your learning journey.
When writing your critical self-assessment you should include reflections around:
· the key stages of the project and justifications for decisions, such as user-centred design and user experience;
· the use of specific tools and technologies, from prototyping to wireframing;
· application of user-testing or usability evaluation methods;
· justifications for choosing specific user-testing or usability methods;
· validity of the HTML, CSS, and JavaScript code;
· accessibility at design and deployment stages;
· website performance, such as Google PageSpeed Insights tests;
· the key learning outcomes from working on the project;
· key challenges that helped you learn;
· key limitations and the future work that remained beyond the scope of the project.
Along with the critical assessment, you should allocate a mark that reflects your assessment for each of the given criteria and in accordance with the given rubric. The self-awarded marks for individual criteria will form your final mark for the module if you satisfy the pre-requisites for self-assessment and if the mark is not revised by the tutor. Reports containing marks that do not reflect assessment against the given rubric will be adjusted by the tutor, leading to revision of the final mark to reflect the lack of critical self-assessment.
This task is considered completed when you:
· Submit your self-assessment report on Moodle by the given deadline.
Notes on Marking
Where pre-requisites are not met, the marks are guaranteed to be revised by the tutor. Additionally, any mark above 74 will also be subject to review and adjustment by the tutor. Where self-assessment criteria and the corresponding mark appear to be applied inaccurately, the marks will be adjusted to better reflect the performance on assessing your work critically and assigning a mark in line with the rubric and critical thought.
Note: The module leader reserves the right to review and adjust ANY of the marks based on the submitted self-assessment report.
Students should follow the assessment rubric closely to reduce the likelihood of mark adjustments by the tutor.
How to qualify for taking part in self-assessment and reduce the chances of your mark being adjusted by the tutor?
All pre-requisites, as listed in Section A, will need to be completed by the provided deadlines. The following chart (Figure 1) will help you navigate this assessment and minimise the risk of having your mark adjusted by the tutor.
Figure 1: Navigating the assessment and minimising the risk of having your mark adjusted by the tutor.
Section C: Module Learning Outcomes covered in this Assignment
This assignment contributes towards the achievement of the following stated module Learning Outcomes as highlighted below:
· understand the basic principles of website design and development;
· become familiar with technologies and related tools for prototyping, mark-up, and scripting;
· understand concepts and develop skills related to user experience and accessibility;
· understand concepts related to good practices of developing and evaluating websites.