[SOLVED] Factorial Design at 2 Levels R

Prompt Instructions

For each question in this assignment, show the steps of your calculation in R. (Please only use R to check your results.) This assignment will be graded out of 100 points, where 100/100 = 100%. Teams may talk to each other about how to approach the problems, but we expect each team to do their own work. Please submit a .docx file listing your R code and responses, according to this format:

1: All lines of code must be clearly explained.
2: All analyses must be performed in R.

I highly recommend collaborating in Google Docs, but you must submit a .docx file at the end. (Google Doc share links will not be accepted.)

(All values below are hypothetical, unless otherwise indicated, for educational purposes, but are meant to be plausible examples.) Be sure to review the Background and Measures tabs above before starting!

Background

Jolly Cobblers, a family-owned North Pole shoe store, has been selling quality shoes to Santa and his elves for generations. Unfortunately, rising market competition has forced Jolly Cobblers to re-evaluate their product design: after a night delivering presents, Santa's boots face a lot of wear and tear, and the backstay of the heel always breaks by the end of the night. Their customers need a sturdier boot, ASAP - otherwise Santa will just buy boots on Amazon instead. So, the elves at Jolly Cobblers designed and implemented a factorial experiment to develop a more durable boot!

Measures

They measured the following outcome:

durability: the number of times an elf can hammer a boot heel with the weight of Santa Claus before the heel breaks.

Then, they made a series of 240 boots, randomly assigning groups of 15 boots to a particular leather thickness, number of layers, foam thickness, and waterproofing solvent concentration. All other aspects of the production process were held constant.

thickness: thickness of the leather, in millimeters (1.0 vs. 1.5 millimeters).
layers: layers of leather (2 vs. 3).
foam: thickness of foam padding on the backstay (3.0 vs. 4.0 millimeters).
solvent: concentration of a water-proofing solvent (0.25 = 25% solvent with 75% water; 0.50 = equal parts solvent and water).

Santa needs some new boots! Photo by Erik Mclean on Unsplash (https://unsplash.com/photos/4Q0QynPM8UA)

Data & Packages

# Load packages
library(tidyverse)
library(broom)
library(viridis)
library(metR)

# Questions 1-2 use this dataset, 'boots.csv'
boots = read_csv("workshops/boots.csv")

# Questions 3-4 use this dataset, 'allboots.csv'
allboots = read_csv("workshops/allboots.csv")

# Let's view the boots dataset
boots %>% glimpse()

## Rows: 240
## Columns: 7
## $ id         1, 2, 3, 4, 5, 6, 7, 8, 9, 1…
## $ group      1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
## $ durability 44, 123, 52, 62, 65, 82, 93,…
## $ thickness  1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
## $ layers     2, 2, 2, 2, 2, 2, 2, 2, 2, 2…
## $ foam       3, 3, 3, 3, 3, 3, 3, 3, 3, 3…
## $ solvent    0.25, 0.25, 0.25, 0.25, 0.25…

Staff at Jolly Cobblers are eager to use Six Sigma! Photo by Karsten Winegeart on Unsplash (https://unsplash.com/s/photos/santa)

Q1. Factorial Design at 2 Levels (25 points)

Using the boots.csv dataset, create a table of contrasts estimating the direct/main effects and interaction effects (15 effects total). Use the difference of means in R (not a regression model). This table should contain 15 rows and 5 columns: (1) contrast, (2) estimated effect, (3) standard error (se), and (4) lower and (5) upper confidence intervals. Follow the instructions below to create this table, showing your work in R. Hint: See Workshop 14

a. (5 points) Estimate the standard error for this factorial experiment's differences of means. Hint: See Workshop 14.2

b. (5 points) Estimate the direct/main effects of each treatment (4 total). Hint: See Workshop 14.1

c.
(10 points) Estimate all interaction effects of every possible unique combination of treatments (11 total). Hint: See Workshop 14.5

d. (5 points) Use the data from parts a, b, and c to estimate upper and lower confidence intervals for each effect from parts b & c, with a 95% confidence level. Report the finished table, presenting results rounded to 1 decimal place. Hint 1: (Optional) bind_cols() may help you combine data.frames. Hint 2: See Workshop 14.4

Q2. Factorial Experiment Quantities of Interest (15 points)

a. (5 points) Using only the information gathered in Q1 part d, how would you determine which of the estimated effects are likely to be real, significant effects and not just due to sampling error? [Handwritten OK] Hint: Look at the confidence intervals. Discussed in the lectures for Workshop 14, among others.

b. (10 points) Which of the estimated effects are likely to be real effects and not just due to chance? Describe each significant effect in a sentence or more, listing (1) the contrast levels in their original units, (2) the estimated effect in units of times the boot heel is crushed, and (3) the confidence interval. [Handwritten OK] Hint: Look at the confidence intervals. Discussed in the lectures for Workshop 14, among others.

BONUS: (+5 points) The present conditions of operation are the lower settings for each contrast (thickness = 1 mm, layers = 2, foam = 3, & solvent = 0.25). Some of these potential changes cost more than others! Use the information below to deduce which of your significant effects is most cost effective. For each boot, it costs...

$5 to improve the thickness of the leather from 1 mm to 1.5 mm,
$2 to increase the layers from 2 to 3,
$1 to increase the foam padding from 3 mm to 4 mm, and
$2.50 to increase the concentration of waterproofing solvent from 25% to 50%.

Hint: Use the costs above to calculate the added cost of each treatment effect (e.g.
the cost of switching solvent from 25% to 50% AND layers from 2 to 3 would be $2 + $2.50 = $4.50). Then, use your estimate to calculate the added durability of a boot in times crushed per dollar spent! (Round your payoff to 1 decimal place.) Hint: Don't over-think it. It's multiplication and division.

Q3. Interaction Models & RSM (25 points)

Enthused by their promising findings, the elves added a few more contrast levels to their existing treatment variables (thickness = 1, 1.5, & 2 mm; layers = 2, 3, & 4; foam = 3, 3.5, & 4; and solvent = 0.25, 0.375, & 0.5). Their data is available in allboots.csv (it's very high tech at the North Pole).

# Import data for Questions 3-4
allboots = read_csv("workshops/allboots.csv")

# View it!
allboots %>% glimpse()

## Rows: 1,215
## Columns: 7
## $ id         1, 2, 3, 4, 5, 6, 7, 8, 9, 1…
## $ group      1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
## $ durability 44, 123, 52, 62, 65, 82, 93,…
## $ thickness  1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
## $ layers     2, 2, 2, 2, 2, 2, 2, 2, 2, 2…
## $ foam       3, 3, 3, 3, 3, 3, 3, 3, 3, 3…
## $ solvent    0.250, 0.250, 0.250, 0.250, …

a. (10 points) Estimate durability using a second-order polynomial model in lm() with all interaction effects, called m1. Write out your entire model equation. (Example: Predicted Durability = 2.5 + 3 x thickness... etc.) (You may round coefficients to the first decimal place.) Hint: See Workshop 15.1

b. (5 points) Your outcome is a count bounded at zero. Make 2 more models, m2 and m3, one of which models the square root of your outcome and one of which models the natural log of your outcome.

1. Report the R2, F statistic, and p-value for each model. (Round your answers to 3 decimal places.)
2. Which of your 3 models fits best? How do you know? Hint: See Workshop 15.1.3

c. (10 points) Using the best fitting model you selected in part b, generate a grid of predictions using all specs below in requirements (1), (2), and (3). Visualize that grid as a contour plot with labels in ggplot!
(Do not use contour().)

1. Vary the thickness from 1 to 4 by 0.1 mm each and the number of layers from 1 to 4 by 1 layer each!
2. Hold foam at its real current value of 3 mm and solvent at 25%. (Be sure to back-transform your predictions if your outcome is transformed.)
3. Add clear labels, themes, and a visually appealing color palette (not the default blue).

Hint: See Workshop 15.2

Q4. RSM Quantities of Interest (35 points)

The elves are excited about their promising results, but they really need to refine them into usable quantities of interest for decision-making. Answer the questions below to help them analyze their results.

# Import data for Questions 3-4
allboots = read_csv("workshops/allboots.csv")

# View it!
allboots %>% glimpse()

## Rows: 1,215
## Columns: 7
## $ id         1, 2, 3, 4, 5, 6, 7, 8, 9, 1…
## $ group      1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
## $ durability 44, 123, 52, 62, 65, 82, 93,…
## $ thickness  1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
## $ layers     2, 2, 2, 2, 2, 2, 2, 2, 2, 2…
## $ foam       3, 3, 3, 3, 3, 3, 3, 3, 3, 3…
## $ solvent    0.250, 0.250, 0.250, 0.250, …

a. (5 points) Based on your grid from Q3 part c, without changing anything else, what precise thickness and total layers produce the optimal predicted durability? Hint: This is just dplyr data-wrangling. Use your data.frame to find precise numbers.

b. (10 points) Calculate the predicted durability with a 95% simulated confidence interval, using 10,000 random draws, for the following situations:

1. predicted durability under optimal conditions (from Q4 part a)
2. predicted durability under current conditions. Current conditions are: thickness = 1, layers = 2, foam = 3, and solvent = 0.25.

Hint: Remember to back-transform your outcome variable if necessary before getting quantile().

c. (5 points) Visualize your two confidence intervals for the original vs. optimal durability from Q4b in a single plot in ggplot! (Be sure to use clear labeling, themes, and colors.)
Hint: Any of the strategies from Workshop 14.4.2

d. (10 points) Suppose you want to cut costs for the other factors. Apply Response Surface Methodology using the steps below to investigate the intervening effect of foam padding on durability:

- Create a new grid, repeating the same conditions but allowing the level of foam padding to vary between 2, 2.5, and 3 mm.
- Visualize the new plot, split into panels for each level of foam! Each bin should have a width of 500.

Hint: See Workshop 15.3.3

e. (5 points) For each level of foam, filter() to find the optimal durability and the associated levels of thickness and layers. If you decrease the amount of foam padding, what do you need to do to the thickness and/or layers to achieve optimal durability? Hint: This is just data-wrangling. Use your dplyr toolkit.
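Although this assignment requires R, the difference-of-means logic behind Q1 (estimate a main effect, its standard error, and a 95% confidence interval) can be sketched as follows. This is an illustrative Python sketch on synthetic, made-up durability data, not the assignment's dataset or required solution:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical durability measurements for one 2-level factor ("thickness").
# Main effect = mean(outcome at high level) - mean(outcome at low level).
low  = rng.normal(70, 10, size=120)   # durability at thickness = 1.0 mm (made up)
high = rng.normal(85, 10, size=120)   # durability at thickness = 1.5 mm (made up)

effect = high.mean() - low.mean()

# Standard error of a difference of means, then a 95% CI (z ~ 1.96).
se = np.sqrt(low.var(ddof=1) / len(low) + high.var(ddof=1) / len(high))
lower, upper = effect - 1.96 * se, effect + 1.96 * se

print(round(effect, 1), round(lower, 1), round(upper, 1))
```

If the interval excludes zero, the effect is unlikely to be sampling error alone — the same reasoning Q2 asks you to apply to all 15 contrasts.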


[SOLVED] PHAS0007 Experimental Physics Training Python

PHAS0007 Experimental Physics Training

Experiment Q2-A: Determination of the charge to mass ratio of an electron

Experiment Objectives: To become familiar with three experiments from Q1 to Q6 that make measurements of the Planck constant, the charge to mass ratio of the electron and the Rydberg constant, which plays an essential role in the theory of spectral lines. In carrying out the experiment, you will develop skills in taking and analysing data, recording experimental procedures, estimating uncertainty and drawing conclusions. Note that there are two similar experimental kits (A and B) for each of the three experiments.

1. Atomic spectra (Q1-A and Q1-B)
2. Determination of the charge to mass ratio of an electron (Q2-A and Q2-B)
3. Measurement of Planck's constant (Q3-A and Q3-B)

Relevant Lecture Courses: Atoms, Stars and the Universe (PHAS0004); Waves, Optics and Acoustics (PHAS0005)

Introduction

During your PHAS0007 laboratory course, you will carry out three experiments. Each demonstrates a separate and fundamental concept in our current understanding of physics. Depending on which experiment you conduct, values for the Planck constant (h), the charge to mass ratio of the electron (e/m) or the Rydberg constant (R) will be determined and compared with accepted values.

As this is your first undergraduate physics laboratory, we will focus on developing your experimental scientific method. A full record of your experimental procedure, results and data analysis should be kept within your laboratory notebook. You will be given guidance on how to conduct and record your experimental work by the Demonstrators, who will use a guide (which you will have a copy of) to help you to do this. You should record your data in an appropriate table in your notebook. The data should also be plotted on a computer-generated graph using Python (or similar). Remember to include error bars and label the axes correctly.
Finally, you should conduct a least squares fit of the appropriate mathematical model to your data, stating the χ² value of your fit and discussing its significance. You will learn how to do this in your statistics course. Your record-keeping will be assessed through a Digital Retrieval Test which explores the information you have stored in your lab book. You will also submit a formal report of one experiment.

Q2. Determination of the charge to mass ratio of the electron

Q2.1 Introduction

In the early 1900s J.J. Thomson investigated how moving electrons behave in both electric and magnetic fields. From these studies he developed a balanced-field experiment where a beam of electrons is subjected to magnetic and electric fields in such a way that the forces on the electrons from each field cancel and they suffer no overall deflection. This enabled him to determine the charge to mass ratio, e/m, of the electron.

An electron having a charge e moving in a uniform electric field of intensity E is subjected to a force FE in the direction of the field, where

FE = Ee.                                                    (2.1)

Similarly, an electron moving in a uniform magnetic field of flux density B with a velocity v at right angles to B experiences a force FB, perpendicular to both B and v, of magnitude

FB = Bev.                                                   (2.2)

The effect of applying electric and magnetic fields to a moving electron (as part of a beam of electrons) is shown in figure 2.1.

Figure 2.1 The forces on moving electrons due to applied electric and magnetic fields

Provided the magnetic and electric field directions are at right angles, the two forces can be anti-parallel. If the field strengths are adjusted, the two forces can cancel and the electron will travel undeflected.
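As a quick numerical illustration of the balance condition: when the two forces cancel (Ee = Bev), the beam speed is v = E/B. The field values below are assumed example numbers, not apparatus readings:

```python
# Balanced-field condition: Ee = Bev, so the undeflected beam speed is v = E/B.
E = 2.0e4   # electric field strength, V/m (assumed example value)
B = 1.0e-3  # magnetic flux density, T (assumed example value)

v = E / B   # electron speed, m/s
print(v)
```

This gives v = 2.0e7 m/s, a typical order of magnitude for electrons accelerated through a few kilovolts (and comfortably below the speed of light, so the non-relativistic treatment above is adequate).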
For null deflection we require:

Ee = Bev                                                    (2.3)

In the experiment the electrons are accelerated to speed v as a result of a potential difference in an electron gun. The law of conservation of energy requires that the kinetic energy of the electron as it leaves the electron gun (determined by the voltage on the anode) must equal the change in its electrical potential energy. For an anode voltage of Va this gives the following,

(1/2)mv² = eVa                                              (2.4)

where m is the mass of the electron. From equation 2.4:

m/e = 2Va/v²                                                (2.5)

Substituting equation 2.3 into 2.5 gives:

m/e = 2Va/v² = 2B²Va/E²                                     (2.6)

Consequently, with knowledge of the applied electric and magnetic field strengths and the accelerating voltage on the electron gun, it is possible to determine the charge to mass ratio of the electron.

Q2.2 Overview of Apparatus

The main components of the apparatus are illustrated in figures 2.1 and 2.2, and the connections to the power supplies are shown in figure 2.3. The electron gun and deflecting plates are contained within an evacuated glass envelope. This whole unit is known as a Thomson tube. Electrical connections between the Thomson tube and the various power supplies are achieved by connecting the supplies to the appropriately labelled inputs on the tube housing.

The tube cathode is heated by a filament carrying a current driven by a 6 V a.c. voltage. Electrons are emitted by the heated cathode and are accelerated through the potential difference Va towards the anode, passing through an aperture (slot) in it. A further aperture allows a planar beam of electrons to continue into the region between the deflector plates and the magnetic field coils.
Here the electron beam experiences deflections, in the directions identified in figure 2.1, due to the electric field derived from the potential difference Vp across the plates and the magnetic field generated by the current IH through the pair of coils, which are in series. An arrangement of parallel coils like this, called Helmholtz coils, has the advantage that, provided they are set a distance apart equal to their radius (in this case 6.9 cm), the field between them can be considered to be uniform. The deflector plates are also designed so that an acceptable level of field uniformity is achieved in the region of the beam path.

Note that the electron beam in the region between the coils and plates illuminates the surface of a curved luminescent screen, making its trajectory visible. The curvature of this screen enables electrons across the width of the planar beam to 'impact' it at different points along the whole beam's length. As a result, the influence of the two fields can be seen across the length of the planar electron-beam path.

Figure 2.1 Schematic diagram of the e/m Thomson tube

The uniform magnetic field between the coils in this case is given by:

B = kIH                                                     (2.7)

where k = 4.17 × 10⁻³ T A⁻¹. The current IH (in mA) is given on the right-hand display of the Thurlby power supply (ignore the flashing decimal points). The plate supply voltage is supplied by the 'in-house' power supply and is monitored via a 1/100 potential divider on a DVM.

The magnitude of the electric field E between the plates depends on the separation of the plates, d, and the potential Vp applied to the plates as:

E = Vp/d                                                    (2.8)

The separation of the plates, d, in the Thomson tube is 8.0 mm.

Figure 2.2 Experimental apparatus used to conduct a determination of the charge to mass ratio of the electron

Figure 2.3 Connection of the Thomson tube to power supplies and meters
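Putting equations 2.3 to 2.8 together, a single determination might be computed as sketched below. The constants k = 4.17 × 10⁻³ T/A and d = 8.0 mm come from the script above; the readings IH, Vp and Va are purely hypothetical example values:

```python
# Sketch of one e/m determination for this apparatus.
# From eq. 2.6, m/e = 2*B**2*Va/E**2, so e/m = E**2 / (2*B**2*Va),
# with B = k*IH (eq. 2.7) and E = Vp/d for parallel plates (eq. 2.8).
k  = 4.17e-3      # T per A (given in the script)
d  = 8.0e-3       # plate separation, m (given in the script)
IH = 0.30         # coil current, A (assumed example reading)
Vp = 265.0        # plate potential difference, V (assumed example reading)
Va = 2000.0       # anode (accelerating) voltage, V (assumed example reading)

B = k * IH
E = Vp / d
e_over_m = E**2 / (2 * B**2 * Va)
print(f"{e_over_m:.3e}")  # compare with the accepted value, about 1.76e11 C/kg
```

In the real experiment you would repeat this over a range of balanced-field settings and extract e/m from a least squares fit rather than a single reading.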


[SOLVED] BUSI4528 QUANTITATIVE RESEARCH METHODS FOR FINANCE AND INVESTMENT AUTUMN SEMESTER 2020-2021

BUSI4528-E1
A LEVEL 4 MODULE, AUTUMN SEMESTER 2020-2021
QUANTITATIVE RESEARCH METHODS FOR FINANCE AND INVESTMENT

1. a) Answer the following points regarding the Linear Probability Model:

(i) Using standard mathematical notation, outline the Linear Probability Model for binary dependent variables. [20 marks]
(ii) Explain how the LPM can be used to predict binary choices. [5 marks]
(iii) Explain how to interpret the regression coefficients in the LPM. [10 marks]
(iv) Explain the main shortcomings of the LPM. [15 marks]

b) Consider a model explaining the monthly sales of a popular brand of coffee as a function of its price and the average price of two competitors. Also included is an indicator variable disp = 1 if there is a store display but no newspaper ad during the month for the target brand, and 0 otherwise. The indicator variable dispad = 1 if there is a store display and newspaper ads during the month for the target brand, 0 otherwise. The estimated results were obtained using Stata:

Sales: logarithm of 1000s of boxes sold
Price: average price of the target brand in $ for a given month
Price1 and Price2: average prices of two competitors

(i) Write down the regression model and interpret the meaning and significance of each coefficient, including the intercept. Are the signs and the relative magnitudes for the advertising variables consistent with economic logic? [25 marks]
(ii) Label the parameters in the equation β1, β2, …, β6, with β5 and β6 corresponding to the coefficients of disp and dispad, respectively. If the null hypothesis is H0: β6 ≤ β5, state the alternative hypothesis. Why is the test of this null hypothesis against the alternative hypothesis interesting? Carry out the test at the 1% significance level, given the calculated t-value is 6.34. What do you conclude? [15 marks]
(iii) What is an indicator variable and how can the indicator variable trap be avoided?
In the above regression, assume there is another indicator variable: ads = 1 if there are newspaper ads for the target brand, and ads = 0 otherwise. Explain how to obtain an interaction variable of the indicator variables disp and ads. [10 marks]

Total [100 marks]

2. a) Compare and contrast the Durbin-Watson (DW) d-statistic test with the Lagrange Multiplier (LM) approach to test for serial correlation in time series models. [50 marks]

b) We explore the relationship between the cost per student and related factors at four-year colleges in the U.S., covering the period 1987 to 2011. We run an OLS regression using Stata, and the results are as below:

where lntc is the logarithm of total cost per student, ftestu is the number of full-time equivalent students, ftgrad is the number of full-time graduate students, tt is the number of tenure-track faculty per 100 students, GA is the number of graduate assistants per 100 students, and CF is the number of contract faculty per 100 students, who are hired on a year-to-year basis.

(i) Write down the regression model. [10 marks]
(ii) If we consider university 'identity' as a factor that could affect average cost per student, how should we adjust the estimation? Write down the new model and explain the differences with the model in point (i). [25 marks]
(iii) What is the F-test used for? Comment on the F-test result for the above model. [15 marks]

Total [100 marks]

3. a) Outline the setup of the Logit model for binary dependent variables. [35 marks]

b) Explain why the conventional R-square index is not a valid measure to evaluate the goodness-of-fit of the Logit model. Discuss which other measures of model fit should be used instead.
[15 marks]

c) We estimate a regression describing the relationship between the cost per student and related factors at four-year colleges in the U.S., covering the period 1987 to 2011, where lntc is the logarithm of total cost per student, ftestu is the number of full-time equivalent students, ftgrad is the number of full-time graduate students, tt is the number of tenure-track faculty per 100 students, GA is the number of graduate assistants per 100 students, and CF is the number of contract faculty per 100 students, who are hired on a year-to-year basis. One of the test results in our analysis is as follows:

(i) Explain what this test is used for, and explain what you can learn from the result of the test. [10 marks]
(ii) Explain the differences between the fixed effects model and the random effects model. [25 marks]
(iii) Describe what heteroscedasticity is, and discuss the consequences of the presence of heteroscedasticity for linear regression analysis. [15 marks]

Total [100 marks]

4. a) Explain how you could test the null hypothesis of no cointegration between two time series variables. [40 marks]

b) Explain what is meant by 'spurious regression'. In what sense should empirical analysis be cautious of it? [10 marks]

c) Discuss how the difference-in-difference (DID) estimator (specify the DID regression model) might be used to test for a potential treatment effect of a policy reform. Outline the key assumptions for the DID estimation. Use graphs where necessary. [50 marks]

Total [100 marks]
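The coefficient interpretation asked for in Question 1 a(iii) can be illustrated with a minimal simulation. This sketch uses synthetic data and plain NumPy OLS (the exam itself references Stata); it shows that in an LPM each slope estimates the change in P(y = 1) per unit change in the regressor:

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic binary-choice data: the true probability of y = 1 rises by
# 0.15 per unit of x. An LPM slope estimated by OLS should recover ~0.15.
n = 5000
x = rng.normal(size=n)
p = np.clip(0.5 + 0.15 * x, 0, 1)           # true P(y = 1)
y = (rng.random(n) < p).astype(float)       # observed binary outcome

X = np.column_stack([np.ones(n), x])        # design matrix with intercept
beta = np.linalg.lstsq(X, y, rcond=None)[0] # OLS estimates [intercept, slope]

print(beta.round(2))
```

The estimates come out near [0.50, 0.15]. The simulation also exposes one of the shortcomings asked for in a(iv): nothing constrains the fitted values to lie in [0, 1], which is one motivation for the Logit model in Question 3.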


[SOLVED] CCM 3123 Concepts in Spatial Design C/C

CCM 3123 Concepts in Spatial Design

Individual Project (Total 20% - 10% for oral presentation and 10% for written report and technical drawings)

Topic: Choose any ONE out of 3: (A) Your Young Spirits in UIC University Town, (B) Your Dream House, or (C) Your New Look of Cultural Heritage

(A) Your Young Spirits in UIC University Town

Background Information (UIC): UIC is a Liberal Arts College which was established in 2005. The overall campus is full of energy and young spirits which strive for academic excellence. Through the project, the students can design an item/a space/a sculpture which suits the UIC campus environment.

Instructions:
1. Students are asked to research and take photos of their research areas in UIC, outdoor or indoor.
2. Choose 2 photos/images as the favourite places of locations A and B. Then analyze the characteristics of each space. Please ask the lecturer in case of queries.

(B) Your Dream House

1. "Everyone has a Dream" and you can "Brand your Dream and make it come true". Students are asked to choose a place they are interested in. Develop a theme and build up a concept upon the structure, e.g. green apartment, floral garden and greenhouse, a craft or doll house, love and wedding place, warm villa for family, or a shared building for the needy.
2. Choose 2 photos/images as the favourite places of locations A and B. Then analyze the characteristics of each space. Please ask the lecturer in case of queries.

(C) Your New Look of Cultural Heritage

Huitong Village or Tangjia Village has been enriched with cultural heritage historically, with scenic spots such as temples, monumental halls, museums, modern hostels and restaurants.

Instructions:
1. Students are asked to research and take photos of their research areas in the villages.
2. Choose 2 photos/images as the favourite places of locations A and B. Then analyze the characteristics of each space. Please ask the lecturer in case of queries.

Hints:

(a) Factors - e.g.
sunshine, orientation, a good view of greenery or of the sea, whether the space is comfortable to walk through, and what kinds of amenities or facilities are available. Pay attention to the natural environment, physical needs, and psychological needs.

(b) Add your creative concepts; originality in your own design and concepts is important, although you can add some existing images as secondary sources. Remember to put down the sources of images under references.

3. Explain and design your ideas in 2 parts:

Part A: Oral presentation with PPT (aided with other appropriate materials)
1. Use PPT to present your 1 designed item in English. (See details in the example provided.)
2. Every student will be given 3 min. plus 1 min. Q&A (4 min. in total), presented in English. (10%)

Part B: Written report with illustrated images (see details in the example provided)
1. The overall word count is 800 words (+ or - 10%).
2. Analysis:
2.1 Explain the characteristics of the 2 locations with photos.
2.2 Among the 2, the 1st location is your most favourite one.
2.3 Explain the characteristics of Location A in detail (including background information, concept, theme, style, and details of objects and their locations displayed, using design terms - design elements and principles of organization), operational budget and material boards.
2.4 1 set of drawings:
- coloured floor plan (at least 1/2 of A4 size) using AutoCAD or Google SketchUp mainly*
- coloured front elevation (at least 1/2 of A4 size) using AutoCAD or Google SketchUp mainly*
- coloured 3-D view (1 full A4-sized paper) using AutoCAD or Google SketchUp mainly*
2.5 References (10%)

(*Note: use Google SketchUp mainly for all basic drawings, aided with some hand drawings or other computer software only if needed.)

Assessment:
1. Oral presentation
2. Final individual assignment, written report and other appropriate materials
3. Final exam

Review: to be discussed. Due by 16 Dec 2024 (Mon.)


[SOLVED] LM Data Mining and Machine Learning 2024 Lab 1 Text Retrieval Matlab

LM Data Mining and Machine Learning (2024)
Lab 1 - Text Retrieval

PART 1: TF-IDF BASED TEXT RETRIEVAL

Objective

The objective of this lab session is to apply the text-based Information Retrieval (IR) techniques which we have studied in lectures, namely:

1. Stop word removal
2. Stemming
3. Construction of the index - calculation of TF-IDF weights
4. Retrieval - calculating the similarity between a query and a document

We will apply these techniques to a 'toy' corpus consisting of 112 documents - BEng final year project specifications. These project specifications were submitted by staff in Word format, but I have converted them all into plain text files for the purposes of this lab. However, I did not remove the formatting or the pieces of text which are common to all of the files.

Copy the zip archive lab1-2024 from Canvas and 'unzip' it. You should end up with a new folder called lab1-2024 containing all of the files that you need to complete the lab, including a folder called docOrig which contains 112 text files. The folder lab1-2024 will be the default folder that you work from. Have a look at one of the text files in the docOrig folder. You should be able to identify the common formatting.

Processing of the documents

Before we can do IR we need to apply stop word removal and stemming to each of the documents in our corpus. To do this you will use two executable (.exe) files of the C programs that are in your lab1-2024 folder: stop.exe and porter-stemmer.exe. Note that the source C programs are also provided in case your computer runs on a non-Windows operating system - in that case, you will need to compile the source C programs (stop.c, porter-stemmer.c, index.c and retrieve.c).

Task 1: Stop word removal

The first task is to remove stop words from each of the documents. The 50-word stop word list stopList50 should already be in your lab1-2024 folder.
Now run the program stop on one of the documents - AbassiM.txt, for example. To run the program, just type the following in the Command Prompt window:

stop stopList50 docOrig\AbassiM.txt

(Note that the above includes the path name to tell stop where AbassiM.txt is - the docOrig folder.) This should cause a version of AbassiM.txt with stop words removed to be printed onto your screen. You need to store this output in a text file AbassiM.stp. To keep the 'stopped' documents separate from the original documents, a folder called docStop has been created in lab1-2024. All of the 'stopped' documents should go in this new folder.

You need to apply stop to all of the project description files. To do this I have created a batch file called stopScript.bat, which you should have in your lab1-2024 folder. In the Command Prompt window just type stopScript followed by 'return'. You need to be in the lab1-2024 folder when you do this. You should now have 112 files in the docStop folder, each with a name of the form filename.stp.

Question 1: What is the percentage reduction in the number of words in a document as a consequence of stop-word removal - specifically, what is the reduction in the case of the file AgricoleW.txt?

Task 2: Stemming

The next task is to apply the Porter stemmer to each '.stp' file. Another folder called docStem has been created in lab1-2024. This folder will contain a stemmed version of each file from the docStop folder. Basically, for each .stp file you create a .stm file by typing, for example,

porter-stemmer docStop\AbassiM.stp

This causes a 'stemmed' version of AbassiM.stp to be printed on screen. You need this data to be stored in a file called docStem/AbassiM.stm. You need to do this for every .stp file. To do this I have created another batch file called stemScript.bat, which you should have in your lab1-2024 folder. In the Command Prompt window just type stemScript followed by 'return'.
You need to be in the lab1-2024 folder when you do this.

Question 2: Find the file AgricoleW.stm. What are the results of applying the Porter stemmer to the words communications, sophisticated and transmissions?

You should now have:
- 112 original .txt documents in the folder docOrig
- 112 'stopped' documents in the folder docStop
- 112 'stemmed' documents in the folder docStem

Task 3: Create the document index files

If you've forgotten what the document index is, or what it is for, look again at the lecture slides. The next task is to create 3 index files: one for the original .txt documents, one for the .stp documents, and one for the .stm documents. You should have the executable index.exe in your lab1-2024 folder (or compile the program index.c if needed).

You should have a text file called textFileList in your lab1-2024 folder. This is simply a list of all of the original .txt files - one file per line. Type:

index textFileList

followed by 'return'. After a short pause a text version of the index file will be printed on your screen. You need to store this data in a file called textIndex. Type:

index textFileList > textIndex

followed by 'return'. Look at this index file (open it in a text editor such as Notepad) and try to understand the information it contains. The lecture notes will help you. The first part of the file gives the list of documents with their document length (this is not the length in bytes - see the lecture notes if you are unclear). The second part of the file gives the list of all words (ordered by IDF) that occurred in the set of documents, together with information related to each word. For each word (its position is indicated in front of the word name), there is the total number of times the word appeared (wordCount), the number of documents it appeared in (docCount), and the IDF value of the word. This is then followed by the list of documents the word appeared in, with the count and calculated weight.
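The wordCount, docCount and IDF entries described above can be sketched on a toy corpus. This assumes the common definition IDF = log(N/docCount); the lab's index.c may use a different base or scaling:

```python
import math

# Toy three-document corpus (each document is a list of words).
docs = [
    "circuit design circuit".split(),
    "antenna design".split(),
    "software design tool".split(),
]
N = len(docs)

def stats(word):
    word_count = sum(d.count(word) for d in docs)    # total occurrences (wordCount)
    doc_count  = sum(1 for d in docs if word in d)   # documents containing it (docCount)
    idf = math.log(N / doc_count)                    # assumed IDF definition
    return word_count, doc_count, idf

print(stats("design"))   # appears in every document, so IDF = log(1) = 0
print(stats("circuit"))  # appears in only one document, so IDF is larger
```

Note how a word like design, which occurs in (almost) every document, gets an IDF near zero — the same behaviour Question 4 asks you to explain for the real index.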
Now repeat this on the 'stopped' and 'stemmed' files:

index stopFileList > stopIndex
index stemFileList > stemIndex

Question 3: What are the 'document lengths' of the documents docOrig\DongP.txt, docStop\DongP.stp and docStem\DongP.stm? Why are they different? Why is the difference between the document lengths of docStem\DongP.stm and docOrig\DongP.txt greater than the difference between the document lengths of docStop\DongP.stp and docOrig\DongP.txt?

Question 4: The IDF of the term design is approx. 0.009. Why is it so close to zero?

Question 5: Find the word algorithm in the three index files. Explain why the entries for this word are different in the three files.

Task 4: Retrieval

The final task in this part of the lab is retrieval. To do this you will need to create a query. This is just a text file containing your query – you can create it using Notepad or WordPad. An example query – in the file query – is in your lab1-2024 folder. This query just contains the text: circuits and devices. Next you need to apply stop-word removal and stemming to the query:

stop stoplist50 query > query.stp
porter-stemmer query.stp > query.stm

You should have the executable retrieve.exe of the C program in your lab1-2024 folder (or compile the source C program if needed). You can now do retrieval. Start with the raw text files:

retrieve textIndex query

followed by 'return'. This will return a list of all the documents for which the similarity with the query is greater than 0. It also tells you the identity of the most similar document. Now repeat this for the stopped documents and stopped query, and the stemmed documents and stemmed query:

retrieve stopIndex query.stp
retrieve stemIndex query.stm

Question 6: Compare the results of the above two searches (using .stp and .stm) with the result for the original raw text files. What do you conclude?

Question 7: Repeat Task 4 with one query of your own and report the results.
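The retrieval step can be sketched in a few lines: score each document against the query by cosine similarity of TF-IDF vectors. This is only an illustration of the general idea; the exact weighting used inside index.exe and retrieve.exe may differ, and the toy documents below are made up.

```python
# Sketch of TF-IDF retrieval: rank documents by cosine similarity to a query.
import math
from collections import Counter

def tfidf_vectors(docs):
    """Build a TF-IDF vector (dict word -> weight) per tokenised document."""
    n = len(docs)
    df = Counter(w for d in docs for w in set(d))   # document frequency
    idf = {w: math.log(n / c) for w, c in df.items()}
    return [{w: tf * idf[w] for w, tf in Counter(d).items()} for d in docs], idf

def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(u[w] * v[w] for w in u if w in v)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

docs = [["circuit", "design"], ["circuit", "device"], ["market", "analysis"]]
vecs, idf = tfidf_vectors(docs)
query = {w: idf.get(w, 0.0) for w in ["circuit", "device"]}
scores = [cosine(v, query) for v in vecs]
best = scores.index(max(scores))  # index of the most similar document
```

Note how a word appearing in many documents (like "circuit" here, or "design" in Question 4) gets a small IDF and so contributes little to the scores.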
PART 2: LATENT SEMANTIC ANALYSIS

Objective

The objective of the second part of the lab is to apply Latent Semantic Analysis (LSA) to the set of BEng final year project specifications in the docOrig folder. Look at the notes on LSA to remind yourself about the technique and to put the following sequence of tasks into context.

Task 1: Create the Word-Document matrix

Recall that the Word-Document matrix W is an N x V matrix, where N is the number of documents and V is the vocabulary size (the number of different words in the corpus). The nth row of W is the document vector vec(dn) for the nth document. The executable doc2vec.exe of the C program will create the matrix W (or compile the source C program if needed). We will apply this program to the stemmed documents. The command is:

doc2vec stemFileList.txt > WDM

This creates a document vector for each document in the docStem folder and stacks them to create the matrix in the file WDM.

Task 2: Apply Singular Value Decomposition (SVD) to the Word-Document matrix

This is done in MATLAB. You will need the following commands (the quote symbols below must be single quotes in MATLAB):

>> W = load('WDM');

This reads the data in WDM into the MATLAB matrix W.

>> [U,S,V] = svd(W);

This runs SVD on W, decomposing it as W = USV^T.

Question 1: Are the matrices U and V as you would expect? Explain.

Verify that the singular values, the diagonal elements of S, are ordered according to size.

Question 2: What are the values of the first 3 diagonal entries in S?

Now recall that the singular vectors, the 'latent semantic classes', correspond to the columns of V. You can access, for example, the first column of V and write it into the vector sv1 by using the MATLAB command:

>> sv1 = V(:,1);

Do this for the first 3 columns of V, creating singular vectors sv1, sv2 and sv3. Now you are going to try to interpret these vectors.
Intuitively, the most important words in determining the interpretation of the vector sv1 are those for which the corresponding coordinate of sv1 is biggest (positive or negative). To find the biggest positive value in sv1 we can just use:

>> m = max(sv1);

But we don't just want to know the size of the biggest number; we also need to know its position in the vector, so that we know which word it corresponds to. So use:

>> [m,am] = max(sv1);

In this case m is the maximum value in sv1 and am is its index (argmax). Find the words that correspond to the three biggest values in sv1. To achieve this you need to know the order in which the words occur when the document vectors are constructed. The program doc2vec.exe is based on index.exe, and the word order is the same in both programs, so the nth component of a document vector corresponds to the nth word in the corresponding index file. Hint: the most significant word for sv1 turns out to be 'project'.

Question 3: Find the three most significant words for each of the singular vectors sv1, sv2 and sv3. What is your interpretation of the corresponding semantic classes?
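The max/argmax step above generalises to "the k largest-magnitude coordinates". A small sketch of that selection (mirroring the MATLAB workflow, with made-up vector values and word list for illustration):

```python
# Sketch for Question 3: given a singular vector and the word list in
# index-file order, pick the words with the largest-magnitude coordinates.

def top_words(sv, words, k=3):
    """Words whose coordinates in sv are biggest in absolute value."""
    ranked = sorted(range(len(sv)), key=lambda i: abs(sv[i]), reverse=True)
    return [words[i] for i in ranked[:k]]

words = ["project", "design", "system", "circuit", "market"]
sv1 = [0.71, -0.45, 0.30, 0.12, -0.05]   # hypothetical coordinates
print(top_words(sv1, words))  # prints ['project', 'design', 'system']
```

Note the absolute value: a large negative coordinate is just as significant for interpreting a semantic class as a large positive one.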


[SOLVED] Game Theory - Review Questions Test 3

Game Theory - Review Questions Test 3

Topics
● Sequential Move Games
● Imperfect Information
● Subgame Perfect Equilibrium

Exercise 1
Consider a game in which player 1 first selects between I and O. If player 1 selects O, then the game ends with the payoff vector (x, 1) (x for player 1), where x is some positive number. If player 1 selects I, then this selection is revealed to player 2 and the players then play the battle-of-the-sexes game, in which they simultaneously and independently choose between A and B. If they coordinate on A, then the payoff vector is (3, 1). If they coordinate on B, then the payoff vector is (1, 3). If they fail to coordinate, then the payoff vector is (0, 0).
(a) Represent this game in the extensive and normal forms.
(b) Find the pure-strategy Nash equilibria of this game.
(c) Calculate the mixed-strategy Nash equilibria and note how they depend on x.
(d) Represent the proper subgame in the normal form and find its equilibria.
(e) What are the pure-strategy subgame perfect equilibria of the game? Can you find any Nash equilibria that are not subgame perfect?
(f) What are the mixed-strategy subgame perfect equilibria of the game?

Exercise 2
Consider the following game. An incumbent monopolist (firm 1) can either be passive or take a specific action which costs K dollars. A potential entrant (firm 2) observes this and decides whether to enter or not. If she stays out, her profits are zero, while the incumbent's profits are the monopoly profits πM (minus the cost of the action, if such action was taken). If the incumbent took the action and the potential entrant enters, the incumbent has a choice between undoing the action (thereby recovering K dollars) or making no change. A duopoly game follows in each of the three situations: (no-action, in), (action, in, undo), (action, in, no-change).
Let πi* be firm i's profit (i = 1, 2) at the unique Nash equilibrium of the duopoly game where no action was taken or, if it was taken, it was subsequently undone. Let πi(0) be firm i's profit at the unique Nash equilibrium of the remaining duopoly game. Assume that π2* > 0 and πi* > πi(0) for i = 1, 2.
(a) Draw the extensive game described above.
(b) Find the subgame-perfect equilibria of this game. Is there a subgame-perfect equilibrium characterized by entry deterrence?

Exercise 3
Consider the following game.

Figure 1: Exercise 4

(a) Solve the game by backward induction and report the strategy profile that results.
(b) How many proper subgames does this game have?
(c) What are the subgame perfect equilibria (SPE)?

Exercise 4
Two individuals, A and B, are working on a joint project. They can devote either high effort or low effort to it. If both players devote high effort, the outcome of the project is of high quality and each one receives $100. If one or both devote low effort, the outcome of the project is of low quality and each one receives $50. The opportunity cost of providing high effort is 30; the opportunity cost of providing low effort is 0. Individual A moves first; individual B observes the action of A and then moves.
1. Represent this situation using the extensive form representation.
2. For both players, write all possible strategies.
3. Using the normal form, find all Nash equilibria.
4. Find all Subgame Perfect Nash Equilibria.

Exercise 5
Consider the following game in extensive form. On the nodes where 1 (respectively 2) is written, player 1 (respectively 2) moves. For each outcome of the game, the first number represents the utility of player 1 and the second number the utility of player 2.
1. Apply backward induction.
2. Write the game in normal form.
3. Find all pure Nash equilibria. Which ones are subgame perfect?

Exercise 6
A finitely repeated game. Consider the two-player game
1. Find all the pure-strategy Nash equilibria of this game.
2.
Suppose that this game is played twice (i.e., played and then repeated once). Construct a pure-strategy SPE in which (D, d) is played in the first stage.

Exercise 7
Assume that this game is repeated an infinite number of times, and that both the row and the column player discount the future with the same discount factor δ.
1. Suppose that both players follow the following grim-trigger strategy: "play c as long as no one has ever played d; otherwise play d". Find the minimum value of δ such that this is a subgame-perfect equilibrium.
2. Suppose that row and column are playing some SPE of the infinitely repeated game. They may or may not have played according to the equilibrium strategies so far. Let Vr and Vc denote the present discounted values of continuing to play from here on according to the equilibrium strategies. What are the lowest values that Vr and Vc could have? [Hint: there are no calculations involved here.]

Exercise 8: Infinitely repeated games
Boston Heat and Boston Warmth are the only two firms allowed to provide home-heating oil in Boston. Each firm has a constant marginal cost of supplying oil equal to $1 per gallon. Let the prices per gallon of the two firms be ph and pw respectively. Heating oil is a perfectly homogeneous good, so all customers buy from whichever company offers the lower price. The total demand for oil in Boston is given by the following demand function:

Q(pL) = 200,000 − 100,000·pL

where pL is whichever is the lower of ph and pw. For example, if ph = $0 and pw = $1.50, then total sales in Boston would be 200,000 gallons, all customers would buy from Heat, and Heat would make losses of $200,000. If ph = $1.75 and pw = $1.25, then total sales would be 200,000 − 100,000(1.25) = 75,000 gallons, all customers would buy from Warmth, and Warmth would make profits of $18,750. Assume that, if Heat and Warmth announce the same price, demand splits exactly equally between the two firms.
1.
For the moment, suppose that this competition between Heat and Warmth occurs just once, and suppose that the firms announce their prices simultaneously.
(a) What prices are strictly dominated strategies, and what prices are weakly dominated strategies?
(b) Find all the Nash equilibria in this game. For each of the Nash equilibria, what is the equilibrium price pL?
2. Now suppose that this competition is played repeatedly, year after year, and that both firms have discount factor δ.
(a) Find the lowest δ such that the firms are able to sustain the monopoly price in a subgame perfect equilibrium. Construct such an equilibrium and explain briefly why no other subgame perfect equilibrium can sustain the monopoly price at a lower δ.
(b) Suppose instead that the demand for heating oil is given by Q(pL) = a − b·pL gallons, where a > 1 and b > 0. In this case, what is the lowest δ such that the firms can sustain the monopoly price in a subgame perfect equilibrium? Explain what is general about this result, and why.
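As a sketch of the grim-trigger calculation behind 2(a), assuming the two firms split the monopoly profit π^M equally at the monopoly price and revert to pricing at marginal cost (zero profit) forever after any deviation: a deviating firm can undercut slightly and capture (almost) the whole monopoly profit for one period, so collusion is sustainable when

```latex
\frac{\pi^M/2}{1-\delta} \;\ge\; \pi^M
\quad\Longleftrightarrow\quad
\delta \;\ge\; \tfrac{1}{2}.
```

Note that the threshold does not depend on the demand parameters, which is relevant to the "what is general about this result" question in 2(b).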


[SOLVED] MATH 127 Sample Final Exam A

MATH 127: Sample Final Exam A

1. The following symbolic statement describes an important mathematical theorem:
∀X ∈ P(N), (X ≠ ∅ → ∃x ∈ X, ∀y ∈ X, x ≤ y).
(a) What is the name of the theorem represented by this statement? [2 pts]
(b) Write the logical negation of the given statement in maximally negated form. [5 pts]

2. Define a function f : R → Z by f(x) = ⌊x⌋ and a function g : Z → N by
(a) Write Im_{g∘f}({1, √2, 5, −3.5}) in roster notation. [4 pts]
(b) Determine whether PreIm_f({0, 1}) is countable or uncountable. Justify your answer with a sentence or two. [4 pts]

3. Let a1, a2, . . . , a9 ∈ Z+, none of which has a prime factor greater than 5. Prove that there must exist i, j ∈ [9] such that i ≠ j and the product ai·aj is a perfect square. [8 pts]

4. For the following problems, provide an appropriate counting argument to justify your answer. You may leave answers as products, sums of integers, factorials, or binomial coefficients. You have a large bag of M&M's consisting of the standard 6 colors: Blue, Brown, Green, Orange, Red, and Yellow. The M&M's are indistinguishable except by color. You may assume an unlimited amount of each color.
(a) In how many different ways can you grab 15 M&M's from the bag? [3 pts]
(b) Use the Principle of Inclusion/Exclusion to determine the number of ways in which you can grab 15 M&M's such that you don't have more than 4 of any given color. [6 pts]
(c) After reaching into the bag, you end up with 4 Blue, 4 Orange, 3 Brown, 2 Yellow, 1 Red, and 1 Green. In how many different ways can you put these 15 M&M's in a line? (Remember that the M&M's are only distinguishable by color.) [6 pts]

5. Let F = {f : N → N | f is a function}.
Define a relation ~ on F by f ~ g ⟺ |{i ∈ N | f(i) ≠ g(i)}| < ℵ0.
(a) Prove that ~ is an equivalence relation on F. [12 pts]
(b) Given two functions f, g ∈ F, we define their sum, f + g, as the function f + g : N → N such that (f + g)(n) = f(n) + g(n). Assume that f1, f2, g1, g2 ∈ F are such that f1 ~ f2 and g1 ~ g2. Prove that f1 + g1 ~ f2 + g2. [6 pts]

6. Let n ∈ N. Prove that using a "counting in two ways" argument. [12 pts]
● Use the exact form of the equation as given; do not simplify it algebraically.
● If your argument involves constructing partitions, justify that your construction defines a valid partition.

7. Let n be an arbitrary and fixed odd integer. Prove that the following congruence holds for all integers k ≥ 3: [10 pts]
n^(2^(k−2)) ≡ 1 (mod 2^k)

8. Let S = {n ∈ Z | n ≡ 1 (mod 7)}. Prove that |S| = |N| by explicitly constructing a bijection between S and N, and proving that it is a bijection. Do not use the Cantor–Bernstein–Schröder (CBS) Theorem. You do not need to prove that your function is well-defined, but ill-defined functions will lose the majority of points. [12 pts]

9. Let a and b be coprime integers. Prove that there exist x, y ∈ Z such that (a³b²)x + (a + b)y = 1. [10 pts]


[SOLVED] Coursework for COMP4132 24-25

Coursework for COMP4132 24-25

Overview
The coursework aims to make use of the machine learning techniques learned in this module to solve a practical problem. The coursework consists of three key parts: 1) main submission: coursework report and code, 2) individual report, 3) presentation. Please read this document carefully to see the requirements of the coursework, along with the explanations and further details.

Important dates
•    Team formation: 26th Nov 2024
•    Final submission deadline: 23rd Dec 2024
•    Presentations: 24th Dec 2024

Copying Code and Plagiarism
You may freely copy and adapt any of the code samples provided in the lab exercises or lectures. You may freely copy code samples from the PyTorch documentation, which has many examples explaining how to do specific tasks. This coursework assumes that you will do so, and doing so is a part of the coursework. You are therefore not passing someone else's code off as your own, so doing so does not count as plagiarism. You can, and should, look at other code/papers online, but you need to reference any source/material that you have used as inspiration, and highlight what your contribution is. Turnitin/JPlag will detect any use of external sources automatically. Successful completion means that you are able to explain your solution during the presentations. The university takes plagiarism extremely seriously, and it can result in getting 0 for the group coursework, the entire module, or potentially much worse.

Getting Help
You MAY ask the module convenor for help in understanding the group coursework requirements if they are not clear (i.e. what you need to achieve). Talk to me during the labs, after the lecture, or post your questions on Moodle. Any necessary clarifications will then be added to the Moodle page or posted on the discussion forum so that everyone can see them. You may NOT get help from anybody else (other than your group mates) to actually do the coursework (i.e.
how to do it), including the module convenor.

Task Specification
The aim of this coursework is to offer you an opportunity to get hands-on experience designing and developing an advanced machine-learning-based solution. Language Models (LMs) are very popular nowadays in the area of Natural Language Processing. In this coursework you are asked to use an LM together with other machine learning techniques learned in this module to solve the problem of joke generation. In particular, you are expected to provide a machine-learning-based solution that is able to generate a joke given an input such as a few starting words, similar to one of your lab exercises but much more comprehensive. Your solution should at least satisfy the following requirements:
•    Be able to generate a full joke, which may consist of several sentences.
•    The generated joke should at least make some sense compared with, say, random generation, based on the data you used.
•    Additional functionalities that make the solution better.
Note that you may not have enough resources (i.e. GPUs) to perform thorough training. You can use less data for training on your laptop, or you can use online resources such as Google Colab. You can use the lab as a starting point, but your solution should not be the same as the lab solution. You should use this dataset as the training data.

Team Formation Instructions
You should form groups of at most three students. Both the group's contributions to the coursework and each individual's effort will be assessed. Each group should select one person as the team leader. The team leader will be in charge of organising meetings, team coordination, group submissions and communications. During the lab session on 26th Nov, you will form your group. You can talk to the convenor for a better understanding of the coursework.
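To make the "generate from a few starting words" interface concrete, here is a minimal bigram (Markov-chain) baseline. This is only a sketch of the interface and of what "random generation" looks like as a comparison point; it is far below the LM-based solution the coursework asks for, and the toy corpus is made up (the real training data is the dataset referenced above).

```python
# Minimal bigram text generator: a baseline sketch, not the required LM solution.
import random
from collections import defaultdict

def train_bigrams(corpus):
    """Map each word to the list of words that follow it in the corpus."""
    model = defaultdict(list)
    for line in corpus:
        words = line.split()
        for a, b in zip(words, words[1:]):
            model[a].append(b)
    return model

def generate(model, start, max_len=20, seed=0):
    """Extend the starting words by sampling successors until a dead end."""
    rng = random.Random(seed)
    out = start.split()
    while len(out) < max_len and model[out[-1]]:
        out.append(rng.choice(model[out[-1]]))
    return " ".join(out)

corpus = ["why did the chicken cross the road",
          "why did the elf laugh at the joke"]
model = train_bigrams(corpus)
print(generate(model, "why did"))
```

A proper solution would replace the bigram table with a trained language model, but the generate(model, starting_words) shape of the interface carries over.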
Coursework Report (main submission)
The report must be clearly presented in English, with no more than 4000 words excluding all figures and tables, summarising how the task was done, justifying the decisions involved, and presenting the results of your analysis. This report should be submitted (with code) via Moodle by the due date. The folder should be named by the group id (to be assigned). The submission should cover the following:
1.   Team ID (assigned by the module convenor), student names and IDs.
2.   Introduction: presents the aims, the problem you solved, and an outline of the solution.
3.   Methodology: focuses on the reasoning behind the particular techniques and designs that you used in this coursework. This describes and explains your chosen methods and design. It is very important to elaborate on why you designed your solution the way you did.
4.   Results and discussion: contains a description and analysis of the results.
5.   Conclusion: allows you to have the final say on the issues you have raised in this coursework, synthesise your thoughts, and demonstrate the rationale of your solution.
6.   References: any links, papers, or code that were used.

Code
All the code and documentation files should be submitted with the main submission via Moodle. You do not need to upload the model if it is too big; instead you can put it on online storage such as OneDrive or Baidu disk and include the link with the documentation. The code should contain a README file explaining all the included materials, with references to other code/papers you have used for inspiration. Moreover, brief documentation should be provided for each code file to explain the code structure and describe how to use the code and data.

Individual Report
Each member of a team is expected to submit a two-page report including the following:
1.   The student information, including full name, email, and ID.
2.
A table of participation marks: this should provide marks to show how group members contributed/collaborated on the coursework. This table should have three columns: 1) student full name, 2) a mark out of 10, and 3) one (or at most two) sentence(s) of justification.

Student name | Mark (out of 10) | Marking justification

3.   A brief explanation of the individual's role in the coursework, outlining the contributions offered (maximum one page).
4.   A discussion of individual understandings, findings, and reflections on the coursework and team-working (maximum one page).
This report should be submitted by each student via Moodle as a separate submission. The file name should be of the form "StudentNumber.pdf" (e.g., 20029784.pdf). This will be used to assess the role of each individual in the coursework. This report might also be used to ask relevant questions during the presentation.

Presentation
All the teams will present orally on 24th Dec. All group members will be required to be present to deliver a presentation and answer questions from attendees and the module convenor. The presentations are open to all students. All oral sessions will follow a structure similar to a typical physical conference format:
•    The module convenor will introduce each group.
•    The authors will deliver the presentation (8 minutes maximum) for the audience. Please make sure you practice this before the session.
•    Once the presentation has concluded, the module convenor will facilitate a live Q&A period (approx. 3 minutes) with the audience and the module convenor.
•    The process repeats for each subsequent group in the session.

Presentation tips
•    Please use a template to make your presentation slides; it will be available on Moodle.
•    We suggest a maximum of 8 slides: 1 slide per minute.
•    You need to upload your presentation file on Moodle (one day) before the presentation.
•    Start off with a brief introduction of yourselves and the key focus of your coursework.
o This should outline the contributions of each student in the coursework.
•    The outline of the presentation should be similar to the structure of the coursework report (e.g. introduction and motivation, methodology, experimental set-up, results, and conclusions).
•    It is an 8-minute presentation: do not aim to show every single aspect of your coursework; focus on the most important things/findings. Key points: make sure you state clearly your motivation and how good your solution is.
•    Be ready for any question and to discuss your contribution. You can prepare additional slides/files for any potential questions you expect. You may be asked to explain your solution, and even show your code, so please make sure that one appointed member of the group can share the screen and show the coursework and the code.
•    All members should contribute during the presentation.

Coursework Marking Criteria
•    Individual Report (10 marks): Each member of the team is expected to submit an individual report.
o Active role
     Does the student actively participate in the coursework?
     What are the student's contributions to the coursework? How are they relevant and valuable?
o Understanding
     Does the report show a fair reflection and understanding of the coursework by the student?
     Does the report clearly highlight the key findings of the student in the coursework?
•    Main Submission (70 marks): Each group should submit the report of their machine learning solution together with the code produced.
o Introduction
     Does the team understand the overall aims and the problem to be solved in this coursework?
     Do they provide a clear description of the solution provided?
o Design/Methodology
     Explanation of the methodology.
     Justification of the proposed methodology.
o Experiments and results
     Are the experiments well designed to test the proposed solutions?
     Do the results support the original idea?
     Is the analysis coherent?
o Writing
     Clear description, reproducibility.
     Quality of visual elements, illustrations, tables.
     Quality of references.
o Code and data - software quality
     Efficiency and clarity of the code to solve the problem.
     Documentation.
•    Group Presentation (20 marks): Each group is asked to deliver an 8-minute presentation summarising their contribution, plus 3 minutes for Q&A.
o Quality and clarity of the presentation.
o Response to questions from the module convenor and the public.
o Understanding of their solution.
o Individual participation in the presentation. All members are expected to participate equally.


[SOLVED] End-of-Semester Data Analysis Assignment

End-of-Semester Data Analysis Assignment

Data and Problem Overview
An avid fan of the PGA TOUR, who has limited statistical background knowledge, is asking for your help in answering one of the age-old questions in golf, namely: what is the relative importance of each aspect of the game for the average prize money in professional golf? The data needed on the top 196 Tour players in 2006 can be found in the file pgatour2006.csv. The meanings of the variables are as follows:

PrizeMoney (y): Average prize money per tournament.
DrivingAccuracy (x1): Driving accuracy is the percentage of time a player is able to hit the fairway with his tee shot.
GIR (x2): Greens in regulation (GIR) is the percent of time a player was able to hit the green in regulation (greens hit in regulation / holes played).
PuttingAverage (x3): Putting average measures putting performance on those holes where the green is hit in regulation (GIR). By using greens hit in regulation, the effects of chipping close and one-putting are eliminated.
BirdieConversion (x4): Birdie conversion is the percent of time a player makes a birdie or better after hitting the green in regulation.
SandSaves (x5): Sand saves is the percent of time a player was able to get up and down from a greenside sand bunker.
Scrambling (x6): Scrambling is the percent of time that a player misses the green in regulation but still makes par or better.
PuttsPerRound (x7): Putts per round is the average total number of putts per round.

There are other variables in the data set, but these will not be considered further in the questions below.

Instructions & Assessment
Use R Markdown to prepare your answers to the questions posed in the parts below. Unlike a usual homework assignment, where an answer to a question might include some R output and numerical values from calculations, most questions below require written responses in sentence/paragraph form.
For these questions, you will not receive full credit for simply providing R output or the result of calculations: you need to clearly describe what you have done and provide appropriate discussion and interpretation. The assignment will be graded out of 100 total points. Ninety of the 100 points are allocated across the parts below; the remaining 10 points will be awarded based on the quality of your write-up. Your write-up should be easy to read and appropriately formatted; plots and graphs should be appropriately sized, with easy-to-read labels and symbols; numeric results should be presented in a way that is easy to read. Please use a maximum of 7 pages (in .pdf format), including figures and tables. As indicated in the course syllabus, this assignment is worth 15% of your final grade for the semester.

Questions
The following questions build on each other and will ultimately guide you through the steps of getting a regression model to realistically represent the average prize money in professional golf.

1.  A statistician from Australia suggests to the analyst that they should not transform any of the covariates, but that they should apply the log transformation to y. Do you agree with this suggestion? Justify your answer. (18 points)

For the rest of the assignment, use the log transformations you decided on. To facilitate grading, make sure to use natural (base e) logarithms, not any other base, when transforming the variables.

2.  Develop a regression model that contains all seven of the potential covariates listed above. If relevant, explore methods of variable transformation (polynomials, logarithm, etc.) and comment on the results. Explain your reasoning. (18 points)

3.  The golf fan wants to remove all covariates with "insignificant" t values (β̂k / σ̂β̂k, i.e. each estimate divided by its estimated standard error) from the full model in a single step. Explain why you do not recommend this approach. What alternatives would you recommend? A verbal answer suffices here.
(18 points)

4.  Based on your reply to 3, create a final regression model to realistically represent the average prize money in professional golf, and justify the choice of this specific model. (18 points)

5.  Diagnose your model by considering how well the assumptions of linear regression models are met. (18 points)
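For reference, the full model of question 2, with the log transformation from question 1 applied to y, can be written as follows. This is a sketch of the starting point only; whether any of the x's should also be transformed is part of your answer to questions 1 and 2.

```latex
\log(y_i) = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \cdots + \beta_7 x_{7i} + \varepsilon_i,
\qquad \varepsilon_i \overset{\text{i.i.d.}}{\sim} \mathcal{N}(0, \sigma^2).
```

The t value discussed in question 3 is then β̂k / σ̂β̂k for each coefficient βk of this model.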


[SOLVED] Chemistry 110B final exam 2022F

CHE 110B Final Exam – Due 9 December 2022

This exam covers the entirety of CHE 110B, with extra weight on NMR, gases, and statistical thermodynamics. It is due on Canvas by 11:59 pm on Friday, December 9; late submissions within an hour of the deadline will forfeit extra credit from the homework score, and submissions after that will receive a 0. All times are Pacific. Here are the ground rules (see the FAQs on Canvas for a more complete list):

No collaboration: The exam is for you to work on individually. Communicating with other people about the exam questions, whether or not they are in the class, is not allowed. Using any online forums or Q&A services is expressly prohibited, and posting exam content to the internet or viewing answers submitted in response to exam material uploaded online will result in an SJA referral.

Online resources: You may use any resources that I have posted to Canvas for the exam, along with the course textbook and your own notes. I strongly discourage searching the internet for general information, as this could potentially run afoul of the "no collaboration" policy if you come across forum posts or similar that pertain to the exam material. However, Google etc. are not prohibited. You may also use online resources I have introduced to you, such as the Basis Set Exchange, the Otterbein Symmetry Gallery, online character tables, and the NIST Chemistry WebBook.

Questions: If you need to ask a question about the exam, you may email me (my email address is at the bottom of the page) or send me a direct message on Discord. I may post clarifications on Discord as well, so please keep an eye out.

To avoid potentially being late due to network issues, do not wait until the last minute to upload your exam! I strongly recommend planning to finish the exam early enough that you have time to go somewhere with reliable internet should that become necessary.
We will not have class during the exam time, but I will be available in my office if you have questions. Good luck!

1) This question refers to the paper "Spectroscopic detection of the stannylidene (H2C=Sn and D2C=Sn) molecule in the gas phase" (Smith et al. 2022, J. Chem. Phys. 157, 204306). A copy of the paper is available on Canvas. (50 pts)

1.1) Write the full Hamiltonian, the Born-Oppenheimer electronic Hamiltonian, and the single-electron Hamiltonian for H2C=Sn. You may use summation notation for the first two, but you must write out all terms in the single-electron Hamiltonian explicitly. (10 pts)

1.2) Figure 1 shows three singlet electronic states of H2C=Sn. Using the information available in the figure, prove that the symmetries of the X̃, Ã, and B̃ states listed in the paper are correctly assigned. (5 pts)

1.3) Figure 1 also shows three possible electronic transitions: the B̃−X̃ and B̃−Ã transitions are allowed, while the Ã−X̃ transition is forbidden. Explain what is meant by "allowed" and "forbidden," and prove that each respective transition is correctly assigned as allowed or forbidden. (5 pts)

1.4) Figure 2 shows simulated vibronic spectra for H2C=Sn in absorption and emission. Draw an energy level diagram that illustrates the labeled transitions in the spectra. Ensure that your energy levels and transitions are appropriately labeled, and indicate the transitions with the same colors as used in the figure. (20 pts)

1.5) The paper discusses both H2C=Sn and D2C=Sn. At a temperature of 300 K, which of these two molecules has a greater rotational partition function (or are they the same), and why? (5 pts)

1.6) Determine the rotational contribution to the isochoric heat capacity for both H2C=Sn and D2C=Sn, assuming that each is an ideal gas: Cv,rot = (∂⟨Erot⟩/∂T)N,V, where ⟨Erot⟩ = −(∂ ln Qrot/∂β)N,V. Which one is greater (or are they the same), and why? (5 pts)

2) NMR spectroscopy.
(30 pts)

2.1) Deuterium is a spin-1 nucleus. Consider a system with 2 deuterium nuclei that are chemically inequivalent. The nuclear spin Hamiltonian (neglecting spin-spin coupling and quadrupolar interactions) is: Ĥ = −γB(1 − σ1)Îz,1 − γB(1 − σ2)Îz,2. Draw and label an energy level diagram showing the energies of the |m1, m2⟩ states and allowed transitions for this system in a magnetic field. Let σ1 > σ2, and use different colors to indicate transitions for each nucleus. (10 pts)

2.2) Calculate the first-order correction to the energies of the |m1, m2⟩ states due to dipolar spin-spin coupling using perturbation theory. The spin-spin interaction is represented by: V̂ = (hJ12/ℏ²) Î1·Î2. Describe the splitting pattern that will be observed in the NMR spectrum (frequency and relative intensity). (5 pts)

2.3) When working with spin > 1/2 nuclei, it is often convenient to use the raising and lowering operators Î+ and Î−: Î+ = Îx + iÎy, Î− = Îx − iÎy. When applied to a nuclear spin wavefunction |I, mI⟩, these operators raise (Î+) or lower (Î−) the value of mI by 1: Î± |I, mI⟩ = ℏ√(I(I + 1) − mI(mI ± 1)) |I, mI ± 1⟩. Calculate the value of the ⟨1, −1| V̂ |0, 0⟩ matrix element for the coupled deuterium atoms. Note that I = 1 for both deuterium nuclei, and the states are written as |m1, m2⟩ for simplicity. Hint: after expanding Î1·Î2 into x, y, and z components, rewrite in terms of Î1+, Î1−, Î2+, and Î2−. (10 pts)

2.4) When two nuclei are equivalent, the nuclear spin wavefunction must transform as an irreducible representation of the S2 permutation group. Determine the normalized symmetry-adapted linear combinations of nuclear spin wavefunctions |m1, m2⟩ for two equivalent deuterium nuclei. (5 pts)

S2    E    (12)
A     1     1
B     1    −1

3) Short answer (5 pts each)

3.1) In the paper "Virial equation of state as a new frontier for computational chemistry" (Schultz & Kofke 2022, J. Chem. Phys. 157, 190901), they write the virial equation of state as p/kBT = ρ + B2ρ² + B3ρ³ + ··· + Bnρⁿ, where p is pressure, ρ is number density, and Bn is the nth virial coefficient. Show that this is equivalent to the virial equation of state introduced in class, and say how B2 is related to B2v. A copy of the paper is available on Canvas, but you do not need the paper to answer this question.

3.2) Explain why NMR spectra are conventionally plotted against chemical shift, not frequency.

3.3) Molecule A has a lower B2v than molecule B at room temperature. In the van der Waals equation of state, which molecule do you expect to have a greater value of a, and why?

3.4) Some basis sets use an "effective core potential" in which some of the core electrons are treated implicitly to generate a potential that shields the outer electrons from the nuclear charge. For the I2 molecule in the def2-TZVP basis set, determine how many orbitals are calculated, and how many explicitly-treated electrons occupy those orbitals.
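For 3.1, the algebra can be sketched as follows, assuming the in-class form is the molar-volume compressibility series with B2v the molar second virial coefficient (adjust if your notes use a different convention):

```latex
% Paper's density series vs. the molar-volume series:
\frac{p}{k_B T} = \rho + B_2 \rho^2 + B_3 \rho^3 + \cdots ,
\qquad
\frac{p V_m}{R T} = 1 + \frac{B_{2v}}{V_m} + \frac{B_{3v}}{V_m^2} + \cdots
% Substituting \rho = N_A / V_m and R = N_A k_B into the first series and
% dividing through by \rho gives p V_m / (R T) = 1 + (B_2 N_A)/V_m + \cdots ,
% so the coefficients match term by term with B_2 = B_{2v} / N_A.
```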
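A quick numerical sanity check of the matrix element in 2.3 can be made by building explicit spin-1 matrices. This is a sketch in units where ℏ = 1, with the coupling prefactor of V̂ set to 1, so the printed value is the matrix element of Î1·Î2 in units of ℏ²:

```python
import numpy as np

# Sketch: <1,-1| I1.I2 |0,0> for two spin-1 nuclei, in units where
# hbar = 1 and with the coupling prefactor of V-hat omitted.
# Basis ordering per nucleus: m = +1, 0, -1.
Iz = np.diag([1.0, 0.0, -1.0])
Ip = np.zeros((3, 3))                # raising operator I+
Ip[0, 1] = Ip[1, 2] = np.sqrt(2.0)   # <m+1|I+|m> = sqrt(I(I+1) - m(m+1)) for I = 1
Im = Ip.T                            # lowering operator I- = (I+)^dagger

# I1.I2 = Iz1 Iz2 + (1/2)(I1+ I2- + I1- I2+), as 9x9 two-nucleus matrices
dot = np.kron(Iz, Iz) + 0.5 * (np.kron(Ip, Im) + np.kron(Im, Ip))

def ket(m1, m2):
    """Basis vector for the product state |m1, m2>."""
    idx = {1: 0, 0: 1, -1: 2}
    v = np.zeros(9)
    v[3 * idx[m1] + idx[m2]] = 1.0
    return v

elem = ket(1, -1) @ dot @ ket(0, 0)
print(round(elem, 10))  # -> 1.0 (i.e. hbar^2; only the I1+ I2- term contributes)
```

Restoring the prefactor turns this ℏ² into the corresponding coupling energy; the analytic derivation via the raising/lowering expansion should reproduce the same number.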


[SOLVED] MSc/MEng Data Mining and Machine Learning 2024 Lab 3 Speech Recognition using HTK R

MSc/MEng Data Mining and Machine Learning (2024) Lab 3 – Speech Recognition using HTK

Introduction
The purpose of this laboratory is to familiarise you with automatic speech recognition. You will use the Hidden Markov Model Toolkit (HTK) to build a connected digit recognition system which takes an acoustic speech signal as input, performs training of the HMM for each digit, and evaluates the performance of the system on a provided dataset. The entire HTK consists of several tools (exe-files), each performing a specific operation, e.g., feature extraction, HMM training, etc. Each tool is executed in the Command Prompt window by typing its name together with all the required input parameters. The exe-files of the individual HTK tools are included in the LabASR.zip file to be downloaded from Canvas. The zip-file also includes the manual for the HTK software – the manual is big, but you are going to need it only occasionally, and only as a reference to find out the meaning of (some of) the input/output parameters which are passed when using a specific HTK tool.

Getting started
Download the zip-file LabASR.zip from Canvas to your drive. Open the zip-file and copy the entire directory structure to your drive. Run the Command Prompt window by going to the Windows Start menu and typing 'cmd' (no quotes). Use the 'cd' command to set your directory to the place you copied the unzipped file. You are now set to start running some HTK tools.

Dataset
The dataset used in the laboratory contains recordings of spoken digit sequences, where a digit is one of the following: one, two, three, four, five, six, seven, eight, nine, zero, oh. The recordings are stored in .wav format. The first letter in the filename of each .wav file indicates whether the recording is from a male (M) or a female (F) speaker. The data is split into a training part (folder TRAIN) and a testing part (folder TEST).
In each (train/test) part, there is a set of clean (noise-free) recordings (folder CLEAN1) and sets of recordings corrupted by an additive noise (i.e., a noise signal added to the clean signal) at signal-to-noise ratios (SNR) of 15 dB and 10 dB (folders N1_SNR15 and N1_SNR10, respectively). The additive noise illustrates the effect of a background ambient noise in practice.

Viewing the signal
In this initial exercise you will practice the use of the HList tool. This tool allows you to view wav-files or files containing features extracted from wav-files (the feature extraction can be performed using the HCopy tool, which will be the subject of the next section). Typing the below gives the values of the samples in the wav-file and stores them in the file logHList_wav:

HTK3.2bin\HList -h -C config/config_HList_wav dataAurora2/wavLabDMML/TRAIN/CLEAN1/FAC_13A.wav > logHList_wav

You can examine the file containing the MFCC features (after you have created them as described in the next section) by typing:

HTK3.2bin\HList -h -C config/config_HList_mfcc dataAurora2/specLabDMML/TRAIN/CLEAN1/FAC_13A.mfcc > logHList_mfcc

Feature extraction
The HCopy tool enables you to extract a sequence of feature vectors from a given wav-file. It is capable of extracting several different types of features, e.g., logarithmic filter-bank energies, MFCCs, etc. By typing the below, you can convert the FAC_13A.wav file into a file with the same name but extension .mfcc which contains the MFCC features (note that the feature file will be located in a different directory):

HTK3.2bin\HCopy -C config/config_HCopy_MFCC_E dataAurora2/wavLabDMML/TRAIN/CLEAN1/FAC_13A.wav dataAurora2/specLabDMML/TRAIN/CLEAN1/FAC_13A.mfcc

The HCopy tool can be used to extract features for a set of files listed in a given text-file.
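Writing such a list of files by hand is tedious; below is a short sketch (a hypothetical Python helper, with example directory names) that produces the two-column "input output" lines used by the HCopy script files:

```python
import os

# Sketch: build the two-column HCopy script-file lines ("input output" per
# line) from a list of wav filenames. The directory names are examples only;
# substitute your own locations.
def scp_lines(wav_names, wav_dir, mfcc_dir):
    pairs = []
    for name in sorted(wav_names):
        if name.lower().endswith(".wav"):
            out = os.path.splitext(name)[0] + ".mfcc"
            pairs.append(f"{wav_dir}/{name} {mfcc_dir}/{out}")
    return pairs

demo = scp_lines(["FAC_13A.wav", "MAE_12A.wav"],
                 "dataAurora2/wavLabDMML/TRAIN/CLEAN1",
                 "dataAurora2/specLabDMML/TRAIN/CLEAN1")
print(demo[0])
# -> dataAurora2/wavLabDMML/TRAIN/CLEAN1/FAC_13A.wav dataAurora2/specLabDMML/TRAIN/CLEAN1/FAC_13A.mfcc
```

Writing the returned lines to a .scp file (one pair per line) gives you a list in exactly the format the lab's provided .scp files use.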
This can be performed by using HCopy as below, where listTrainHCopy_LabDMML_CLEAN1.scp is a text-file containing the list of files (with full paths) to be processed. This file is located in the list directory. Open and view this file and you can see that each line contains the names of two files (with full paths) – the first is the file to be used as the input and the second is the file to be used as the output. You will need to modify the paths here to be the paths where your data are located. After you have done the path modifications, type:

HTK3.2bin\HCopy -C config/config_HCopy_MFCC_E -S list/listTrainHCopy_LabDMML_CLEAN1.scp

The option -S is used to specify a script file name (listTrainHCopy_LabDMML_CLEAN1.scp) that contains the list of files to be converted.

Building the digit recognition system – parameter set-up
In the previous section, we converted a set of wav-files into files containing the features. Now, you start to build your digit recognition system. You will need the following:

- Vocabulary list – file wordList_noSP located under the lib directory – this contains the list of words the recogniser is going to be able to recognise. A model will be built for each vocabulary word.
- Dictionary (or pronunciation model) – file wordDict located under the lib directory – this defines the mapping of words to acoustic units, i.e., how the model of each vocabulary word is built using a single (or a sequence of concatenated) HMMs. Since we are using in this example HMMs of whole words, the dictionary contains a repetition of each vocabulary word. Note that this would be different in the case of building HMMs of each phoneme.
- Language model (or grammar) – file wordNetwork located under the lib directory – this defines (in a specific format) the set of possible sentences that can be recognised, as well as their relative prior probabilities.
If needed, it can be written by hand or, more conveniently, using the tool HParse.
- Features extracted for the training/testing data – located under the dataAurora2 directory.
- Label files for the training/testing data – file label_LabDMML_noSP.mlf located under the label directory is to be used in the first instance. You can open this text file and see that it contains the labels (i.e., a transcription of what has been spoken, in terms of the digits) for all the training data.
- Prototype HMM – file proto_s1d13_st8m1_LabDMML_MFCC_E located under the lib directory. You can open this text file and see that it contains a definition of the type of HMM to be used – it defines the dimension of the features, the number of states in the HMM, initial values for the means, variances and weights for each state (these values are indicative only – they inform about the structure of the HMM), and the transition probability matrix which determines the possible transitions between states (the transitions assigned to zero will not be possible).
- Configuration file for the individual tools – each tool may have a different configuration file (containing the parameters of the processing to be performed).

Building the digit recognition system – training the HMMs
1. Create the directory hmm0 under hmmsTrained. The initial parameters of the HMMs are going to be estimated using the tool HCompV. By executing the following, the initially trained HMM parameters will be located in the file hmmdef (and vFloors) under the directory hmmsTrained/hmm0. Note that you will need to modify the path in the listTrainFullPath_LabDMML_CLEAN1.scp file.

HTK3.2bin\HCompV -C config/config_train_MFCC_E -o hmmdef -f 0.01 -m -S list/listTrainFullPath_LabDMML_CLEAN1.scp -M hmmsTrained/hmm0 lib/proto_s1d13_st8m1_LabDMML_MFCC_E

2.
Now you will create 2 files (this could be done manually, but you are provided exe-files which do the work automatically for you).

Type the below – it will create a file named models containing the HMM definitions of all 11 digits and the silence model. The models file could be created manually by simply copying the content of hmmdef several times (once for each vocabulary unit) and replacing the name according to the vocabulary.

HTK3.2bin\models_1mixsil hmmsTrained/hmm0/hmmdef hmmsTrained/hmm0/models

Type the below, which creates the so-called macro-file, having basically the same content as the file vFloors but a slightly modified structure. The value 13 indicates the dimension and MFCC_E the type of features – you will need to modify these when using different features/dimensions.

HTK3.2bin\macro 13 MFCC_E hmmsTrained/hmm0/vFloors hmmsTrained/hmm0/macros

3. The next step is to run several iterations of the Baum-Welch training procedure. This can be done using the tool HERest. Among the input parameters for this tool are the input directory containing the current HMM parameters (which is now hmmsTrained/hmm0) and the output directory for the new re-estimated HMM parameters (which is now hmmsTrained/hmm1). Thus, you need to create the new directory hmm1 and then run:

HTK3.2bin\HERest -C config/config_train_MFCC_E -I label/label_LabDMML_noSP.mlf -t 250.0 150.0 1000.0 -S list/listTrainFullPath_LabDMML_CLEAN1.scp -H hmmsTrained/hmm0/macros -H hmmsTrained/hmm0/models -M hmmsTrained/hmm1 lib/wordList_noSP

Altogether, perform three iterations of HERest. Before each iteration, make a new directory (hmm1, hmm2, and hmm3) where the newly trained HMMs are going to be stored. At each iteration, do not forget to change the corresponding input and output directory names in the above HERest command – use the output directory from the current iteration as the input directory in the next iteration.

4.
Now create two new directories, hmm4 and hmm5. Then copy the content of the directory hmm3 into the hmm4 directory.

5. Create the model for a short pause sp by performing the two commands below:

HTK3.2bin\spmodel_gen hmmsTrained/hmm3/models hmmsTrained/hmm4/models
HTK3.2bin\HHEd -H hmmsTrained/hmm4/macros -H hmmsTrained/hmm4/models -M hmmsTrained/hmm5 lib/tieSILandSP_LabDMML.hed lib/wordList_withSP

6. Perform another three iterations of HERest (with sp this time) – before each iteration, make a new directory where the newly trained HMMs will be stored.

HTK3.2bin\HERest -C config/config_train_MFCC_E -I label/label_LabDMML_withSP.mlf -t 250.0 150.0 1000.0 -S list/listTrainFullPath_LabDMML_CLEAN1.scp -H hmmsTrained/hmm5/macros -H hmmsTrained/hmm5/models -M hmmsTrained/hmm6 lib/wordList_withSP

Training finished! You have now obtained trained models of the digits in the folder hmm8, each modelled by a 10-state HMM with a single Gaussian PDF per state with diagonal covariance matrices. Let's move on to testing (recognition).

Building the digit recognition system – recognition
1. The tool HVite is to be used for testing the recognition system. It performs the Viterbi decoding and gives the sequence of models which is most likely to have produced the given unknown utterance. Among the input parameters to the HVite tool are the trained HMMs and the list of testing utterances (from the testing data directory). First, you need to extract features from the testing wav-files using the HCopy tool as described at the beginning of the lab (when you created features for the training utterances). Then, you can run the Viterbi decoding using:

HTK3.2bin\HVite -H hmmsTrained/hmm8/macros -H hmmsTrained/hmm8/models -S list/listTestFullPath_LabDMML_CLEAN1.scp -C config/config_test_MFCC_E -w lib/wordNetwork -i result/result.mlf -p 0 -s 0.0 lib/wordDict lib/wordList_withSP

2.
The tool HResults is to be used for analysing the results of HVite and providing the final recognition accuracy of the system. The -e options below cause the sil and sp models to be omitted from the counts for the overall recognition performance.

HTK3.2bin\HResults -e "???" sil -e "???" sp -I label/labelTest_LabDMML.mlf lib/wordList_withSP result/result.mlf >> result/recognitionFinalResult.res

HResults provides results at the sentence (SENT) level and the word (WORD) level – these indicate how well entire sentences or individual words were recognised. In the results, 'H', 'D', 'S', 'I', and 'N' denote the number of hits, deletions, substitutions, insertions, and the total number of words/sentences, respectively. If there is a large difference between the number of deletions ('D') and insertions ('I'), this indicates that the recognition system is not well balanced. To improve this balance, there is a parameter referred to as the -p flag in the HVite command – this is the word insertion penalty (WIP), a penalty for transitioning from one model to another. The WIP can be used to balance the number of deletions and insertions. If needed, change the value from 0 to some other positive or negative value (e.g., in steps of 10).

Perl scripts
In the Lab directory in Canvas you can find the file perlScripts_LabASR.zip – this contains several Perl scripts which neatly incorporate all of the above commands. The ASR_LabDMML_MFCC_E.pl script does all of the above (feature extraction, training and testing), and ASR_LabDMML_onlyTest_MFCC_E.pl performs testing only (assuming the training has been performed). You will need to change the paths inside the Perl scripts. Then you can run the first Perl script by typing perl ASR_LabDMML_MFCC_E.pl in the Command Prompt window – it should perform the feature extraction, the entire training and the testing. For reference, an introduction to Perl is located in the Lab directory in Canvas.
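The WORD-level figures printed by HResults can be recomputed from those counts. Below is a small sketch using the standard HTK definitions (%Correct = H/N, %Accuracy = (H − I)/N); the counts passed in are made-up example values, not results from this lab:

```python
# HResults' WORD-level scores from the counts it reports:
#   %Correct  = H / N * 100
#   %Accuracy = (H - I) / N * 100   (insertions penalise accuracy)
# where N = H + D + S is the number of reference words.
def word_scores(H, D, S, I):
    N = H + D + S
    return 100.0 * H / N, 100.0 * (H - I) / N

corr, acc = word_scores(H=950, D=20, S=30, I=15)  # made-up example counts
print(round(corr, 2), round(acc, 2))  # -> 95.0 93.5
```

If deletions and insertions are badly unbalanced, adjusting HVite's -p flag moves both counts and hence %Accuracy, which is what the word insertion penalty tuning above is for.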
Lab Report Tasks
For all the tasks below, if needed, modify the -p flag (in HVite) to achieve a reasonable balance between the number of deletions and insertions.

1. Explore the effect of delta and delta-delta features. Using the provided Perl script, modify the recognition system developed above such that it uses not only the static MFCC features (i.e., MFCC_E) but also the delta and delta-delta features (i.e., MFCC_E_D_A). You will need to perform modifications at several places. In the HCopy config, modify the TARGETKIND to MFCC_E_D_A and set DELTAWINDOW=3 and ACCWINDOW=2. The MFCC_E_D_A features will not be 13-dimensional (as the MFCC_E features were) but 39-dimensional – so you will need to make modifications at the places where the feature dimension information appears. You will also need to modify the TARGETKIND in config_train and config_test, and will need to use proto_s1d39_st8m1_LabDMML_MFCC_E_D_A. Train the system using the clean training data. Perform experimental evaluations on the clean test data. Report and discuss your results. [20 marks]

2. Investigate the effect of using Gaussian mixture state PDF modelling. Modify the provided Perl scripts (and configuration files) to develop a recognition system that uses the MFCC_E_D_A features and employs 3 Gaussian mixture components per state. Train the system using the clean training data. Perform experimental evaluations on the clean testing data and compare the results with those obtained using a single Gaussian per state from Task 1. Report and discuss your results. [20 marks]

3. Explore the effect of noise. [40 marks]
a. Perform experimental evaluations of the recognition system developed under Task 2 separately on each provided noisy test set (N1_SNR10, N1_SNR15).
b.
Then develop a new system – this should be like the system in Task 2 (i.e., using MFCC_E_D_A features and 3 Gaussian mixture components) but trained using a combined set of all the clean and noisy training data together – to do this, you will need to create a new list file containing the filenames of all the clean and noisy training data. Perform evaluations of this system separately on the clean and on each noisy test set (N1_SNR10, N1_SNR15). Report, compare and discuss your results.

4. Consider that you have available the trained system from Task 3b (in case you did not do that task, you may consider the system from Task 2). Suggest how you could (in a similar vein to Task 3b) try to improve the performance of the system for female speakers. Develop the modified system and perform suitable experiments on the noisy test data N1_SNR10. Report, compare and discuss your results. [20 marks]

Lab Report Submission
You should report concisely on each of the above tasks. Describe clearly what changes you needed to make to perform the task and discuss the obtained results. Your report from this lab is expected to be no longer than 7 pages, and submission is through Canvas. A standard penalty of 5% per day applies for late submissions.


[SOLVED] ELEC0021 MATLAB AND SIMULINK BASICS FOR CONTROL SYSTEMS DESIGN Matlab

ELEC0021: MATLAB AND SIMULINK BASICS FOR CONTROL SYSTEMS DESIGN

1. INTRODUCTION
A control systems CAD package aids control systems design by automatically calculating the graphical displays that designers use. It does all the numerical work, leaving you free to concentrate on the important design decisions. MATLAB® and its Control Systems toolbox provide an ideal environment. This note gives you an introduction to the basic features of the Control Systems toolbox. You are urged to become familiar with the material outlined in this note because it will be of help in the laboratory exercise as well as in the study of the material in the problem sheets.

2. GETTING STARTED
Launching MATLAB will open the MATLAB terminal window, as well as several other support windows. The MATLAB prompt symbol is ». For help on any command, simply type » help followed by the command name at the terminal. For instance, for information about polynomial roots type
» help roots
MATLAB distinguishes between lower and upper case, and they are not interchangeable. For example, to leave MATLAB type (in lower case):
» quit

3. CONTROL SYSTEMS IN MATLAB
Vectors and arrays: The basic data structures in MATLAB are vectors and matrices. Square brackets [ ] are used to define vectors. Round brackets ( ), by contrast, are used to indicate the arguments of a function. For example, in the statement
» [re, im] = nyquist(num,den,w)
the function nyquist uses the vectors num, den and w as data (the arguments) from which to calculate or define the vectors re and im. Vectors can also be defined recursively; for example, the statement below sets up a vector t=[0  0.1  0.2  0.3  .....  9.8  9.9  10.0]
» t = 0:0.1:10
MATLAB usually echoes the result of a calculation to the screen. If you want to suppress the echo, the statement can be ended with a semi-colon, as shown below:
» t = 0:0.1:10;
Example – closed loop poles: A control systems example presented in the lectures is a d.c.
motor represented by the following open loop transfer function:
Go(s) = 50 / (s(s + 1)(s + 10))
The numerator polynomial has just one element. The numerator can be given any name you wish; in the example below it is called num:
» num = 50;
The denominator polynomial can be specified either as a vector of roots or else as a vector of polynomial coefficients. MATLAB uses the convention in its Control Systems toolbox that vectors of roots are column vectors and vectors of polynomial coefficients are row vectors. Now, s(s + 1)(s + 10) is equal to s³ + 11s² + 10s. The MATLAB routines need the denominator in polynomial form, so we could write:
» den = [1  11  10  0];
If the transfer function is in factored form, however, you can do:
» den = [0 -1 -10];
» den = poly(den);
Try this to establish that the polynomial coefficients are calculated correctly. The inverse of the poly function is roots, which calculates the roots of polynomials. It is possible to determine the closed loop poles of the d.c. motor using the roots function. The characteristic equation is 1 + KGo(s) = 0, and the closed loop poles are those values of s that satisfy this equation. That is to say, we need to determine the roots of 1 + KGo(s). When K = 1, the polynomial coefficients of 1 + KGo(s) are [1  11  10  50] and the closed loop poles are therefore determined as:
» dencl = [1  11  10  50];
» clpoles = roots(dencl)
The vector clpoles contains the roots of the characteristic equation, which are the values of s where the closed loop poles are located on the s-plane. The results are approximately one real pole near -10.5 plus a complex-conjugate pair. Make sure you understand the time-domain signals that correspond to these closed loop poles. It is possible to create a plot on the s-plane showing how the closed loop pole positions change as the gain, K, varies.
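The same pole computation can be cross-checked outside MATLAB; below is a sketch using NumPy's roots in place of the MATLAB function:

```python
import numpy as np

# Closed-loop poles at K = 1: roots of the characteristic polynomial
# s^3 + 11 s^2 + 10 s + 50, i.e. 1 + K*Go(s) = 0 with
# Go(s) = 50 / (s (s + 1)(s + 10)).
clpoles = np.roots([1.0, 11.0, 10.0, 50.0])

# Expect one well-damped real pole near -10.5 and a lightly damped
# complex-conjugate pair (real part roughly -0.25).
for p in sorted(clpoles, key=lambda z: z.real):
    print(p)
```

The real pole decays very quickly; the lightly damped pair dominates the step response, which is worth keeping in mind when matching poles to time-domain signals.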
It is called a root-locus plot and uses the open loop transfer function (num and den) to determine the closed loop poles:
» k = 0:0.5:20;
» rk = rlocus(num,den,k);
» plot(rk,'+'), xlabel('Re s'), ylabel('Im s'), grid
Make sure that you know which of the '+' symbols corresponds to which value of the gain k. The gain varies in steps of 0.5 from 0 to 20. The root locus starts, when k = 0, at the open loop pole positions.
Use of the tf command: The commands in the Control Systems toolbox can use a transfer function description of the system. To create a transfer function called sysol from num and den type:
» sysol = tf(num,den)  % omit the semicolon so that you can see the transfer function
Once the system has been defined using the tf command, the root locus plot can be called as follows. Note that when called with no output argument on the left-hand side, MATLAB automatically plots the results:
» rlocus(sysol,k)
Nyquist, Bode and Nichols plots: Nyquist and Bode plots are parameterised by angular frequency (ω), so before the functions are called it is necessary to generate the vector of angular frequencies. The logspace function is suitable; type help logspace to determine its operation. The plots generated in this section are of the open loop system defined in the transfer function sysol.
» w = logspace(0,2);
» [re, im] = nyquist(num,den,w);
» re = re(:);  im = im(:);   % See footnote, next page
» plot(re,im,'+'), grid
An alternative way to call the Nyquist plot is:
» nyquist(sysol,w)
MATLAB automatically plots a graph when this form is in use, and it also labels the low frequency end of the plot.
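The rlocus sweep can likewise be reproduced by brute force, solving for the closed-loop poles at each gain. The sketch below also brackets the critical gain at which the locus crosses into the right half-plane (k = 110/50 = 2.2, from a Routh-Hurwitz check on the characteristic polynomial):

```python
import numpy as np

# Brute-force root locus: for each gain k, the closed-loop poles are the
# roots of s^3 + 11 s^2 + 10 s + 50 k (same gain grid as k = 0:0.5:20).
ks = np.arange(0.0, 20.5, 0.5)
locus = [np.roots([1.0, 11.0, 10.0, 50.0 * k]) for k in ks]

# At k = 0 the branches start at the open-loop poles 0, -1, -10.
print(np.sort(locus[0].real))

# Routh-Hurwitz on s^3 + 11 s^2 + 10 s + 50 k: stable while 11*10 > 50 k,
# i.e. k < 2.2, so this 0.5-step grid is stable at k = 2.0, unstable at 2.5.
stable = [r.real.max() < 0 for r in locus]
```

The critical gain of 2.2 found here is also the gain margin that MATLAB's margin command reports for this open-loop system.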
To generate a Bode plot type:
» w = logspace(-1,2,100);
» [mod, arg] = bode(sysol,w); mod = mod(:); arg = arg(:);
» subplot(211), loglog(w,mod), title('Magnitude plot'), xlabel('w')
» subplot(212), semilogx(w,arg), title('Phase plot'), xlabel('w'), ylabel('deg')
The Control Systems toolbox also calculates the gain margin (gm), the phase margin (pm), the angular frequency where the open loop gain is 1 (wcp), and the frequency where the open loop phase is -180° (wcg):
» [gm, pm, wcg, wcp] = margin(sysol)
The gain margin is the factor by which the gain must be boosted to give unity gain at the angular frequency wcg. If the gain margin of the open loop is less than unity, then the closed loop system is unstable. To create a Nichols plot type:
» nichols(sysol,w);
» ngrid  % ngrid superimposes the M-contours onto the Nichols plot

4. CONTROL SYSTEMS IN SIMULINK
Simulink is a graphical tool that allows us to simulate feedback control systems. Open Simulink by simply typing simulink at the MATLAB prompt. Once Simulink has loaded, create a new model by pressing CTRL+N (or clicking the "New Model" button). You can begin placing components from the component library by dragging and dropping them in the model space. You can make connections by dragging the connection wires to the input junctions. You can zoom in/out by pressing CTRL+"+" / CTRL+"–". You can change the simulation time by setting the value in the related box, and then run your model by pressing the green "run" button.
Components: You will only need to be concerned with a small fraction of Simulink's component library.
In particular, you should be familiar with the following:
· "Continuous" library: Integrator, Transfer Fcn
· "Math Operations" library: Gain, Sum, Trigonometric Function
· "Signal Routing" library: Mux, used to multiplex signals together in order to plot several on one graph
· "Sinks" library: Scope (used for viewing the system output), To Workspace (used to transfer a signal to MATLAB); or use the Simulation Data Inspector
· "Sources" library: Ramp, Sine Wave, Step (which generate the corresponding signals)
Modification of components: You can modify component properties by clicking on them and changing the default settings given by Simulink. For example:
· double-clicking on the "Gain" component lets you alter the gain parameter (you can also use expressions, such as 10.3/2.5)
· double-clicking on the "Sum" component lets you change the signs, e.g. to |+–
· double-clicking the "Scope" opens the scope output; right-clicking on the scope output and selecting "Autoscale" will automatically scale the output range of the scope
· double-clicking on the "To Workspace" component lets you change the "Save format" to "Array"; after you run the simulation, you will then get the outputs in the workspace variable "simout"


[SOLVED] FB5731 Business Analytics and Decision Modeling Java

FB5731 Business Analytics and Decision Modeling
Assignment 1 – Decision Analysis Problem: Real Estate Investment
Sarah and John Anderson are considering purchasing a vacation property in the town of Banff, Alberta, Canada. The asking price for the property is CA$800,000, and it has been on the market for only a day. Their real estate agent has informed them that there are multiple interested buyers who have also viewed the property. The agent has advised Sarah and John that if they decide to make an offer, they should offer very close to the asking price of $800,000. The agent also mentioned that if there are competing offers, the seller may ask the potential buyers to submit their final offers the following day.

Sarah and John have decided to construct a decision tree to help with this decision. They assumed that the "fair market value" of the property is $800,000. They assigned an "emotional value" of $20,000 if they are successful in purchasing the property. That is, whereas the fair market value is $800,000, the property is worth $820,000 to Sarah and John. Thus, if they were to be successful in purchasing the house for $780,000, the payoff of this outcome would be $40,000. Of course, if they were not successful in purchasing the property, the payoff would be simply $0. Sarah has also assigned a probability of 0.30 that they will be the only buyers bidding on the property. Sarah and John have decided to consider making one of three offers: $780,000, $800,000, or $810,000. They estimate that if they are the only buyers, the probability that an offer of $780,000 is accepted is 0.40, the probability that an offer of $800,000 is accepted is 0.60, and the probability that an offer of $810,000 is accepted is 0.90. However, if there are other buyers, the seller may ask them to submit a final offer the following day. In this scenario, Sarah and John will have to decide whether to withdraw, submit the same offer, or increase the offer by $10,000.
They feel that in the event of multiple bids, the probability that an offer of $780,000 is accepted is 0.20, the probability that an offer of $790,000 is accepted is 0.30, the probability that an offer of $800,000 is accepted is 0.50, the probability that an offer of $810,000 is accepted is 0.70, and the probability that an offer of $820,000 is accepted is 0.80. (a) Structure Sarah and John’s problem as a decision tree. (b) Solve for Sarah and John’s optimal decision strategy. You can draw the decision tree (including the appropriate decision nodes, state-of-nature nodes, probabilities, payoffs, EMVs, etc.), and scan it into a Word document. You can also use the PowerPoint template to construct the decision tree. Assignment 3 Regression Analysis Problem: VARMAX Realty In order to help clients determine the price at which their house is likely to sell, VARMAX Realty gathered a sample of 150 purchase transactions during a recent three-month period. Column 1 of the data shows the selling price of the home in thousands of dollars, Column 2 shows the number of square feet (in thousands), Column 3 shows the number of bathrooms, Column 4 is the lot size (the area of the land), and Column 5 is the median household income (in thousands) of the area where the home is located. Analyze the data in home prices.csv and answer the following questions. Feel free to choose your tools. You can use Excel or AI + RStudio. If you choose AI, be careful about hallucination issues. 1. (⋆) What are the mean and median home price in the data set? 2. (⋆) Make a histogram of the response variable Price. It should have 15 bins, with “Home Price” as the label of the x-axis and “Count” as the label of the y-axis. Set “grey” as the color of the bins. 3. (⋆) Fit a multiple regression model, using all four explanatory variables. Include the model summary, i.e., the estimated coefficients, p-values, F-test results, and adjusted R-squared in your submission. 4.
(⋆⋆) Does the estimated model appear to meet the conditions of the multiple regression model? (Check model conditions: residual plots, normal quantile plot.) 5. (⋆) Does this model explain statistically significant variation in the prices of homes? Give your reasons. 6. (⋆) Interpret the estimated coefficient for Sq Feet. What does this coefficient mean? What does its p-value mean? 7. (⋆) A homeowner wants to sell her home with: Sq Feet=3, Bathrooms=3, Lot Size=9, Median.Income=10. Give a 95% prediction interval for the price of her home. 8. (⋆ ⋆ ⋆) A homeowner asked the realtor if she should spend $40,000 to convert a walk-in closet into a small bathroom in order to increase the sale price of her home. What does your analysis indicate? VARMAX Realty rates the conditions of the homes from A to E, with A being the best condition and E being the worst. The rating data are shown in Column 6. Please include the general condition rating in your model by creating dummy variables and answer the following questions. The IF function in Excel may help you create the dummy variables (ask Poe). 9. (⋆⋆) How many dummy variables do you need? Is it worth including the general condition rating in the regression model? Why? 10. (⋆ ⋆ ⋆) Interpret the coefficients of all the dummy variables (use α = 0.05). Assignment 4 Linear Programming Problem: Advertising Model Extensions During our lecture, we formulated the advertising problem of determining how many ads to place on each social media platform and solved it using the Solver in Excel. Gourmet Treats really has two competing objectives: (i) to obtain as many exposures as possible, and (ii) to keep the total advertising cost as low as possible. In the in-class exercise, we decided to minimize total cost and constrain the exposures to be at least as large as a required level.
In this assignment, you will consider an alternative, which is to maximize the total number of excess exposures while the total cost does not exceed the budget $2 million. Here, excess exposures are those above the minimal required level. excess exposures = actual exposures − required exposures 1. (⋆⋆) Write down the new LP formulation. Note that your objective is to maximize the total number of excess exposures. You have a new constraint on the budget, in addition to the minimum exposure requirements. 2. (⋆⋆) Modify the Excel template accordingly, and solve it using the Excel solver. Include the answer report and sensitivity report here. 3. (⋆) If the budget can be increased by $500,000, how much extra exposure will Gourmet Treats gain under the optimal solution? In addition to the constraints already in the current advertising model, suppose Gourmet Treats also wants to obtain at least 180 million exposures to men and at least 160 million exposures to women. 4. (⋆ ⋆ ⋆) Does the current optimal solution satisfy these constraints? If not, modify the model as necessary, and rerun Solver to solve it. Include your results here. Suppose Gourmet Treats replaced the gender-specific minimum exposure requirements (that is, 180 million exposures to men and 160 million exposures to women) by a new constraint: it shouldn’t place any more than 10 ads on any given platform. 5. (⋆⋆) Modify the advertising model appropriately to incorporate this constraint, and then reoptimize. Include the answer report and sensitivity report here.
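As a rough illustration of the "maximize excess exposures subject to a budget" formulation in question 1, here is a sketch using `scipy.optimize.linprog`. The three platforms and their per-ad exposure/cost figures are invented, since the real numbers live in the course's Excel template:

```python
# Hypothetical sketch of "maximize excess exposures subject to a budget".
# The per-ad exposure and cost figures below are invented stand-ins.
from scipy.optimize import linprog

exposures = [1.2, 0.8, 0.5]        # millions of exposures per ad, by platform
costs = [50_000, 25_000, 10_000]   # dollars per ad, by platform
BUDGET = 2_000_000
REQUIRED = 30.0                    # minimum total exposures, in millions

# Excess exposures = total exposures - REQUIRED, so maximizing the excess is
# the same as maximizing total exposures. linprog minimizes, hence the sign flips.
res = linprog(
    c=[-e for e in exposures],
    A_ub=[costs,                      # total cost <= BUDGET
          [-e for e in exposures]],   # total exposures >= REQUIRED
    b_ub=[BUDGET, -REQUIRED],
    bounds=[(0, None)] * 3,
)
excess = -res.fun - REQUIRED
print(f"ads per platform: {res.x}, excess exposures: {excess:.1f}M")
```

Note this is the continuous LP relaxation; a real media plan would need integer ad counts, and the question-4 and question-5 variants just add rows to `A_ub`/`b_ub` (gender exposure minimums, or per-platform caps of 10 ads via the `bounds` argument).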

$25.00 View

[SOLVED] N1569 Financial Risk Management Workshop 5 Statistics

N1569 Workshop 5 1. Sketch the cash flows for: (i) Receipt of a 5-year interest-only (not interest + pay-down) loan of $10m with a fixed annual interest payment of 4%. (ii) Purchase of a 3-year bond with a coupon of 5% paid annually. 2. You have a cash flow of $6m at 7 years, when the 7-year interest rate is 5% with a volatility of 45 bps. You want to map this to two vertices: the 5-year interest rate, which is 4% with a volatility of 50 bps; and the 10-year rate, which is 6% with a volatility of 40 bps. The correlation between the 10-year and 5-year interest rates is 0.75. Use the appropriate Excel spreadsheet to: (i) Find the PV (in $m) and the PV01 (in $) of the original cash flow. (ii) How much of this PV is mapped to each vertex, in order to keep PV constant and to keep the volatility of the mapped cash flow the same as the volatility of the unmapped cash flow? (iii) Calculate the PV01 of the mapped cash flow at each vertex. 3. Use Excel AI and the file BoE spot curve.xlsx to reproduce the time series plot shown on slide 30. 4. Use the Excel Workbook 5 to calculate normal and historical VaR for a gilts portfolio which has equal PV01 at all maturities, for different significance levels and holding periods.
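For question 2, the variance-matching weight on the 5-year vertex is the root of a quadratic. A sketch of the calculation follows, intended only as a cross-check of the spreadsheet; the figures come from the exercise statement, but the duration-style PV01 approximation is my assumption, not necessarily the workbook's exact convention:

```python
# RiskMetrics-style cash-flow mapping: split the PV of a 7-year cash flow
# between the 5y and 10y vertices so that PV and volatility are preserved.
import math

cf, t, y7, vol7 = 6.0, 7, 0.05, 45.0   # $m, years, 7y rate, 7y vol in bps
vol5, vol10, rho = 50.0, 40.0, 0.75    # vertex vols (bps) and correlation

pv = cf / (1 + y7) ** t                # (i) present value, $m
pv01 = pv * t / (1 + y7) * 1e-4 * 1e6  # (i) PV01 in $, duration approximation

# (ii) weight w on the 5-year vertex solves
# w^2*v5^2 + (1-w)^2*v10^2 + 2w(1-w)*rho*v5*v10 = v7^2
a = vol5**2 + vol10**2 - 2 * rho * vol5 * vol10
b = 2 * rho * vol5 * vol10 - 2 * vol10**2
c = vol10**2 - vol7**2
w = (-b + math.sqrt(b**2 - 4 * a * c)) / (2 * a)

print(f"PV = {pv:.4f}m, PV01 = ${pv01:,.0f}")
print(f"mapped to 5y: {w * pv:.4f}m, to 10y: {(1 - w) * pv:.4f}m")
```

Part (iii) then repeats the PV01 calculation at each vertex on the mapped amounts, using the 5-year and 10-year maturities and rates.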


[SOLVED] COMM 363 Marketing Analysis 2024W1 R

COURSE INFORMATION
Course title: Marketing Analysis
Course code: COMM 363
Credits: 3
Session and term: 2024W1
Sections: 101, 102, 103
Class time & location: Sec 101: T/T 9:30-11:00 in HA 037; Sec 102: T/T 11:00-12:30 in HA 037; Sec 103: T/T 2:00-3:30 in HA 037
Course duration: Sep 5 – Dec 5, 2024
Division: Marketing & Behavioral Science
Pre-requisites: N/A
Co-requisites: N/A

COURSE DESCRIPTION
This course uses case analysis and a marketing simulation game to give students experience making marketing strategy decisions. Students perform a series of case analyses in which they learn analytical frameworks that are used to guide marketing strategy decisions. Topics include market segmentation, targeting and positioning as well as decisions related to pricing, distribution, and communications. Students also play a marketing strategy simulation game in which they compete against other teams to implement marketing strategies. The objective is to win the game by maximizing return on investment. The simulation allows students to appreciate the connections between marketing and financial performance and learn that strategy decisions must be sensitive to changes in market conditions to be effective. In summary, the course is focused on developing your analytical skills and developing your ability as a decision maker.

COURSE FORMAT
Case classes involve class discussion of your analysis and decision. Cases are used to learn how to analyze real-world problems and make decisions as a manager.
Your job is to assume the role of the decision maker in the case, apply the course concepts to analyze the information provided in the case, and present evidence-based arguments in class to determine the best course of action. It is learning-by-doing and will give you practice making decisions in real-world situations.

LEARNING OBJECTIVES
After completing the course, students will have the knowledge and skills to:
1. Integrate concepts from marketing and finance to perform a comprehensive analysis of an organization’s performance in a market.
2. Implement a variety of analytical approaches to inform and guide marketing strategy decisions in different market conditions.
3. Develop teamwork and communication skills through case discussion and team decision making.
4. Understand the complexity of the problems/scenarios encountered by the marketing function.

SUSTAINABLE DEVELOPMENT GOALS (SDGS)
At UBC Sauder, we are committed to responsible business practices that can have transformative impacts on society. One of the ways we are reinforcing our commitment to responsible business is by showcasing relevant content in our courses via the lens of the United Nations Sustainable Development Goals. In this course, we will touch on topics that relate to the following goals:

GOAL 3: Good Health and Well-being
Two of the cases we will analyze in the course are examples of market opportunities created by consumers motivated by health and wellness goals. The Brita case is an example of a brand that uses health and wellness claims (removal of water impurities) to position its product, and highlights how differences in access to clean drinking water can vary by region and consequently affect market demand.
The Metabical case asks students to develop a go-to-market strategy for a prescription weight-loss drug and is controversial in how it presents market segmentation data showing differences in consumer attitudes toward weight loss and health.

GOAL 9: Industry, Innovation and Infrastructure
The Markstrat simulation demonstrates the importance of innovation in satisfying market demand. Students make choices about the timing of new product development projects and how the timing of new product introductions affects market demand and competition.

GOAL 10: Reduced Inequality
Throughout the course we will discuss how market segmentation can result in consumers being treated differently based on demographic and psychographic differences, which can create disparities and inequalities that violate social justice and equality.

GOAL 12: Responsible Consumption and Production
Throughout the course we will discuss how marketing and consumerism can have a negative impact on responsible consumption and production. The Markstrat simulation also demonstrates the negative financial consequences of inaccurate demand forecasting and over-production.

ASSESSMENTS
Summary of components and weights:
Class Participation (Individual): 10%
Online Pre-Assessments (Individual): 15%
Case Submission (Individual): 20%
Markstrat Simulation (Teams): 30% (Team Performance 15%, Reflective Report 15%)
Final Case Exam (Individual): 25%
Total: 100%

Details of Assessments
Class Participation (10%): Students are evaluated on participation in every class. Your participation grade is based entirely on the extent to which your contribution to class discussion impacts the learning of others. It is about your impact on the learning of others rather than the frequency of your participation. Asking an intriguing question or presenting your analysis is more impactful than simply answering a question. Respecting your classmates is paramount and I value quality over quantity.
Grading Scale for Class Participation:
0 – Absent or late to class, or student video not on during Zoom sessions.
5 – Present but does not participate.
6 – Participates with basic information such as case facts.
7 – Offers an opinion or asks/answers a basic question.
8 – Engages in a meaningful discussion with other members of the class.
9 – Shares an analysis using data or evidence from the case or reading.
10 – Provides insight or asks a question that is instrumental in advancing understanding.

Online Pre-Assessments (15%): Online pre-assessments are short quizzes that students complete on Canvas before the start of class to ensure that everyone has read the assigned case. This will ensure that everyone is prepared for class so that class time can be dedicated to higher-level discussion. Pre-assessments must be completed before the start of class; otherwise they receive a grade of zero.

Case Submission (20%): The case submission is a comprehensive case dealing with issues that we address up to that point in the course. You will be asked to answer a number of specific questions that require you to analyze several aspects of the case and make a decision. The case and questions will be posted on Canvas. The case submission is an individual assignment. You may not consult with your classmates or any other individuals. We will discuss the case in class. Consequently, late submissions cannot be accepted and will receive a grade of zero.

Final Case Exam (25%): The final case exam is a comprehensive case dealing with issues that we address throughout the course. Similar to the case submission, you will be asked to answer a number of specific questions that require you to analyze several aspects of the case and make a decision. The final case exam is an individual deliverable and will take place during the December exam period.
MarkStrat Simulation (30%): Team Performance (15%), Reflective Report (15%). MarkStrat is a marketing strategy simulation game in which you compete against other teams to implement marketing strategies. The objective is to win the game by maximizing return on investment. It is a dynamic game, meaning that the actions of competitors influence what happens in the game. Consequently, a successful strategy is dependent on analyzing and anticipating what is happening in the market in each round and trying to out-smart your competitors. Teams will be randomly assigned with 3-5 people per team. There will be two rounds of trial decisions at the start of the course so that teams can familiarize themselves with the game and with making decisions. The game will then be re-set and teams will compete over the course of 9 rounds with the winner determined at the end of the 9th round. Each round requires teams to make decisions on target market selection, positioning, product line development, distribution, pricing and promotion. Because MarkStrat represents a significant portion of the course, one class each week will be allocated for teams to do their planning, analysis and decisions for each round (please refer to the class schedule in this outline). Team Performance (15%): The decisions you make in each round affect the outcomes of the company in a competitive industry. The performance of your firm (your grade) will be assessed at the end of the simulation based on your team’s Stock Price Index relative to other teams in the industry. The grade range will reflect the relative performance of the teams: top-performing teams tend to earn A’s, average teams tend to earn B’s, and poorly performing teams tend to earn C’s or D’s depending on how poorly they perform. While relative rank in your industry is a fun scorekeeper, in the real world, stock price and other performance indicators are what the market cares most about.
Reflective Report (15%): The purpose of your report is to provide an honest and transparent account of what your team members learned from your experience in the game – not only with respect to marketing analysis but also with respect to team dynamics, communication, decision making style, and conflict resolution. Note that the grading of the report is completely independent of your team’s performance evaluation. Technology Requirements: Team members are encouraged to bring laptops to team meetings so you can access Markstrat. Team Norms and Peer Evaluation: The individual grades of team members for the team performance (15%) as well as the reflective learning report (15%) are subject to a peer-to-peer evaluation from each of your team members. Teamwork is a central component of the course and we encourage you to make friends and have fun. Being a good team member involves:
a. Respecting your classmates
b. Acting in good faith
c. Attending team meetings
d. Meeting deadlines
e. Producing quality work that meets the team’s standards
f. Pulling your weight
The peer evaluation form at the end of this course outline will be used to assess the contribution of each team member to the MarkStrat simulation (the grade for the team performance as well as the reflective learning report). Each student will be evaluated anonymously by their team members at the mid-point of the simulation and again at the end of the simulation. Peer assessments at the mid-point of the simulation are strictly for feedback purposes and will not affect student grades. This ensures that students are made aware of any concerns their team members may have about their performance early enough to make adjustments and not have their grade affected. Peer assessments at the end of the simulation will result in downward grade adjustments in cases where a student receives a score of 1 (Problematic) or 2 (Insufficient) on any criterion from more than one team member.
The final question of the peer evaluation asks: all things considered, what percentage of the team’s grade does the individual deserve?  I will take the average peer score for each student and multiply it by the team’s grade to arrive at the student's grade.  For example, if a team receives a grade of 80% (an A–) and a member of the team receives an average peer score of 75% from their team members, that team member’s individual grade will be 75% x 80% = 60% (a “C” rather than an “A–“).


[SOLVED] MH900 Epidemiology and Statistics 2024/25

Assessment for MH900 Epidemiology and Statistics 2024/25 Students should complete both sections Marks are allocated as follows: Section 1: Epidemiology 50 marks Section 2: Statistics Question 1: 25 marks Question 2: 25 marks The pass mark is 50% overall. Submission date: 11 December 2024 – before 12 noon. Electronic submission procedure: please submit two files, one for epidemiology and one for statistics. Extension of submission date Extensions are not available routinely - if you are not going to be able to hand your assignment in on time it is essential that you make an on-line written request before the submission date, outlining the extenuating circumstances, using the standard on-line extension request form. Late submission Assignments not received by the deadline, or by any agreed extension date, will be penalised at the rate of 5 marks per day late, in line with University policy. Section 1: Epidemiology Prepare a study design to answer a research question within one of the following topic areas - you must formulate an appropriate question. NOTE: The topics listed are the same as used for group work. Whilst most students choose to build on the topic their group presented during the taught module, you may elect to change topic and tackle one of the other questions or, by prior agreement, your own question. You must design an analytical epidemiological quantitative study – so any research questions must be ones that can be addressed by this approach. IMPORTANT: Even though this is based on your group work, you MUST NOT directly use material from your presentation. It MUST be paraphrased. Failure to do so will be flagged as collusion and result in referral to the Academic Integrity Committee. Guidelines As in the group work guidelines, the study design should address the points listed below. Examiners will use this guideline in assessing work and you will lose marks if you have failed to address any of these points.
Your total word limit for this section (Epidemiology) is 2000 words (+/- 10%, so 1800 to 2200 words). Tables and references are excluded from the word count. If you need to use appendices, these should be used for supplementary non-essential text only; examiners will not mark these. The following headings may help in preparing your work: Research Question, Background, Study Design, PICO/PECO and Eligibility Criteria, Sampling and Sample Size, Data Collection, Data Analysis, Limitations (e.g. bias, confounding), Ethical and Data Management Issues. NOTE: this is a study design only; no “Results” or “Discussion” of anticipated results sections are needed. The data analysis will be the data analysis plan, explaining how you would analyse the data. You should explain your thinking more than you would if writing a study protocol normally, justifying your choice of study design and other decisions that you have made such as the effect size used in sample size calculations. Main issues to be addressed 1 The study must have a quantitative methodology using an established study design, or an innovative variation on standard designs. 2 Background to the research question can be described briefly but should include a general introduction to the subject area and focus on the specific background that would lead to you posing the research question and planning the study that your report proposes. 3 Research question(s) and / or any hypotheses must be stated clearly. 4 The type of study design proposed should be identified clearly and the rationale for its choice presented. This should demonstrate your understanding of study design purposes and limitations. You may choose to say why you did not choose a particular design as part of justifying your final choice. 5 Some consideration of population and study sample is essential: this should include patients or other participants, how the sample will be selected and the study setting. 
6 A detailed description of methods could include whichever of the following are relevant: · Outcome(s) of interest · Exposure(s) or interventions · Statistical analyses - types of data to be collected, and your plan for data analysis, including the planned use of statistical testing (e.g. what tests you would use) · Randomisation if relevant · Bias and confounding and steps that can be taken to consider and minimise them - your report should discuss possible sources of bias and confounding in the study design and steps to be taken to minimise their effects. If your design avoids either or both of these problems that should be explained. · Practical aspects of the proposed research methods, including data collection · Ethical issues Topics for group work / assignment The effects of environment on mood or exercise Does where you live affect your mood, or how active you are? Design a study to describe the effect of your local environment on depression or on physical activity. Section 2: Statistics Guidelines for students In estimating statistics and conducting tests you may use appropriate computer applications, but include computer outputs that contain the values that you use to answer the questions. Graphics or figures, if you make any, should be produced using a computer. Where hand drawn graphics are to be submitted, these should be scanned and appended to your work. Answer BOTH of the following questions. Question A (25 marks) Data analysis exercise Data have been collected in a randomised controlled trial on the effectiveness of a new method for providing oxygen support for patients with severe community acquired pneumonia (CAP).  Patients over 65 years of age considered to be in need of oxygen support were randomised to either receive the standard or new approach.  In all other respects patients in both groups received standard care.  All patients were followed up until they were discharged from hospital or died.  
The primary outcome was whether or not a patient’s condition deteriorated to the extent that they had to be admitted to an intensive care unit (ICU) within 28 days. A total of 410 patients were recruited to the trial, with 210 and 200 randomised to receive each of the new and standard interventions respectively. Among patients receiving the standard intervention, there were 56 ICU admissions, while the number of ICU admissions among the patients receiving the new intervention was 45. The outcome was observed for all patients and no patients died prior to being admitted to ICU. Additional historical data were available from 100 patients who received the standard intervention treatment when it was introduced ten years ago. Of these patients, 33 were admitted to ICU within 28 days. (a) The main question of interest is whether the new intervention leads to a reduction in ICU admission rates relative to the standard intervention. (i) Explain, giving your reasoning, which of the data given above you would use to answer this question. Express these data in the form of a table, including totals and percentages that are helpful in interpretation of the data. (3 marks) (ii) Based on the data that you presented in your answer to (i), conduct an appropriate statistical test to determine whether there is a statistically significant difference between the numbers of patients being admitted to ICU with the two interventions. State what you conclude from the results of this test. State a condition for the test to be valid and explain whether or not you think this is reasonable in this case. (4 marks) (iii) Based on the results you presented in your answer to (i), obtain an estimate of a measure of the difference in proportions of patients admitted to ICU within 28 days together with a 95% confidence interval. Explain the meaning of this confidence interval and how it relates to the test that you conducted in (ii).
(6 marks) (b) A second question of interest is whether recovery rates for patients receiving the standard intervention have changed over the last ten years.  In an attempt to answer this question the researchers have compared data on total length of stay in hospital for patients receiving the standard treatment in the trial described above with that for the group of patients treated ten years ago. As the distribution of length of hospital stay is known to be skewed, the analysis used the (natural) logarithm of the length of stay in days for each patient. The researchers plotted the log-transformed values against the date of the start of treatment.  The plot is given in Figure 1.  They have also used SPSS to fit a linear regression model relating these with the date of the start of treatment converted to the number of days since 1 Jan 2012.  Part of the SPSS output from this analysis is given in Table 1. (i) Explain what model is being fitted in the analysis given in Table 1. (1 mark) (ii) State two assumptions required for the hypothesis test presented to be valid and explain, giving your reasons, whether or not you believe these are likely to be reasonable in this case. (4 marks) (iii) Give the meaning of the most important figures presented in Table 1.  Using these results, explain whether the data suggest that there has been a change in length of hospital stay over the ten year period. (4 marks) (iv) Give two reasons why the data shown in Figure 1 might not fully answer the research question of whether recovery rates for the standard treatment have changed over the last ten years. (3 marks)
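For parts (a)(ii)-(iii), the randomised arms give new = 45/210 and standard = 56/200 ICU admissions. One common choice of test is the two-proportion z-test; a quick numerical cross-check might look like the sketch below (illustrative only, since the assignment asks for your own working and a stated validity condition):

```python
# Two-proportion z-test and 95% CI for the trial data (new vs standard arm).
from math import sqrt
from scipy.stats import norm

x1, n1 = 45, 210   # ICU admissions, new intervention
x2, n2 = 56, 200   # ICU admissions, standard intervention
p1, p2 = x1 / n1, x2 / n2

# z-test with pooled standard error under H0: p1 == p2
p_pool = (x1 + x2) / (n1 + n2)
se_pool = sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
z = (p1 - p2) / se_pool
p_value = 2 * norm.sf(abs(z))   # two-sided p-value

# 95% CI for the difference in proportions (unpooled standard error)
se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
ci = (p1 - p2 - 1.96 * se, p1 - p2 + 1.96 * se)
print(f"diff = {p1 - p2:.3f}, z = {z:.2f}, p = {p_value:.3f}, "
      f"95% CI = ({ci[0]:.3f}, {ci[1]:.3f})")
```

The interval straddles zero, consistent with the non-significant test; the validity condition to discuss is the usual large-sample (expected counts) requirement for the normal approximation.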


[SOLVED] LM Data Mining and Machine Learning 2024 Lab 2 Clustering and PCA Matlab

LM Data Mining and Machine Learning (2024) Lab 2 – Clustering and PCA Objectives The objective of this lab is to use the methods described in lectures to discover the structure of a particular data set. At the end of the lab your task is to write down an intuitive textual description of the data. The techniques that you should apply are clustering and PCA. What you will need All of the files that you will need are in the zip archive lab2-2024.zip which is on the Canvas page. The Data The data is stored in a text file called lab2Data (contained in the lab2-2024.zip file). The data consists of 1050 points in 6-dimensional space. Each point appears as a row in the data file – have a look at the file to see its structure. There is a ‘header’ at the top of the file that specifies the number of columns and rows. Part 1: Clustering Your first task is to use clustering to try to determine whether there are natural clusters in the data, and if there are, how many. To do this you need to apply clustering to the data. You need two C programs, agglom.c and k-means.c. Use the provided .exe files (or compile these two source C programs if needed). The program agglom.c is an implementation of the agglomerative clustering algorithm described in lecture material. You should apply this to the data set to obtain a set of K initial centroids for k-means clustering (see the lecture notes to understand how). Then use k-means.c to locally optimize the centroids. As well as producing a locally optimized set of centroids, k-means.c returns the distortion for that set of centroids relative to the data. I recommend 15 iterations of k-means clustering. Usage of agglom program: agglom dataFile centFile numCent Runs agglomerative clustering on the data in dataFile until the number of centroids is numCent. Writes the centroid coordinates to centFile.
Usage of k-means program: k-means dataFile centFile opFile numIter Runs numIter iterations of k-means clustering on the data in dataFile starting with the centroids in centFile. After each iteration writes the distortion and new centroids to opFile. You should use agglom.c and k-means.c to plot a graph of distortion as a function of K, the number of clusters. Plot distortion for values of K between 1 and 12. To clarify: for K=1 to 12 • Apply agglom.c to the data set to obtain K initial centroids • Apply 15 iterations of k-means clustering. A list of 15 numbers will appear on the screen. What are they? For each K make a note of the final number. Plot a graph of these 12 final numbers against K. Note that to analyse data structure, it might sometimes be useful to plot the distortion on a log scale or to plot the ratio of the distortion for K clusters to the distortion for K+1 clusters. Conclusion to Part 1: What does the graph tell you about the structure of the data? Part 2: Principal Component Analysis (PCA) To apply PCA to the data you will need to use MATLAB. MATLAB will complain about the header at the start of the data file lab2Data. Therefore I have created a version of this file without the header, called lab2Data-Matlab. Use this file with MATLAB. The procedure for applying and interpreting PCA is described in lecture material. In brief, the stages are as follows: 1. Load the data into a matrix, X say, in MATLAB. 2. Compute the covariance matrix of the data. You can either do this by implementing the formula for covariance given in the lectures, or you can simply use the MATLAB cov function: >> C = cov(X) 3. Apply eigenvector/eigenvalue decomposition to the covariance matrix: >> [U,D] = eig(C) Conclusion to Part 2: Write down the eigenvalues. What does the eigenvector/eigenvalue decomposition of the covariance matrix C tell you about the structure of the data set?
Explain how your Part 2 conclusion is likely to change if each sample in the dataset were modified by adding 15 in dimension 1 and 30 in dimension 4 (e.g., if an original data sample were [1, 3, 5, 0, 2, 3], the new data sample would be [16, 3, 5, 30, 2, 3]).
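One way to explore this question numerically is to apply the stated shift to some data and recompute the covariance matrix. This is only a sketch with made-up data, not the written answer the question asks for:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 6))                  # hypothetical stand-in data

# Add 15 in dimension 1 and 30 in dimension 4 (dimensions numbered from 1,
# as in the question).
shift = np.array([15.0, 0.0, 0.0, 30.0, 0.0, 0.0])
Y = X + shift

C_X = np.cov(X, rowvar=False)
C_Y = np.cov(Y, rowvar=False)
print(np.max(np.abs(C_X - C_Y)))               # how much the covariance changed
```

Comparing C_X and C_Y (and the means of X and Y) should tell you what the shift does and does not affect.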


[SOLVED] MATH2920 WEEK 10 MINI-PROJECTS 2024/25 Python

MATH2920 WEEK 10 MINI-PROJECTS, 2024/25

Introduction & Instructions

• In this document are descriptions of three Python mini-projects. Choose one to complete.
• Each of the titles has its own template on Minerva. Download and use this template, changing the title xxyyzz to your userid. Do not change the given names of functions.
• Each of the three titles will have its own submission link on Minerva. Make sure you submit your solution to the correct link by 2pm on Friday 13th December.
• Your solution should consist of one single Python file, which should generate any graphs, etc., when run. (Do not try to submit text or graphic files separately.)
• Make sure that your file runs, and takes at most 10 seconds to finish. (If you want to perform longer experiments, you can include that code and a discussion of your findings within comments.)
• You should include discussion and commentary as comments within your file (not within print statements).
• Each mini-project consists of three parts. Each part is worth one third of the total available marks. (This should give you guidance as to how much work is expected in Part 3.)
• In each case, Part 3 is much more open-ended than the other two parts, and is your chance to demonstrate independent initiative and creativity.
• Make sure that you explain what experiments you have done and what your findings are, as comments within your file. Creativity, mathematical understanding, and clarity of communication will contribute to your mark as well as coding proficiency.
• You may want to include functions that you have written earlier in the course. If so, please include them within your .py file (that is, don’t import them from some other file, because we won’t have that file!). Feel free to import standard libraries that we’ve used, such as math, time, etc.
• During your investigation in these projects, you might investigate the topics online. This is fine.
You are advised not to use code written by other people, as this will count against you for individual creativity and initiative. However, if you do take a section of code from a source online, it is essential that you reference this, by giving the web address in a comment in your file. Failure to declare code written by other people will be considered plagiarism.

Option 1: Sparse Rulers

An ordinary ruler is a straight piece of wood where distances 0, 1, 2, . . . , N are marked, for some N ≥ 1. A sparse ruler (or simply a ruler) is an ordinary ruler from which some of the numbers 1, . . . , N−1 may have been deleted. The number of marks on the ruler is its order and the value N is its reach. Here, we will represent a ruler as a Python list of strictly increasing integers starting with 0. For instance [0,1,3,7] is a ruler of order 4 and reach 7.

Part 1.

(a) Write functions reach(myruler) and order(myruler) which take as input a list representing a sparse ruler and return its reach and order, respectively.

(b) Write a function isitaruler(mylist) which takes as input a Python list, and returns True if the list represents a ruler (that is to say: every entry is an integer, the list starts with 0, and each subsequent entry is strictly greater than the previous one), and False otherwise.

Hint: Set a variable oksofar = True which you can set to False if you find a problem. First test whether the list’s initial entry equals 0. Then loop through the remaining entries, testing whether they are integers with each strictly greater than the previous.

(c) Write a function sparsenkrulers(n,k) which takes as inputs positive integers n, k where n + 1 ≥ k ≥ 2 and returns a list of all sparse rulers of reach n and order k.

Hint: the itertools library has a function combinations. After importing it, the command combinations(mylist,r) will generate all sublists of length r from mylist. Start with listofrulers = [].
Note that each ruler must contain the entries 0 and n, with k-2 others. Using the technique above, set up a loop through all combinations of k-2 numbers from range(1,n). At each stage, create a ruler which contains the entries 0 and n along with the current combination, sorted into ascending order. Append it to listofrulers. After the loop has completed, return listofrulers.

A sparse ruler of reach N is complete if it is possible to measure all distances between 1 and N by taking the differences between two marks. For instance [0,1,3] is complete because the pairs (0, 1), (1, 3), and (0, 3) yield distances of 1, 2, and 3 respectively. (Note that the pair of marks do not need to be consecutive.) On the other hand, [0,1,4] is not complete as there is no way to measure a distance of 2.

(d) Write a function ismyrulercomplete(myruler) which takes as input a list representing a sparse ruler of reach N and returns True if it is complete and False otherwise.

Hint: Create a list of all differences between entries in the ruler. Then loop through the values 1, . . . , N, testing whether they are in the list of differences. (Remember, the line of code k in mylist will return True or False.)

Part 2.

A Golomb ruler is a sparse ruler where the pairs of marks all measure different distances. For instance, [0,1,4] is Golomb as the measurable distances are all distinct: 1, 3, 4. However [0,1,2] is not Golomb as the pairs (0, 1) and (1, 2) measure the same distance.

(a) Write a function ismyrulergolomb(myruler) which takes as input a list representing a sparse ruler of reach N and returns True if it is Golomb and False otherwise.

Hint: Loop through the ruler’s list of differences, and count how many times each occurs. (The code mylist.count(k) will return the number of times k appears in mylist.)
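Following the hints above, the ruler predicates might be sketched along these lines. This is one possible implementation, not the only one; the helper differences is not part of the assignment and is introduced here only for illustration:

```python
def reach(myruler):
    # The reach is the largest mark, i.e. the last entry.
    return myruler[-1]

def order(myruler):
    # The order is the number of marks.
    return len(myruler)

def isitaruler(mylist):
    # True iff the list starts with the integer 0 and is strictly increasing
    # with every entry an integer.
    oksofar = len(mylist) > 0 and isinstance(mylist[0], int) and mylist[0] == 0
    for prev, cur in zip(mylist, mylist[1:]):
        if not isinstance(cur, int) or cur <= prev:
            oksofar = False
    return oksofar

def differences(myruler):
    # Hypothetical helper: all distances measurable between pairs of marks.
    return [b - a for a in myruler for b in myruler if b > a]

def ismyrulercomplete(myruler):
    diffs = differences(myruler)
    return all(k in diffs for k in range(1, reach(myruler) + 1))

def ismyrulergolomb(myruler):
    diffs = differences(myruler)
    return all(diffs.count(d) == 1 for d in diffs)
```

On the examples in the text: ismyrulercomplete([0,1,3]) is True, ismyrulercomplete([0,1,4]) is False, ismyrulergolomb([0,1,4]) is True, and ismyrulergolomb([0,1,2]) is False.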
(b) Using the functions you have written, write a function listofgolombrulers(n) which outputs a list of all Golomb rulers of reach n, and a function listofcompleterulers(n) which outputs a list of all complete rulers of reach n. Run your functions on some different values of n and comment on your findings.

The Erdős–Turán construction produces a sparse ruler of order m > 2 with marks at the points 2mk + (k² mod m) for each k ∈ {0, . . . , m − 1}.

(c) Write a function ErdosTuran(m) which takes as input an integer m > 2 and outputs the corresponding ruler. Run it on a variety of entries, and comment on the type of rulers that are produced by different inputs.

Part 3.

Continue your investigations into sparse, Golomb, and/or complete rulers. You could consider, for instance, the notion of an optimal Golomb ruler of order k, which is a Golomb ruler of minimal possible reach n. (That is, there is no Golomb ruler of order k and reach < n.) Finding these is notoriously difficult as k grows, so do not be too ambitious with the size of k. You could count rulers of different kinds, and plot suitable graphs, commenting on what these tell you.

Option 2: Pythagorean Triples & Euler Bricks

Part 1.

Definition. A Pythagorean triple (x, y, z) is a triple of positive integers where x² + y² = z². This can be thought of as describing an x × y rectangle with the property that the diagonal z is also of integer length. A Pythagorean triple (x, y, z) is primitive if x, y, z are coprime (i.e. there is no integer k > 1 which divides all of them).

(a) Write a Python function PrimPyth(n) which returns a list of primitive Pythagorean triples (x, y, z) where 0 < x < y < z < n. For example, PrimPyth(6) should return [(3,4,5)].

Hint: In this project we represent triples as tuples (x,y,z), not as lists [x,y,z]. The two data types behave similarly in many ways. Do not write a triply nested loop which searches through all triples (x, y, z), as this will take n³ steps!
Instead, use the fact (proved by Euclid) that every primitive Pythagorean triple arises as (m² − n², 2mn, m² + n²) where m, n are coprime integers which are not both odd.

(b) Run your function from part (a) with n = 10000 and plot a scattergraph of y against x.

(c) Write a function Pyth(n) which returns a list of (not necessarily primitive) Pythagorean triples (x, y, z) where z < n. Hint: Use your function from part (a).

(d) Run your function from part (c) with n = 10000 and plot a scattergraph of x against y. (This should be a new figure, not overwriting the one from part (b).)

(e) Comment on the features of your two graphs, writing your answer as a comment.

Part 2.

An Euler brick is an a × b × c cuboid where the dimensions a, b, c are positive integers, as are the diagonals of each face. The first, (44, 117, 240), was discovered in 1719 by Paul Halcke.

(a) Write a function IsItEuler(triple1,triple2) which accepts as input two tuples representing distinct Pythagorean triples and returns True if together they describe two faces of an Euler brick, and False otherwise. For example the inputs (44,117,125),(117,240,267) should return True, as should (117,240,267),(117,44,125), etc.

(b) Write a function Bricks(n) to return a list of Euler bricks (a, b, c) where 0 < a < b < c < n. Hint: Use your functions Pyth and IsItEuler.

(c) How many Euler bricks are there with all dimensions < 1000? How many of them are primitive? Write your answer as a comment, explaining what a “primitive” Euler brick is.

Part 3.

Continue your investigations into Pythagorean triples and/or Euler bricks. Possible topics include:

• Connections between Fibonacci numbers and Pythagorean triples. The number 5 is both a Pythagorean hypotenuse and a Fibonacci number. Do other numbers share these properties?
• Pythagorean quadruples (x, y, z, w) where x² + y² + z² = w².
• Let x + y + z be the perimeter of the triple (x, y, z).
In 1900, Derrick Lehmer proved that if N(p) is the number of primitive Pythagorean triples of perimeter ≤ p, then p/N(p) → π²/log 2 as p → ∞.

Option 3: Approximations of π

The number π, which can be defined as the ratio of the circumference of a circle to its diameter, is irrational (it cannot be expressed as a fraction, and its decimal representation does not end in a repeating pattern). This mini-project explores various methods for approximating its value. First we note that, in floating point arithmetic, the value of π can be found using the pi constant from the module math:

In [1]: import math
In [2]: print(math.pi)
3.141592653589793

Part 1.

(a) The Gregory-Leibniz formula for π dates from the 1670s (but was apparently known to the Indian mathematician Madhava of Sangamagrama around 1400), and states that

π = 4 (1 − 1/3 + 1/5 − 1/7 + 1/9 − . . .).

Write a function pi_gl(n) which takes as an input an integer n and returns an approximation of π, by evaluating the Gregory-Leibniz formula with n terms. (For n=5 your function should return 3.3396825396... . Don’t forget to multiply the sum of the series by 4.)

(b) Another series expression for π, less well-known and discovered by Nilakantha (around 1500), is

π = 3 + 4/(2·3·4) − 4/(4·5·6) + 4/(6·7·8) − . . . .

Write a function pi_nik(n) which takes as an input an integer n and returns an approximation of π, by evaluating this formula with n terms. (For n=3 your function should return 3.1333333... .)

(c) A third expression is due to Wallis (1656):

π = 2 · (2/1 · 2/3) · (4/3 · 4/5) · (6/5 · 6/7) · . . . .

Write a function pi_wallis(n) which takes an integer n as input and returns an approximation of π by evaluating the Wallis formula with n brackets in the formula. (For example, pi_wallis(3) should return 2.9257142857... .)

(d) For the Gregory-Leibniz expression, compute the absolute value of the error (that is, abs(pi_gl(n)-math.pi)) for n = 1 to 500. Store these errors in a list (say, pi_gl_error), and repeat this for the other two series, storing the errors in new lists.
Plot a graph with the errors pi_gl_error, pi_nik_error, pi_wallis_error on the y-axis, against range(1,500) on the x-axis. (You should plot all three lines on the same graph. Be sure to label axes and data. You should choose for yourself whether plot, loglog or semilogy gives the most meaningful plot.)

(e) Give a brief comment on the rate at which each expression converges to π.

Part 2.

Euler gave two remarkable expressions for π, which involve prime numbers, and in the second case, also prime factorisations of composites.

(a) The following is an infinite product expression:

π = 4 · (3/4) · (5/4) · (7/8) · (11/12) · (13/12) · . . . .

In this formula, the numerators are the primes > 2, while each denominator is the multiple of 4 closest to the numerator. Write a function pi_euler1(n) which computes the value of the first n terms in the product. For example, pi_euler1(3) should return 3.28125.

(b) The next formula is an infinite sum:

π = 1 + 1/2 + 1/3 + 1/4 − 1/5 + 1/6 + 1/7 + 1/8 + 1/9 − 1/10 + 1/11 + 1/12 − 1/13 + . . . .

In this formula, each fraction 1/n has a sign (±) determined by: the first two terms have positive signs; after that, if the denominator is a prime of the form 4m − 1 (for example, n = 3, 7, 11, . . .), the sign is positive; if the denominator is a prime of the form 4m + 1 (for example, n = 5, 13, 17, . . .), the sign is negative; if the denominator is a composite number, then the sign is equal to the product of the signs corresponding to its factors (for example, n = 9 = 3 × 3, so its sign is positive (a positive times a positive), while n = 10 = 5 × 2, so its sign is negative (a negative times a positive)). Write a function pi_euler2(n) which computes the value of the first n terms of this expression. For example, pi_euler2(2) should return 1.5.

(c) Compute the errors for both Euler expressions, as in part 1, again for n = 1 to 500. Plot a graph showing the errors for both Euler expressions, together with the errors for the previous three approximations from part 1, on the same graph. Comment briefly on the convergence of the Euler methods.
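As a sketch, the series approximations might be implemented along these lines. This assumes the standard forms of the Part 1 series and the Euler sign rule exactly as described; the expected outputs quoted in the text (3.3396825396..., 3.1333333..., 2.9257142857..., 3.28125, 1.5) serve as checks. Function names are written with underscores (pi_gl, etc.) on the assumption that the Minerva templates spell them that way, and the helpers is_prime and sign are illustrative, not given names:

```python
import math

def pi_gl(n):
    # Gregory-Leibniz: pi = 4*(1 - 1/3 + 1/5 - 1/7 + ...), using n terms.
    return 4 * sum((-1) ** k / (2 * k + 1) for k in range(n))

def pi_nik(n):
    # Nilakantha: pi = 3 + 4/(2*3*4) - 4/(4*5*6) + ..., using n terms
    # (the leading 3 counted as the first term).
    return 3 + sum((-1) ** k * 4 / ((2 * k + 2) * (2 * k + 3) * (2 * k + 4))
                   for k in range(n - 1))

def pi_wallis(n):
    # Wallis: pi = 2 * (2/1 * 2/3) * (4/3 * 4/5) * ..., using n brackets.
    prod = 1.0
    for k in range(1, n + 1):
        prod *= (2 * k) / (2 * k - 1) * (2 * k) / (2 * k + 1)
    return 2 * prod

def pi_euler1(n):
    # 4 times the product, over the first n primes p > 2,
    # of p / (multiple of 4 closest to p).
    def is_prime(k):
        return k > 1 and all(k % d for d in range(2, math.isqrt(k) + 1))
    prod, p, count = 1.0, 3, 0
    while count < n:
        if is_prime(p):
            prod *= p / (4 * round(p / 4))   # p/4 is never exactly halfway
            count += 1
        p += 2
    return 4 * prod

def sign(n):
    # Sign of 1/n in Euler's sum: multiplicative over prime factors,
    # +1 for 2 and primes of the form 4m-1, -1 for primes of the form 4m+1.
    s, d = 1, 2
    while d * d <= n:
        while n % d == 0:
            if d % 4 == 1:
                s = -s
            n //= d
        d += 1
    if n > 1 and n % 4 == 1:
        s = -s
    return s

def pi_euler2(n):
    # Sum of the first n terms sign(k)/k.
    return sum(sign(k) / k for k in range(1, n + 1))
```

With these definitions, pi_gl(5), pi_nik(3), pi_wallis(3), pi_euler1(3) and pi_euler2(2) reproduce the quoted expected outputs.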
(d) Recall the Monte Carlo method, from week 6 (section 6.2.2), for approximating π. Suppose we choose a point (x, y) randomly (with uniform distribution) in the unit square. The probability that it lies inside a circle of diameter 1 contained in the unit square is equal to the area of that circle, or π/4. So this Monte Carlo method works as follows:

(i) Generate a large number M of points (x, y), with both x and y uniformly distributed random variables in [0, 1]. You can use the module random to do this, as this module contains a function (also called random) which returns a number from the uniform distribution on [0, 1].
(ii) For each (x, y) produced, check whether it lies inside a circle of diameter 1 centred at (0.5, 0.5). Let the number of such points be Min.
(iii) Then an approximation for π is given by 4Min/M.

Write a function montecarlo(M) which takes an integer M and returns an approximation to π. (I can’t give you an example output, as the random nature of the procedure means approximations will differ!)

(e) Write a function montecarlo_accuracy(eps) which takes as an input some small float eps, and repeatedly performs the Monte Carlo procedure, until the approximation has an error
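The Monte Carlo steps (i)–(iii) can be sketched as follows (a sketch only; the output varies from run to run because of the randomness):

```python
import random

def montecarlo(M):
    # (i) Sample M uniform points in the unit square; (ii) count those inside
    # the circle of diameter 1 (radius 0.5) centred at (0.5, 0.5);
    # (iii) return 4 * Min / M as the estimate of pi.
    Min = 0
    for _ in range(M):
        x, y = random.random(), random.random()
        if (x - 0.5) ** 2 + (y - 0.5) ** 2 <= 0.25:
            Min += 1
    return 4 * Min / M

print(montecarlo(100000))   # typically close to 3.14
```

Larger M gives a better estimate on average, which is the behaviour the accuracy question in (e) asks you to exploit.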
