This assignment will assess your skills and knowledge in creating interactive GUI applications that employ event-handling mechanisms.

Requirements:

GUI Design:
- Design an intuitive and user-friendly GUI interface for the Student Management System.
- Implement the GUI using one of Java's GUI frameworks, such as Swing or JavaFX.
- Include appropriate components such as labels, text fields, buttons, tables, and menus to display and interact with student records, course enrollment, and grades.
- Ensure that the GUI is aesthetically pleasing, easy to navigate, and logically organized.

Student Management Functionality:
- Provide functionality to add new students, update student information, and view student details through the GUI interface.
- Implement event handlers for the relevant GUI components, such as buttons or menu items, to perform the corresponding actions.
- When the "Add Student" button/menu item is clicked, display a form to enter the student's information and add the new student to the system.
- When the "Update Student" button/menu item is clicked, display a form to select a student and update their information.
- When the "View Student Details" button/menu item is clicked, display a table or another suitable component showing the list of students and their details.

Course Enrollment Functionality:
- Include functionality to enroll students in courses through the GUI interface.
- Implement event handlers to respond to actions such as selecting a course and enrolling a student.
- When a course is selected from a dropdown menu or list, display a list of students eligible for enrollment.
- Allow administrators to select a student from the list and enroll them in the chosen course.

Grade Management Functionality:
- Incorporate functionality to assign grades to students through the GUI interface.
- Implement event handlers to respond to actions such as selecting a student, selecting a course, and assigning a grade.
- When a student is selected from a dropdown menu or list, display the courses they are enrolled in and their current grades.
- Allow administrators to select a course and assign a grade to the selected student.

Dynamic Interface Updates:
- Ensure that the GUI updates dynamically to reflect changes in student records, course enrollment, and grades.
- When a new student is added or information is updated, update the student list or details display accordingly.

Error Handling:
- Implement appropriate error-handling mechanisms in the GUI application.
- Display error messages or dialog boxes when invalid inputs are provided or when operations cannot be completed.
- Handle exceptions gracefully so that the application remains responsive and user-friendly.

Documentation:
- Provide comprehensive documentation for the project, explaining the purpose and usage of each GUI component, event handler, and piece of functionality.
- Describe the design choices made for the GUI interface and the rationale behind them.
- Include instructions for running the program and interacting with the GUI interface.
In this assignment you are going to create your bucket list, but in a different way than you might be used to. It is forbidden in this assignment to add elements, content, styling, or anything else inside your HTML document or CSS files. All of this will be handled through vanilla JavaScript. The purpose of the assignment is to learn how we can create basic, and more advanced, content solely with JavaScript. To get you started you are given a basic template of HTML and CSS code, but this is all you get. The rest you must create on your own. Follow the instructions below.

1. Create a <li> element using the createElement method. When you have a reference to your new element, change its innerText to something that you would like to have on your bucket list, and then append it to the DOM with the appendChild method. Where does this element go? How can you get it to be added directly after the already existing <li> elements?
2. Create another <li> with an item you would like to have on your bucket list. This time add it to the DOM, right after your existing <li> tags, with the help of the insertAdjacentElement method. This method accepts a position argument. Which value should that argument have?
   - afterbegin
   - afterend
   - beforebegin
   - beforeend
   Try them out!

1. innerHTML is an interesting property that exists on HTML elements. With that property we can get and set the inner HTML of an HTML element very easily. Try to get (or do you already have it?) the inner HTML from the element that contains all the <li> tags of your bucket list. Log it to the console.
2. In order to set the innerHTML of an element we need to create a string that contains the HTML code that we want to add to the DOM. It can look something like this: "<div>This is a div element as a string</div>"; Now create a string that contains a new item that you want to add to your bucket list.
3. Set the innerHTML of the list with the new item you just created. What happens when you do that?
4. Comment out that previous line(s) of code, and the three items that you had before should exist again. How can you add that last item and still keep the three other ones? There is a method that is very similar to insertAdjacentElement that will take your HTML string and add it to the list. Try to add your new item to the beginning of the list.
5. Add three more items to the end of the list, but try to do it with a loop instead; less repetitive code.
6. How many items do you have in your bucket list now? Log it to the console. Use the children property.
7. Change the content of the h2 at the top of the HTML document to have your name instead of "Bucky's".
8. Replace the first item in your list with a new item. There are several ways to do this, but try the replaceChild method out.
9. Now try replacing an element in the middle of the list with a new one. Use the same method as before, or get creative.
10. Remove the last element in the list. lastElementChild combined with the removeChild method might work.
Important notes: Question 1 is longer and more complex than question 2. Please KNIT as you go along and CHECK YOUR OUTPUT to ensure there are no errors!

Question 1

This question involves cleaning and exploring a dataset from the United States Department of Agriculture (USDA) Foreign Agriculture Service (FAS) Production, Supply, and Distribution (PS&D) database (source).

Part 1A

Create a new data frame called psd_tidy by performing the following operations IN THE ORDER SPECIFIED! If you do the operations out of order you WILL get errors! Write all code in the chunk provided below. Check after each step that your data frame matches what is expected by using dim() to get the dimensions.

NOTE: I HIGHLY recommend reading the instructions below in the knitted HTML file instead of in this Rmd file; they're formatted to be MUCH easier to read in HTML format!

1. First, select just Commodity_Description, Country_Code, Country_Name, Market_Year, Attribute_Description, and Value. After step 1, your data frame should have 2,001,791 rows and 6 columns.
2. Next, pivot the data frame to a wider format so that variable names from Attribute_Description are spread out over multiple columns and values from Value are used to fill in the data frame. We highly recommend adding the argument names_repair="universal" to your pivot function, which will automatically repair column names by replacing invalid characters with periods. After step 2, your data frame should have 155,338 rows and 75 columns.
3. After step 2, you should have a lot more columns with specific commodity details like production, imports/exports, etc. Select just the columns Commodity_Description, Country_Name, Market_Year, Production, Imports, Exports and rename these as "commodity", "country", "year", "production", "import", "export"; also keep Total.Distribution and Domestic.Consumption, renaming these to be lowercase. In addition, please also sort rows by commodity, then country, then year. After step 3, your data frame should have 155,338 rows and 8 columns.
4. Now, remove rows where: the country is in a list of countries which do not have recent data for a variety of reasons (see chunk below for details), or the commodity is either "Cotton", "Millet", or "Mixed Grain", as these are reported inconsistently. After step 4, your data frame should have 123,186 rows and 8 columns.

## This chunk contains the two lists of countries to remove (feel free to use this code)

## countries that no longer exist OR started reporting data together with their parent country:
old_countries = c("Antigua and Barbuda", "EU-15", "EU-25", "Former Czechoslovakia", "Former Yugoslavia", "Fr.Ter.Africa-Issas", "French Polynesia", "Gaza Strip", "German Democratic Republic", "Germany, Federal Republic of", "Gibraltar", "Gilbert and Ellice Islands", "Greenland", "Guadeloupe", "Martinique", "Puerto Rico", "Serbia and Montenegro", "St. Lucia", "Union of Soviet Socialist Repu", "Virgin Islands of the U.S.", "Yemen (Aden)", "Yemen (Sanaa)", "Yugoslavia (>05/92)")

## countries that stopped reporting individual data and instead report as part of "European Union":
eu_countries = c("Austria", "Belgium", "Bulgaria", "Croatia", "Cyprus", "Czechia", "Denmark", "Estonia", "Finland", "France", "Germany", "Greece", "Hungary", "Ireland", "Italy", "Latvia", "Lithuania", "Luxembourg", "Malta", "Netherlands", "Poland", "Portugal", "Romania", "Slovakia", "Slovenia", "Spain", "Sweden")

## if you want to use a filtering join, you can use this as one of the input data frames:
countries_to_remove = tibble(country = c(old_countries, eu_countries))

NOTE: For the rest of question 1, unless otherwise specified, always start with psd_tidy as your initial tidy data frame. Do NOT overwrite psd_tidy with any operations in later parts, since we will reuse it.

# Insert Code Here

Part 1B

Let's explore a few columns of our psd_tidy data frame. Perform the following operations in order.
Remember you should NOT overwrite psd_tidy, so please save your output to a new data frame called psd_summary.

2. For each commodity and country, summarize to obtain the total sum in each of the total.distribution, production, import, and export columns. Note: use na.rm=TRUE inside sum() to ignore any NAs. After step 2, your summary data frame should have 2567 rows, one for each commodity-country combination.
3. Regroup by just commodity. Add a new column called global.total that has, on each row within a commodity, the same value, representing the sum of the total.distribution sums you calculated in step 2.
4. Calculate the percentage that each country's production, import, and export (the sums from step 2) represent out of the global.total of each commodity (that you just calculated in step 3).

For each question, you should print a table with 1 row for each commodity showing the country name and the percentage. Please print the ENTIRE data frame! Note there are only 60 commodities, so you should only need to print 60 rows; use something like print(your_summary, n = 60).

2. For each commodity, which country is the largest global importer by percentage over the last 10 years?
3. For each commodity, which country is the largest global exporter by percentage over the last 10 years?

Part 1C

Again, starting with psd_tidy from part 1A, let's focus on a few popular commodities and visualize trends in their production over the years. Create a new data frame for each question and DO NOT modify psd_tidy itself.

1. Corn is one of the most popular commodities. Make a plot of corn production over time.
2. Repeat the above steps for Wheat, another of the most popular commodities. Note you should be able to just copy the entire code above and change Corn to Wheat everywhere.
3. Several of the commodities include the words "Meat", "Dairy", or "Fresh" (these are fresh produce).
For example, "Meat, Chicken" and "Poultry, Meat, Broiler" are two of the several meat commodities, and "Dairy, Butter" and "Dairy, Cheese" are two of several dairy commodities.

Step 1:
- Use mutate(category = str_extract(commodity, "Meat|Dairy|Fresh")) to create a new variable called category which contains either the word "Meat", "Dairy", "Fresh", or NA.
- Drop the NA rows in category.

Step 2:
- Re-summarize the data to obtain total production within each country-category combination.

Step 3:
- Calculate each country's total production across all three categories, and add this as a new column, with the same value in every row within a country.

Step 4:
- Make a bar plot comparing the levels of meat, dairy, and fresh produce production for these 10 countries; put country on the y axis, total production on the x axis, and change the interior color of the bars by category.

Question 2

More info about the dataset can be found in the help page by running ?storms

data("storms")
print(storms)

## # A tibble: 19,537 × 13
##    name   year month   day  hour   lat  long status      category  wind pressure
##  1 Amy    1975     6    27     0  27.5 -79   tropical d…       NA    25     1013
##  2 Amy    1975     6    27     6  28.5 -79   tropical d…       NA    25     1013
##  3 Amy    1975     6    27    12  29.5 -79   tropical d…       NA    25     1013
##  4 Amy    1975     6    27    18  30.5 -79   tropical d…       NA    25     1013
##  5 Amy    1975     6    28     0  31.5 -78.8 tropical d…       NA    25     1012
##  6 Amy    1975     6    28     6  32.4 -78.7 tropical d…       NA    25     1012
##  7 Amy    1975     6    28    12  33.3 -78   tropical d…       NA    25     1011
##  8 Amy    1975     6    28    18  34   -77   tropical d…       NA    30     1006
##  9 Amy    1975     6    29     0  34.4 -75.8 tropical s…       NA    35     1004
## 10 Amy    1975     6    29     6  34   -74.8 tropical s…       NA    40     1002
## # ℹ 19,527 more rows
## # ℹ 2 more variables: tropicalstorm_force_diameter,
## #   hurricane_force_diameter

Part 2A

Note the data frame has many rows per storm.
The purpose of this problem is to make a data summary storms_summary with one row per storm, with the following variables:

- year: year of each storm
- name: name of each storm; note there are many duplicate names in the dataset, but using both year and name (almost) uniquely identifies each storm, with one exception (Zeta from Dec 2005 to Jan 2006, which we just ignore for now)
- min_pressure: MINIMUM air pressure (note pressure decreases as storm intensity increases)

Your result should have 639 or 655 rows; some computers use an older version of "storms", hence the difference. Print at least the first 6 rows!

# Write your code to overwrite storms_summary HERE; you can even continue with a pipe from the previous line if you want

# Don't change this line, and it must be after your code above
storms_summary[storms_summary==-Inf]=NA

# Print the first 6 rows here
print(storms_summary, n = 6)

## # A tibble: 19,537 × 14
##    name   year month   day  hour   lat  long status       category  wind pressure
##  1 Amy    1975     6    27     0  27.5 -79   tropical de…       NA    25     1013
##  2 Amy    1975     6    27     6  28.5 -79   tropical de…       NA    25     1013
##  3 Amy    1975     6    27    12  29.5 -79   tropical de…       NA    25     1013
##  4 Amy    1975     6    27    18  30.5 -79   tropical de…       NA    25     1013
##  5 Amy    1975     6    28     0  31.5 -78.8 tropical de…       NA    25     1012
##  6 Amy    1975     6    28     6  32.4 -78.7 tropical de…       NA    25     1012
## # ℹ 19,531 more rows

Part 2B

Finally we'll do a bit more visualizing and summarizing using the storms_summary data frame you just created.
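The one-row-per-storm summary described in Part 2A is a group-and-aggregate: group by (year, name), then take the minimum pressure within each group. A minimal sketch of that logic in plain Python (the assignment itself should be done in R with dplyr's group_by and summarize; the toy records below are invented for illustration only):

```python
# Toy observations: (name, year, pressure) -- invented values, NOT real storms data.
records = [
    ("Amy", 1975, 1013), ("Amy", 1975, 1006), ("Amy", 1975, 1004),
    ("Bob", 1975, 998), ("Bob", 1975, 1002),
]

# Group by (year, name) and keep the minimum pressure per storm,
# mirroring group_by(year, name) |> summarize(min_pressure = min(pressure)).
storms_summary = {}
for name, year, pressure in records:
    key = (year, name)
    storms_summary[key] = min(pressure, storms_summary.get(key, float("inf")))

print(storms_summary)  # one entry per (year, name) storm
```

This also shows why year alone is not enough as a key: two storms in the same year with different names stay in separate groups.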
You have three hours to complete the exam from the time you begin the Canvas quiz, which gives you access to the exam documents. Piazza will be disabled during the exam and for several days after.

Set-up

Inside your course directory STAT240, add these additional directories:
STAT240/exams/
STAT240/exams/midterm_1

Download the following data sets from the course repository and put them into your STAT240/data/ directory. This file and other exam files should be placed in STAT240/exams/midterm_1.

Exam datasets

The following chunk reads in the data sets used in the exam. These lines of code will only work if you have your file directories set up correctly according to the instructions above. If you get a "file does not exist" error, then most likely one of the following has happened:
- There is a mismatch between the file path inside read_csv and the file structure on your personal computer.
- The files are set up correctly, but the .csv file is not in the correct location on your computer.

You are strongly encouraged to perform a thorough exploration of the data sets ahead of time!

Data Set 1: Volleyball

You do not need to know anything about volleyball to be able to answer the questions. There are 332 such teams, and their identities are recorded with a number in Team_Index, their name in Team, and a conference affiliation in Conference. Other variables contain various summary statistics and will be defined if needed in certain questions.

Data Set 2: States

The states data set has a single row for each of the 50 states in the USA. There are multiple other variables, some of which are categorical and some of which are numerical. Variables will be defined as necessary in the exam questions that use them.

Data Set 3: Cities

The cities data set has a single row for each state, district, or federal territory in the USA, and columns with the name and population of the largest city within each of these entities.
Data exploration

You are strongly encouraged to explore all three data sets prior to beginning the exam. Optionally, for each of the three datasets, write down or make a mental note of the following:
- How many rows does the data have?
- What are the column names and column types? Use glimpse.
- Are any of the columns within the dataset directly related to each other?
- What are some interesting research questions we could answer with this dataset?

Exam

When you open the exam assignment on Canvas, your 3-hour timer will start and you will be able to download an .Rmd file containing the exam questions. The format is very similar to the homework assignments: you'll write your answers and code within the same .Rmd and knit it to an .html. Knit the document frequently as you work on it. Make sure to turn in both the R Markdown file and the knitted HTML file before your quiz time runs out.
Preliminaries

This file should be in STAT240/homework/hw11 on your local computer. Download parental_leave.csv to STAT240/data.

Problem 1c.

Comment on why a T test for the mean might still be an appropriate method, even though the total paid leave data does not look normal.

Replace this text with your response!

Problem 2

Problem 3

Problem 4

Re-calculate the p-value from problem 3, but use a standard normal distribution instead of a \(T_{n-1}\) distribution. Why are the results very similar?

Problem 5

Problem 6

Does the educational industry give more paid leave than the healthcare industry? Perform a hypothesis test of
\[H_0: \mu_{ESCU} - \mu_{HHC} = 0 \quad \text{versus} \quad H_A: \mu_{ESCU} - \mu_{HHC} > 0\]
with \(\alpha = 0.05\). Do this by hand, then check your work.

Problem 7

Repeat the test in problem 6, but instead write hypotheses for \(\mu_{HHC} - \mu_{ESCU}\) (the order of subtraction is switched). Show that you get identical results to problem 6.

Problem 8

Now, consider the "Technology: Software" industry. Perform the appropriate hypothesis test to determine whether there is a difference in paid maternity and paternity leave within this industry. Note that each company decides its own policies on maternity and paternity leave, so the two measurements should NOT be considered independent.
Preliminaries

This file should be in STAT240/homework/hw10 on your local computer. Download superbowl_commercials.csv to STAT240/data.

Disclaimer

Many of these problems ask you to consider some true proportion. We have data on every commercial these brands have aired in Super Bowls in this time period, so the notion of a 'population' or 'true parameter' here is about a theoretical set of all advertisements that could have been created for Super Bowls, or, if you prefer, about the probabilistic decision-making process that companies go through behind the scenes (like the chimpanzee prosocial choices). Statistical inference often requires us to consider unintuitive, theoretical populations like this. Understanding this point is not required for doing the computations we request below, but it is important to always keep in mind when conducting inference.

Problem 1

Each advertisement is classified according to several characteristics. More information can be found here. Transform the data as follows:

Problem 2

Count the total number of ads in the dataset, as well as the total number of "Funny" ads. Build and interpret a 99% CI for \(p_{funny}\), the overall proportion of funny ads across all of the brands. Use the Agresti-Coull adjustment.

Problem 3

Repeat the analysis in problem 2, but build a 99% CI for \(p_{funny}\) with the Wald adjustment. How do the two intervals compare?

Problem 4

Perform a hypothesis test to determine whether more than half of Super Bowl ads are funny. Use hypotheses
\[H_0: p_{funny} = 0.5 \quad \text{versus} \quad H_A: p_{funny} > 0.5\]
and \(\alpha = 0.01\). Interpret your result in context. Please PRINT OUT YOUR p-value.

Replace this text with your response.

Problem 5

Replace this text with your response. (Model and assumptions)

Replace this text with your response. (Criticism)

Problem 6

Now, focus on \(p_{H, funny}\), the proportion of Hyundai ads that are funny. Perform a two-sided hypothesis test to determine whether or not half of all Hyundai ads are funny. Please EXPLICITLY STATE your hypotheses, test statistic, null distribution, p-value, and conclusion.

Problem 7

Consider comparing the proportion of Hyundai ads that are funny to the proportion of Budweiser ads that are funny. Build a 95% confidence interval for the difference in proportions, using the Agresti-Caffo adjustment. Interpret your results in context.

Problem 8

Perform a hypothesis test to determine whether a different proportion of Hyundai ads than Budweiser ads are funny. Write appropriate two-sided hypotheses and draw a conclusion with \(\alpha = 0.05\). Please EXPLICITLY STATE your hypotheses, p-value, and conclusion. No need to state the test statistic or null distribution.
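Problems 2 and 3 contrast the Wald and Agresti-Coull intervals for a single proportion. A hedged sketch of both formulas in plain Python (the homework itself should be done in R; the counts 120 funny out of 230 ads below are made-up placeholders, not the real dataset's values):

```python
from math import sqrt
from statistics import NormalDist

def wald_ci(x, n, conf=0.99):
    """Wald CI: p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n)."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    p = x / n
    se = sqrt(p * (1 - p) / n)
    return p - z * se, p + z * se

def agresti_coull_ci(x, n, conf=0.99):
    """Agresti-Coull: add z^2/2 pseudo-successes and z^2/2 pseudo-failures,
    then apply the Wald formula to the adjusted counts."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    x_adj, n_adj = x + z**2 / 2, n + z**2
    p = x_adj / n_adj
    se = sqrt(p * (1 - p) / n_adj)
    return p - z * se, p + z * se

print(wald_ci(120, 230))           # hypothetical counts, illustration only
print(agresti_coull_ci(120, 230))
```

The Agresti-Coull adjustment pulls the center of the interval toward 0.5, which improves coverage when p is near 0 or 1.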
Preliminaries

This file should be in STAT240/homework/hw09 on your local computer. Download happiness_2019.csv and dc_weather.csv to STAT240/data.

Problem 1

A one-sided hypothesis test of \[H_A: \beta_1 > 0\] is performed on a linear model. The p-value of this test is 0.035. What would be the p-value if the same data were used to test the following alternatives?

(a) \[H_A: \beta_1 < 0\]

Replace this text with your response.

(b) \[H_A: \beta_1 \neq 0\]

Replace this text with your response.

Problem 2

Continue working with the happiness index vs. GDP of countries model from Homework 8. Build and interpret a 98% CI for the true slope of the linear relationship between happiness index and GDP. Does the interval cover 0?

Problem 3

Perform a hypothesis test of the hypotheses \[H_0: \beta_1 = 0 \quad \text{versus} \quad H_A: \beta_1 \neq 0\] for the slope of the happiness model. What are the test statistic, p-value, and conclusion at the 2% level?

Problem 4

How do the results of the hypothesis test in problem 3 relate to the results of the confidence interval in problem 2?

Replace this text with your response.

Problem 5

Which of the following conditions lead to a smaller standard error? Briefly explain your choices.

A smaller sample size vs a larger sample size
Replace this text with your response.

A smaller value of \(\sigma\) vs a larger value of \(\sigma\)
Replace this text with your response.

Predicting closer to \(\bar{x}\) vs predicting further from \(\bar{x}\)
Replace this text with your response.

A smaller variance vs a larger variance in the original \(X\) data
Replace this text with your response.

Problem 6

Consider predicting the happiness index of a country with 1 GDP per capita.

(a) Build a prediction interval for the happiness index of a new country with \(x^* = 1\) GDP per capita.
(b) Build a confidence interval for the height of the regression line at \(x^* = 1\).

Problem 7

(a) Make a plot of minimum temperature (in C) on the x axis versus dew point on the y axis. Comment on the shape of the data.

Problem 8

We want to test whether the slope of the linear relationship between minimum temperature and dew point is greater than 1. Write hypotheses corresponding to the question of interest and carry out the test on the weather data. Make a conclusion with \(\alpha = 0.05\).
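The conversions asked for in Problem 1 are pure arithmetic, using only the numbers given there (one-sided p-value 0.035 for the "greater than" alternative) and the symmetry of the t distribution. A sketch:

```python
# Given: one-sided p-value for H_A: beta_1 > 0 (from Problem 1)
p_greater = 0.035

# Same data, opposite one-sided alternative H_A: beta_1 < 0:
# the tail on the other side of the observed statistic.
p_less = 1 - p_greater

# Two-sided alternative H_A: beta_1 != 0: double the smaller tail.
# Doubling is exact here because the t distribution is symmetric.
p_two_sided = 2 * min(p_greater, p_less)

print(p_less, p_two_sided)
```

This assumes the test statistic itself is unchanged; only which tail(s) count toward the p-value changes.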
Preliminaries

This file should be in STAT240/homework/hw08 on your local computer. Download happiness_2019.csv to STAT240/data.

Problem 1

You wish to show that your roommate is not home. You check the main areas of your apartment (skipping things like closets or storage rooms) and do not find them.

(a) Thinking of this as an inference problem, what are the appropriate null and alternative hypotheses?

Replace this text with your response.

(b) Thinking of this as an inference problem, which of the following is the appropriate conclusion?

A. The probability that we do not find them in the main areas, given that they are NOT home, is 100%. This is not a low p-value, so we fail to disprove that our roommate is home.
B. If they are not home, there is a 100% chance we do not find them. We did not find them, so they are not home.
C. The probability that we do not find them in the main areas, given that they ARE home, is very low. Since we did not find them, that means they are probably not home.
D. If they ARE home, the probability of finding them in the main areas is high. This is not a low p-value, so we fail to disprove that our roommate is home.

Replace this text with your response.

Problem 2

We wish to test whether the average person is shorter than the average NBA player. We have data for the whole population of NBA players and know their average height to be 6'6" (78 inches). Let the average height of a non-player be \(\mu\).

(a) What are the null and alternative hypotheses for this test?

Replace this text with your response.

(b) The Central Limit Theorem says the sample mean of n individuals is approximated by \(N\big(\mu,\ \frac{\sigma}{\sqrt{n}}\big)\), where \(\mu\) and \(\sigma\) are the expectation and standard deviation of the population. Assume \(\sigma\), the standard deviation of all non-player heights, is 4 inches. If the null hypothesis were true, what would be the distribution of the sample mean of the heights of 5 non-players?

Replace this text with your response.
Problem 3

We wish to test whether the average person is shorter than the average NBA player. We have data for the whole population of NBA players and know their average height to be 6'6" (78 inches). Let the average height of a non-player be \(\mu\).

(a) We sample 5 non-players and get the heights (70, 61, 63, 65, 72). What is the sample mean? What is the probability of getting a sample mean smaller than this on the distribution from problem 2?

(b) What do we conclude about the average height of non-players compared to the average height of an NBA player? Are we completely 100% certain of that conclusion?

Replace this text with your response.

EXAM 2 CONTENT STOPS HERE. Any content below will not be on exam 2.

Problem 4

Match the six values of correlation to the scatterplots in p4_choices.png. Briefly justify your choices.

\(r = -0.85\)
Replace this text with your response.

\(r = -0.74\)
Replace this text with your response.

\(r = 0.08\)
Replace this text with your response.

\(r = 0.44\)
Replace this text with your response.

\(r = 0.98\)
Replace this text with your response.

\(r = 1\)
Replace this text with your response.

Problem 5

(a) Create a scatterplot of GDP per capita (x) versus happiness index (y) and calculate the correlation between GDP per capita and happiness index. Comment on the strength and magnitude of the linear relationship, and whether a linear model seems to be appropriate.

(b) Calculate the slope and intercept of a least-squares linear regression model for GDP per capita versus happiness index. Do this "by hand" and check your work with lm. Interpret the coefficients in context.

Problem 6

Perform a residual analysis of the happiness linear model to assess the fit. Build a scatterplot of residuals and comment on the three assumptions: linearity, normality, and constant variance.

Problem 7

Using your linear model, predict the happiness index for theoretical countries with 0.75, 1.25, and 2 GDP per capita. Which of these predictions would you consider to be the least reliable, and why?

Problem 8

What conclusions can we make about the relationship between GDP per capita and happiness index, based on the linear model? Does money buy happiness?

Replace this text with your response.
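The computation behind Problems 2(b) and 3(a) can be sketched directly from the numbers given there: under \(H_0: \mu = 78\) with \(\sigma = 4\) and \(n = 5\), the sample mean is approximately \(N(78, 4/\sqrt{5})\). A plain-Python sketch (the homework should use R's pnorm; Python's stdlib NormalDist does the same CDF calculation):

```python
from statistics import NormalDist, mean

heights = [70, 61, 63, 65, 72]  # the five non-player heights from Problem 3(a)
xbar = mean(heights)

# CLT null distribution of the sample mean: N(78, 4 / sqrt(5))
null_dist = NormalDist(mu=78, sigma=4 / 5**0.5)

# One-sided probability of a sample mean at least this small under H0
p_value = null_dist.cdf(xbar)

print(xbar, p_value)
```

The sample mean of 66.2 inches sits many null standard errors below 78, so this left-tail probability is essentially zero.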
Preliminaries

This file should be in STAT240/homework/hw07 on your local computer.

Problem 1

The weight of adult male grizzly bears in the continental United States is well approximated by a normal distribution with mean 510 pounds and standard deviation 45 pounds. The weight of adult female grizzly bears in the continental United States is well approximated by a normal distribution with mean 315 pounds and standard deviation 37 pounds. Suppose a male grizzly bear weighing 441 pounds is observed. What would be the approximate weight of a female grizzly bear at the same weight percentile as this male grizzly bear?

Problem 2

The lifespan of a certain type of tire is approximately normal, with mean 39 and standard deviation 2.5 (units are in thousands of miles).

(a) What is the probability that a random tire will last longer than 40,000 miles?
(b) What is the 95th percentile of a single tire's lifespan?
(c) Take a random sample of 4 tires. What is the probability that the average lifespan of the four tires will be greater than 40,000 miles?
(d) What is the 95th percentile for the average lifespan of four tires?

Problem 3

Suppose you are playing a coin-flipping game with a friend, and you suspect the coin your friend provided is not a fair coin. In fact, you think the probability the coin lands heads is less than 0.5. To test this, you flip the coin 100 times and observe the coin lands heads 30 times. If you assume the coin is fair (i.e., the probability of the coin landing heads is 0.5), what is the probability of observing 30 heads or fewer? Calculate the previous probability again, but use a normal approximation to achieve a numerical value. What is the error in this approximation?

Problem 4

Problem 5

Continue the analysis of the right-skewed population from problem 4. Now, perform the sampling process three times, each with \(n = 50\). Do this with 50 iterations, 500 iterations, and 50000 iterations. Then make three density plots to compare the three sampling distributions. What do you notice as the number of iterations increases?

Problem 6

For each of the three prompts, explain which condition will result in a narrower confidence interval. If the interval width would not change, explain why.

Having a larger confidence level (smaller alpha) versus having a smaller confidence level (larger alpha)
Replace this text with your response.

Having larger estimation error (a.k.a. "sampling error") versus having smaller estimation error
Replace this text with your response.

Having a larger point estimate versus having a smaller point estimate
Replace this text with your response.

Problem 7

The code below calculates the Z critical value used to build a 95% confidence interval. Edit the code to instead calculate the critical value for 98% confidence. Next, generalize this operation to any confidence level: uncomment and edit the code below to return the critical value for a \((1-\alpha)\) confidence interval for any choice of alpha.

Problem 8

For Halloween, you have purchased a big bag of candy made up of equal amounts of KitKats, Hershey's, Almond Joys, and Reese's candy bars.

(a) A small child reaches in and pulls out three Reese's candy bars. What is the probability that all three of their candies were Reese's by random chance? (Assume candy bar draws are independent and all have equal probability of being a Reese's.)
(b) A different small child reaches in and takes seven Reese's candy bars and one other candy bar. What is the probability that seven or more of their eight candies were Reese's by random chance?
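The generalization asked for in Problem 7 is a single inverse-CDF call: for a \((1-\alpha)\) confidence interval, the two-sided critical value is the \(1 - \alpha/2\) quantile of N(0, 1) (qnorm in R). A sketch in plain Python using the stdlib equivalent:

```python
from statistics import NormalDist

def z_crit(conf_level):
    """Two-sided z critical value for a (1 - alpha) confidence interval,
    i.e. the (1 - alpha/2) quantile of the standard normal."""
    alpha = 1 - conf_level
    return NormalDist().inv_cdf(1 - alpha / 2)

print(round(z_crit(0.95), 3))  # the familiar 1.96
print(round(z_crit(0.98), 3))  # the 98% critical value asked for in Problem 7
```

Note the alpha/2 split: a 95% interval leaves 2.5% in each tail, which is why the quantile is taken at 0.975, not 0.95.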
Preliminaries This file should be in STAT240/homework/hw06 on your local computer. Problem 1 For each of the following questions, say whether the random variable is reasonably approximated by a binomial random variable or not, and explain your answer. If the variable is binomial, identify (n) the number of trials and (p) the probability of success. If it is not binomial, identify which of the “BINS” assumptions is violated. a. A fair die is rolled until a 1 appears, and (X) denotes the number of rolls. Replace this text with your response. b. Twenty of the different Badger basketball players each attempt 1 free throw and (X) is the total number of successful attempts. Replace this text with your response. c. A die is rolled 50 times. Let (X) be the face that lands up most often. Replace this text with your response. d. In a bag of 10 batteries, I know 2 are old. Let (X) be the number of old batteries I choose when taking a sample of 4 to put into my calculator. Replace this text with your response. e. It is reported that 20% of Madison homeowners have installed a home security system. Let (X) be the number of homes without home security systems installed in a random sample of 100 houses in the Madison city limits. Replace this text with your response. Problem 2 Create a data frame with the following columns. Each row corresponds to a single ( ext{Binom}(n, p)) distribution. The first two columns are the parameters of the distribution.Replace this text with your response. Problem 3 The random variable (X) has the ( ext{Binom}(100, 0.2)) distribution. Find an integer (a) so that (P(X le a) ge 0.5) and (P(X ge a) ge 0.5). Show the values of (a), (P(X le a)), and (P(X ge a)). Problem 4 A student decided to guess randomly on their True/False quiz. The number of questions they answer correctly is ( ext{Binom}(10, 0.5)). Write code with dbinom , pbinom , or qbinom to calculate that value or probability. 
Problem 5

Match the four binomial distributions given below to the appropriate graph in p5_choices.png. Two of the distributions will not be used. Briefly justify your choices.

- Binom(12, 0.5) Replace this text with your response.
- Binom(12, 0.6) Replace this text with your response.
- Binom(10, 0.1) Replace this text with your response.
- Binom(10, 0.3) Replace this text with your response.

Problem 6

Are the following statements true for Binomial distributions, Normal distributions, or both?

- This distribution is always symmetric. Replace this text with your response.
- If you know the two parameters of this distribution, you can calculate its mean, any probability, or any quantile. Replace this text with your response.
- If μ is the mean of the distribution, then the probability distribution graphically reaches its maximum at μ. Replace this text with your response.
- If μ is the mean of the distribution, then it is possible for the probability of getting exactly μ on a random draw to be 0. Replace this text with your response.

Problem 7

Use pnorm to find the probabilities highlighted below on a N(0, 1) curve.

Problem 8

Let X1 and X2 be two draws from X ~ N(10, 4). Order the five events below based on which events are least to most likely to occur.

Event A: X1 > 15
Event B: X1 = 15
Event C: X1 < 15
Event D: X1 > 15 AND X2 > 15
Event E: X1 > X2

Replace this text with your response.
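Problem 7's pnorm calls correspond to the standard normal CDF; a minimal Python/scipy analogue (the cutoff value 1 is just an illustrative assumption, not the highlighted region from the figure):

```python
from scipy.stats import norm

# P(Z <= 1) on the standard normal N(0, 1); pnorm(1) in R
p_below = norm.cdf(1)

# P(Z > 1) is the complement; pnorm(1, lower.tail = FALSE) in R
p_above = 1 - norm.cdf(1)

# P(-1 <= Z <= 1): difference of two CDF values
p_between = norm.cdf(1) - norm.cdf(-1)
```

Any shaded region on the curve reduces to one of these three patterns: a left tail, a right tail, or a difference of two CDF values.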
Preliminaries

This file should be in STAT240/homework/hw05 on your local computer. Download Cameron_Lectures_SP24.csv to STAT240/data. Download ggprob.R to STAT240/scripts if you have not already done so.

Problem 1

Apgar scores are a test of newborn babies' health immediately after being born, on a scale from 0 to 10. Most babies score 7 or higher. The distribution of Apgar scores is shown below, with the probability of a baby scoring a perfect 10 removed.

Problem 2

b) What is the variance of this distribution?

c) What is the standard deviation of this distribution?

Problem 3

For each of the real-life scenarios described below, state whether you would treat the random variable as discrete or continuous and briefly explain why.

Replace this text with your response
Replace this text with your response

Problem 4

Problem 5

Our goal in this problem is to calculate the percentages associated with each of the five bars in the above histogram.

First, create a new variable called base_minutes by "truncating" total_minutes using the floor() function. This is effectively just cutting the decimal off, which is different than rounding; 49.4, 49.9, 50.1, and 50.8 will become 49, 49, 50, and 50.

Then, create a dataframe with one row for each different value of base_minutes, and a column n for how many times that value of base_minutes occurs.

Finally, create a percent variable by dividing n by the total number of lectures. Let this dataframe print as output; no need to save it in a new variable name.

Problem 6

Replace this text with your response
Replace this text with your response

Problem 7

Does our original model still seem reasonable? Briefly explain your reasoning. Note: This question is about the ability to construct an argument; either yes or no could be justified!

Replace this text with your response

Problem 8

a) You should find that Cameron wore a navy colored shirt in 36.8% of SP24 lectures.
b) Is it valid to guess that approximately 36.8% of all the shirts Cameron owns are navy-colored? Briefly explain your reasoning.

Replace this text with your response
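Looking back at Problem 5's truncate-and-tally pipeline (floor(), count(), and mutate() in R), a hedged pandas sketch of the same idea follows; the column name total_minutes is from the problem, but the five toy values are made up:

```python
import numpy as np
import pandas as pd

# Made-up stand-in for the lecture data; the real file is Cameron_Lectures_SP24.csv
lectures = pd.DataFrame({"total_minutes": [49.4, 49.9, 50.1, 50.8, 50.2]})

# "Truncate" rather than round: floor() cuts the decimal off
lectures["base_minutes"] = np.floor(lectures["total_minutes"]).astype(int)

# One row per distinct base_minutes, with a count column n
counts = lectures.groupby("base_minutes").size().reset_index(name="n")

# Percent of all lectures that fall in each bar of the histogram
counts["percent"] = counts["n"] / len(lectures) * 100
print(counts)
```

With the toy values, 49.4 and 49.9 collapse to 49 while 50.1, 50.2, and 50.8 collapse to 50, exactly the truncation behavior the problem describes.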
Preliminaries

This file should be in STAT240/homework/hw04 on your local computer. You should also download education.csv and obesity.csv to STAT240/data on your local computer.

Problem 1

The following chunk is provided for you.

What do each of these four quantities represent in real terms? Do not use technical vocabulary. For example, if it was correct, you could write that "w is the percentage of zip codes in which at least half of female adults have a bachelor's degree."

w is… REPLACE THIS TEXT WITH YOUR RESPONSE
x is… REPLACE THIS TEXT WITH YOUR RESPONSE
y is… REPLACE THIS TEXT WITH YOUR RESPONSE
z is… REPLACE THIS TEXT WITH YOUR RESPONSE

Problem 2

Using education_original, create a scatter plot with pct_f_bach on the x axis and pct_m_bach on the y axis. There will be one point for each zip code.

Problem 3

Our goal in this problem is to create a new dataframe called education_long. This dataframe will have the columns:

The steps to generate this new dataframe are:

- Start with education_original.
- Drop any rows that have any missing data anywhere.
- Pivot the dataframe such that you have TWO rows per zip code; one for each sex within zip. (Hint: We have one row per zip code in education_original, so the dataset is getting longer.)
- Make the column names after pivoting zip, sex, and pct_bachelors. (This can be done either with a rename(), or with arguments within pivot_longer().)
- Make the values of sex "female" and "male" rather than "pct_f_bach" and "pct_m_bach" using case_when(). (Alternatively, a slightly easier way could be to rename() the columns to female and male before pivoting; either way is acceptable.)

# Write your code here

Problem 4

Right now, the obesity data has one row for every zip-sex-age group combination; the age groups are 5 years old – 17 years old, 18-34, 35-54, 55-74, and 75+. We want to obtain a version of the obesity dataset we can merge with education_long above, so both datasets would have to have one row for every zip-sex combination.
This means we need to summarize() the obesity dataset to obtain one row per zip-sex combination, purposefully "smoothing over" all the age group detail we currently have. The dataset we are trying to obtain will have the following first few rows:

- Start from obesity.
- Remove all rows with data about the youngest age group (05-17), since the education_long dataset is only about adults, so that would be an unfair comparison.
- Within each row, calculate obese_percent by taking obese/n. (n is the number of people we have data for; pop is the total population, so obese/n is just an estimate.)
- Within each row, calculate obese_pop_estimate by taking obese_percent * pop.
- Group and summarize the data, such that you obtain one row per zip-sex combination (like in education_long), with the information:
  - Total population within that zip-sex combination: call this total_pop
  - Total obese population (estimated) within that zip-sex combination: call this total_obese_pop_estimate. Note that you will have to use na.rm = T when summarizing to avoid NAs.
- Remove all rows where there are 0 estimated obese individuals. (Bonus question: Why didn't we just do this at the beginning? Answer at the bottom of the question.)

# Write your code here!

Answer to bonus question: We needed to know the pop from every single row to calculate the total population of a zip-sex combo, even if there were no obese people in that row. If we had cleared out the 0-obese rows early, we would have lost some population.

Problem 5

Create a new dataframe called joined by joining education_long and obesity_summary (note that their rows should match by zip and sex together) and only keeping rows that match in both dataframes. Your first four rows of joined should look like this:

  zip    sex     pct_bachelors  total_pop  total_obese_pop_estimate
1 53001  male    13             755        330.
2 53001  female  23             726        259.
3 53002  male    16.2           1052       404.
4 53002  female  25.4           977        369.

Hint: Just like we can group_by(zip, sex), we can join_by(zip, sex).
# Write your code here!

Problem 6

Using joined from the previous problem, create a scatter plot of points with pct_bachelors on the x-axis and percentage obese on the y-axis. You will have to calculate percentage obese for this problem.

Problem 7

Use joined to compute:

- The percentage of the entire Wisconsin adult population with a bachelor's degree
- The percentage of the entire Wisconsin adult population with obesity

Let these values print as output. Hint: You'll need to compute some other quantities along the way, including the number of people with a bachelor's degree in each zip-sex combination, and the total Wisconsin adult population.

It is okay if one percentage is on the 0-1 scale and the other is on the 0-100 scale. You should get the following percentages:

Problem 8
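The matched join in Problem 5 is dplyr's inner_join() with join_by(zip, sex); the pandas sketch below shows the same keep-only-matches behavior on made-up rows (all values here are stand-ins, not the real education or obesity data):

```python
import pandas as pd

# Made-up stand-ins for education_long and obesity_summary
education_long = pd.DataFrame({
    "zip": [53001, 53001, 53002],
    "sex": ["male", "female", "male"],
    "pct_bachelors": [13.0, 23.0, 16.2],
})
obesity_summary = pd.DataFrame({
    "zip": [53001, 53001, 53099],
    "sex": ["male", "female", "male"],
    "total_pop": [755, 726, 500],
    "total_obese_pop_estimate": [330.0, 259.0, 210.0],
})

# Inner join on the (zip, sex) pair keeps only rows present in both dataframes
joined = education_long.merge(obesity_summary, on=["zip", "sex"], how="inner")
print(joined)
```

The zip 53002 row (education only) and the zip 53099 row (obesity only) both drop out, which is the "only keeping rows that match in both dataframes" requirement.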
Preliminaries

This file should be in STAT240/homework/hw03 on your local computer.

Problem 1

The columns mass and radius are relative to Earth. A mass or radius of 2.0 refers to a planet twice the mass or radius of Earth. A mass or radius of 0.5 refers to a planet half the mass or radius of Earth. Starting from planets,

Problem 2

Starting from planets:

- Keep only planets discovered by the method "Radial Velocity".
- Keep only planets whose mass and radius are both known (e.g. they are not missing).
- Group and summarize that data, such that:
  - You get one row per year.
  - Create the columns n_discovered and minimum_mass, which contain the number of planets discovered in that year, and the smallest mass among planets discovered in that year.
- Save this dataframe with the name mass_by_year.
- Then, on a separate line, let mass_by_year print as output so we can see the first ten rows in your knitted file (no need to use print to show the whole thing; we only need ten rows).

# Write your code here!

Your first two rows should look like this if you did it correctly. (Column order is arbitrary; it doesn't matter if n_discovered and minimum_mass are switched around, just match the values.)

  year  n_discovered  minimum_mass
1 1999  1             232.
2 2001  1             1392.

Problem 3

Problem 4

Starting from planets, the original dataframe from the top of the file, print out the planet name, mass, radius, and density of the top five most dense planets. To do so, you will have to calculate the density of each planet first, and then find the top five by density. The density of a planet is its mass divided by its volume.

Note: This question requires you to understand the request and figure out which commands to chain together. Previous questions have indicated the step-by-step process; it is intentionally left out of this question and some future ones.

Problem 5

Which star or stars have the most planets orbiting them in this dataset? How many planets are orbiting that star or stars?
To answer this question, start from planets, then create and print a dataframe with columns star and n, with n representing how many planets are orbiting that star.

# Write your code here!

Problem 6

Problems 6 – 8 take you through a relatively complex analysis-to-visualization process, which mimics what you might provide to a client asking the question: "How has the most popular method of planet discovery changed over time?"

Problem 7

Starting from methods_within_year from Problem 6 above, and grouping by year, add a column called yearTotal. yearTotal should indicate how many planets were discovered within that year across all methods. Here are the first four rows to check your work against:

  year  method           n   yearTotal
1 2000  Radial Velocity  16  16
2 2001  Radial Velocity  12  12
3 2002  Radial Velocity  28  29
4 2002  Transit          1   29

Now, add another column called methodProportion, which determines what percentage of the discoveries within that year were by that method. Here's what methodProportion should look like (again, column order doesn't matter):

  year  method           n   yearTotal  methodProportion
1 2000  Radial Velocity  16  16         1
2 2001  Radial Velocity  12  12         1
3 2002  Radial Velocity  28  29         0.966
4 2002  Transit          1   29         0.0345

Save this dataframe with the name methods_within_year_proportions, and then let methods_within_year_proportions be printed as output so the first ten rows are visible in your .html file.

# Write your code here!

Problem 8

Now, let's answer the client's question based on the graph. "How has the most popular method of planet discovery changed over time?"

Replace this text with your response.
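Problem 7's within-year total and proportion is a grouped mutate in dplyr; a hedged pandas sketch of the same computation, seeded with the four example rows from the problem statement:

```python
import pandas as pd

# The four example rows given in the problem statement
methods_within_year = pd.DataFrame({
    "year":   [2000, 2001, 2002, 2002],
    "method": ["Radial Velocity", "Radial Velocity", "Radial Velocity", "Transit"],
    "n":      [16, 12, 28, 1],
})

# Grouped transform: total discoveries within each year, broadcast back to every row
methods_within_year["yearTotal"] = (
    methods_within_year.groupby("year")["n"].transform("sum"))

# Share of that year's discoveries made by each method
methods_within_year["methodProportion"] = (
    methods_within_year["n"] / methods_within_year["yearTotal"])
print(methods_within_year)
```

The key idea is that the per-year sum is broadcast back onto each row rather than collapsing the dataframe, so the 2002 rows both get yearTotal 29 and their proportions 28/29 and 1/29 sum to one within the year.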
Preliminaries

This file should be in STAT240/homework/hw02 on your local computer. If you want the barplot from Problem 2 to render in the knitted file, download exampleBarplot.png to that folder as well. While you should create your answers in the .Rmd file, homework problems are better formatted and therefore easier to read in the .html file. We recommend frequently knitting and switching between the two as you read and solve problems.

Problem 1

Problem 2

Check the knitted .html file or the exampleBarplot.png file itself. For the image to render in your knitted file, download exampleBarplot.png next to this .Rmd.

Your friend's attempt below is producing an error. Fix your friend's error by editing the above chunk (or create another one). Then, below, explain why the original code was wrong and why the fixed code works.

Replace this text with your response.

Problem 3

Replace this text with your response.

Problem 4

will suffice.)

Replace this text with your response.
Replace this text with your response.

c) Consider encountering this graph in a published newspaper. Name one specific thing this graph does well to accomplish its goal, and what specific part of the code is responsible for it.

Replace this text with your response.

d) In the same context as c), name one specific thing that could be improved about this graph, and describe what code you would add/edit to make it happen. (No need to formally write and run the code, just a description.)

Replace this text with your response.

Problem 5

Problem 6

Problem 7

Problem 8

Compare the resulting graphs from Problems 5, 6, and 7. Consider trying to determine in what time period Lake Mendota was freezing, on average, longer than Lake Monona. Which of the three graphs do you prefer for this determination and why?

Replace this text with your response
Preliminaries

This file should be in STAT240/homework/hw01 on your local computer. While you should create your answers in the .Rmd file, homework problems are better formatted and therefore easier to read in the .html file. We recommend frequently knitting and switching between the two as you read and solve problems.

Problem 1

Consider a vector a which has three numeric elements, a vector b which has two numeric elements, and a single number on its own, c. For each of the following operations, write down how many elements long the output will be. We encourage you to think through the question first and make an initial guess, then check your work in the console.

This command will return _ numeric element(s).
This command will return _ numeric element(s).
This command will return _ numeric element(s).
This command will return _ numeric element(s).
This command will return _ numeric element(s).

Problem 2

Problem 3

Compute the mean of firstFifty. Then, create a new vector which is the result of subtracting that mean from each value in firstFifty. Finally, take the sum of that resulting vector. Let that sum be printed as output; do not save it as a variable.

Note: This result is not just a special case with these specific numbers; this process will return that sum for any vector!

Problem 4

Compute the standard deviation of firstFifty without using the sd command. Let this single number be printed as output. To compute the standard deviation, execute the following steps in this order:

- Compute the mean of firstFifty and subtract it from every element in firstFifty, as in the previous question.
- Square every element in that vector.
- Take the sum of that new squared vector; this should be close to 10,400.
- Divide that number by (the length of the vector minus 1). You can obtain the length of firstFifty with length(firstFifty). Don't forget parentheses & order of operations!
- Finally, take the square root (sqrt()) of the number you got from the previous step.
The mathematical notation for this is the sample standard deviation,

s = sqrt( Σ (x_i − x̄)² / (n − 1) )

Problem 5

We have provided you below with a vector containing all of the prime numbers from 1 to 50.

# We have written this code for you, do not edit it
primes = c(2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47)

Create a vector of logical elements – that is, TRUEs and FALSEs – which indicates whether each of the elements in firstFifty is a prime number. Let this vector be printed as output; do not save it as a variable.

Problem 6

Copy a sentence from the help page here.

6.B. What do toupper and tolower do to non-alphabetic characters? (Hint: Try using the "Find in Topic" box in the top bar of the help page, or click on the help page and use Ctrl+F on Windows / Command+F on Mac!)

Copy a sentence from the help page here.

Problem 7

Create a dataframe using the tibble() command with five columns:

- The first column should be called char, and contain the 26 values of letters.
- The second column should be called pos, and contain the whole number values 1, 2, …, 25, 26.
- The third column should be called isVowel, and should contain 26 logical values according to whether char in that row is one of "a", "e", "i", "o", or "u" or not. We won't consider "y" as a vowel for simplicity.
- The fourth column should be called isPrime, and should contain 26 logical values according to whether pos in that row is a prime number (or not).
- The fifth column should be called vowelAndPrime, and should contain 26 logical values; TRUE when that row has both a vowel and a prime number, and FALSE otherwise. Hint: Use &!

Save this dataframe in a new variable called alphanumeric.

# Write your code here!

Problem 8

Looking at this output, which vowel(s) are in a prime number position? In other words, in which rows is vowelAndPrime TRUE?

List the vowels that are in a prime number position here.

Submission

Once you have finished all the problems above, knit this file and submit BOTH the .Rmd and .html files on Canvas!
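The step-by-step standard deviation from Problem 4 (written with mean(), sum(), and sqrt() in R) can be sketched in Python as follows; the four sample values are made up, and firstFifty itself is not reproduced here:

```python
import math

# Made-up stand-in for firstFifty
values = [4.0, 7.0, 13.0, 16.0]

# Steps 1-2: subtract the mean from each element, then square each deviation
m = sum(values) / len(values)
squared_devs = [(x - m) ** 2 for x in values]

# Steps 3-5: sum the squares, divide by (n - 1), take the square root
sample_sd = math.sqrt(sum(squared_devs) / (len(values) - 1))
print(sample_sd)
```

Dividing by n - 1 rather than n is what makes this the sample standard deviation, matching R's sd().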
INSTRUCTIONS

- The homework will be peer-graded. In analytics modeling, there are often lots of different approaches that work well, and I want you to see not just your own, but also others.
- The homework grading scale reflects the fact that the primary purpose of homework is learning:

Rating  Meaning                                                                          Point value (out of 100)
4       All correct (perhaps except a few details) with a deeper solution than expected  100
3       Most or all correct                                                              90
2       Not correct, but a reasonable attempt                                            75
1       Not correct, insufficient effort                                                 50
0       Not submitted                                                                    0

Question 19.1

Describe analytics models and data that could be used to make good recommendations to the retailer. How much shelf space should the company have, to maximize their sales or their profit?

Of course, there are some restrictions – for each product type, the retailer imposed a minimum amount of shelf space required, and a maximum amount that can be devoted; and of course, the physical size of each store means there's a total amount of shelf space that has to be used. But the key is the division of that shelf space among the product types.

For the purposes of this case, I want you to ignore other factors – for example, don't worry about promotions for certain products, and don't consider the fact that some companies pay stores to get more shelf space. Just think about the basic question asked by the retailer, and how you could use analytics to address it.

As part of your answer, I'd like you to think about how to measure the effects. How will you estimate the extra sales the company might get with different amounts of shelf space – and, for that matter, how will you determine whether the effect really exists at all? Maybe the retailer's hypotheses are not all true – can you use analytics to check?

Think about the problem and your approach. Then talk about it with other learners, and share and combine your ideas.
And then, put your approaches up on the discussion forum, and give feedback and suggestions to each other.

You can use the {given, use, to} format to guide the discussions: Given {data}, use {model} to {result}.

One of the key issues in this case will be data – in this case, thinking about the data might be harder than thinking about the models.
Question 18.1

(Same peer-grading instructions and rating scale as above.)

Describe analytics models and data that could be used to make good recommendations to the power company. Here are some questions to consider:

- The bottom-line question is which shutoffs should be done each month, given the capacity constraints. One consideration is that some of the capacity – the workers' time – is taken up by travel, so maybe the shutoffs can be scheduled in a way that increases the number of them that can be done.
- Not every shutoff is equal. Some shutoffs shouldn't be done at all, because if the power is left on, those people are likely to pay the bill eventually. How can you identify which shutoffs should or shouldn't be done? And among the ones to shut off, how should they be prioritized?

Think about the problem and your approach. Then talk about it with other learners, and share and combine your ideas. And then, put your approaches up on the discussion forum, and give feedback and suggestions to each other.

You can use the {given, use, to} format to guide the discussions: Given {data}, use {model} to {result}.

Have fun! Taking a real problem, and thinking through the modeling and data process to build a good solution framework, is my favorite part of analytics.
Question 15.2

(Same peer-grading instructions and rating scale as above.)

In the videos, we saw the "diet problem". (The diet problem is one of the first large-scale optimization problems to be studied in practice. Back in the 1930s and '40s, the Army wanted to meet the nutritional requirements of its soldiers while minimizing the cost.) In this homework you get to solve a diet problem with real data. The data is given in the file diet.xls.

1. Formulate an optimization model (a linear program) to find the cheapest diet that satisfies the maximum and minimum daily nutrition constraints, and solve it using PuLP. Turn in your code and the solution. (The optimal solution should be a diet of air-popped popcorn, poached eggs, oranges, raw iceberg lettuce, raw celery, and frozen broccoli. UGH!)

2. Please add to your model the following constraints (which might require adding more variables) and solve the new model:

a. If a food is selected, then a minimum of 1/10 serving must be chosen. (Hint: now you will need two variables for each food i: whether it is chosen, and how much is part of the diet. You'll also need to write a constraint to link them.)

b. Many people dislike celery and frozen broccoli. So at most one, but not both, can be selected.

c. To get day-to-day variety in protein, at least 3 kinds of meat/poultry/fish/eggs must be selected.
[If something is ambiguous (e.g., should bean-and-bacon soup be considered meat?), just call it whatever you think is appropriate – I want you to learn how to write this type of constraint, but I don't really care whether we agree on how to classify foods!]

If you want to see what a more full-sized problem would look like, try solving your models for the file diet_large.xls, which is a low-cholesterol diet model (rather than minimizing cost, the goal is to minimize cholesterol intake). I don't know anyone who'd want to eat this diet – the optimal solution includes dried chrysanthemum garland, raw beluga whale flipper, freeze-dried parsley, etc. – which shows why it's necessary to add additional constraints beyond the basic ones we saw in the video!

[Note: there are many optimal solutions, all with zero cholesterol, so you might get a different one. It probably won't be much more appetizing than mine.]
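The assignment asks for PuLP with the real diet.xls data; purely as a structural sketch, the toy model below shows the constraint patterns from 2a (linking a binary "chosen" variable to a continuous serving variable) and 2b (at most one of two foods), using scipy.optimize.milp instead of PuLP. All three foods, their costs, and their calorie counts are made up:

```python
import numpy as np
from scipy.optimize import milp, LinearConstraint, Bounds

inf = np.inf
M = 10.0  # "big-M": more servings than any reasonable diet would use

# Variable vector: [x1, x2, x3, y1, y2, y3]
# x_i = servings of food i (continuous), y_i = 1 if food i is chosen (binary)
cost = np.array([2.0, 1.0, 1.5])          # objective: minimize cost of servings
c = np.concatenate([cost, np.zeros(3)])   # the y's carry no cost

A = np.array([
    [100, 200, 150,  0,   0,   0],   # calories: 100*x1 + 200*x2 + 150*x3 >= 300
    [  1,   0,   0, -M,   0,   0],   # x1 <= M*y1  (a food can appear only if chosen)
    [  0,   1,   0,  0,  -M,   0],
    [  0,   0,   1,  0,   0,  -M],
    [ -1,   0,   0, 0.1,  0,   0],   # x1 >= 0.1*y1 (if chosen, at least 1/10 serving)
    [  0,  -1,   0,  0,  0.1,  0],
    [  0,   0,  -1,  0,   0,  0.1],
    [  0,   0,   0,  1,   1,   0],   # foods 1 and 2 ("celery"/"broccoli"): at most one
])
lb = np.array([300, -inf, -inf, -inf, -inf, -inf, -inf, -inf])
ub = np.array([inf,    0,    0,    0,    0,    0,    0,    1])

res = milp(
    c=c,
    constraints=LinearConstraint(A, lb, ub),
    integrality=np.array([0, 0, 0, 1, 1, 1]),          # only the y's are integer
    bounds=Bounds(lb=[0, 0, 0, 0, 0, 0], ub=[inf, inf, inf, 1, 1, 1]),
)
print(res.x, res.fun)
```

The same pattern carries over to PuLP directly: the pair of constraints x_i ≤ M·y_i and x_i ≥ 0.1·y_i links each serving variable to its binary, and the constraint from 2c would simply be a sum of the meat/poultry/fish/egg binaries being at least 3.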