Purpose The purpose of this assignment is to experiment with the behavior of transactions using a MariaDB database. You will need to use more than one MariaDB session for these to work. To do this, just open up two separate putty sessions to turing or hopper and log into MariaDB on each of them. Use the same name for the output file for all of the sessions – T assign9out.txt – so that the output from all of your sessions ends up in the same file. When znnnnnnn is used, replace it with your own z-id. Part I – The Power of COMMIT (25pts) 1) Start your first MariaDB seesion, issue the following SQL queries: T assign9out.txt USE znnnnnnn; CREATE TABLE Fall( pk INT PRIMARY KEY, data CHAR(15)); START TRANSACTION; INSERT INTO Fall VALUES(1, ‘dataA’); INSERT INTO Fall VALUES(2, ‘dataB’); INSERT INTO Fall VALUES(3, ‘dataC’); 2) Start your second MariaDB session, and run the following SQL queries in it. T assign9out.txt USE znnnnnnn; SELECT * from Fall; Question 1.2) What is the result of running the SELECT statement. Why? 3) In that second session, run the following: INSERT INTO FalI VALUES(4, ‘dataD’); INSERT INTO Fall VALUES(5, ‘dataE’); 4) Switching back to the first MariaDB session, issue the following queries: COMMIT; SELECT * FROM Fall; t exit; 5) Switch back to the second MariaDB instance, and run the following queries: SELECT * FROM Fall; t exit; Question 1.5) What is the result of the SELECT statement above? CSCI 466 Assignment 9 2 of 3 Part II — The Power of ROLLBACK (25pts) 1) Start another MariaDB session, issue following MariaDB statements: T assign9out.txt USE znnnnnnn; START TRANSACTION; DELETE FROM Fall WHERE pk = 3; SELECT * FROM fall; 2) Then UPDATE Fall SET Data = ‘changed’ WHERE pk = 2; 3) Then UPDATE Fall SET Data = ‘changed 2’ WHERE pk = 4; 4) Then INSERT INTO Fall VALUES(6, ‘dataF’); SELECT * FROM Spring; Question 2.4) What is the result of the SELECT statement, and why? 5) Issue the following MariaDB statements: ROLLBACK; SELECT * FROM Fall; Question 2.5) What is the result of the SELECT statement, and why? t exit; Part III: Be Aware of Deadlock (25pts) Using another two sessions of MariaDB, do the following in the order specified: 1) In session 1, T assign9out.txt USE znnnnnn; START TRANSACTION; 2) In session 2, T assign9out.txt USE znnnnnn; START TRANSACTION; 3) In session 1, CSCI 466 Assignment 9 3 of 3 UPDATE Fall SET data = ‘data1A’ WHERE pk=1; 4) In session 2, UPDATE Fall SET data= ‘data2B’ WHERE pk = 2; 5) In session 1, UPDATE Fall SET data = ‘data5E’ WHERE pk = 5; 6) In session 2, UPDATE Fall SET DAta = ‘data12B’ WHERE pk = 1; Question 3) What happened here? Notes There may be typos in these statements. If there is a syntax error, fix it. These errors were made to encourage you to pay close attention, and to type the statements in yourself as practice. What to turn in? Turn in, via Blackboard, the following: • The output generated by running the commands in both of the sessions. • The answers to the questions asked, (1.2, 1.5, 2.4, 2.5, and 3), in a separate text file.
The Task For this assignment, you will be writing a C/C++ program that prints reports on the books found in the henrybooks database from our MariaDB server. Requirements Your program will show a menu with three options for reports to show. The user will choose an option, and your program will use the MariaDB C API to run the necessary queries, displaying the results, neatly formatted, in the terminal. 1. Book List: For each book, print the title, the author(s), and the cost. Do not assume that a book will be written by a single author. Use a subquery to accomplish this. Sort the authors for each book based on their Sequence. 2. Author Search: Prompt the user for the name of an Author. Print a report showing all books by authors whose first or last names match the user-supplied name. For each of the books found, show a line in your output with the book code, the title, the name of the author that matched your search, and the price; then show individual lines (indented below the original line) for each branch, each showing how many of the current book are on hand, or a single line indicating that it is out of stock everywhere if no branches have any on hand. Branches should be displayed as their names and not as their numerical identifiers, in alphabetical order. 3. Title Search: Same as the author search in feature 2, but search by title instead of by author. For the author field, use the first author only (by sequence number). There should also be an option to quit. After performing whichever option is chosen (obviously except for the one to quit), your program should show the menu again and allow the user to choose a new option to run. Notes • The grading for this will be done by compiling and running this program on turing and/or hopper. If your program does not compile and run properly on turing and hopper, you will not receive credit, so make sure to test it there. • If your program needs any flags other than the ones used in the examples of gcc/g++ in the slides in order to compile, you need to let us know when you submit. • Don’t forget to document your code. • It is not a requirement that every feature be implemented with a single query. You can break the tasks down into smaller queries if you find it makes solving the problem easier. • As always, you should be submitting your own work. Do not try to cheat or plagiarize other people’s programs. What to turn in? Submit, through Blackboard, the following: • Your C++ source code, in a file called books.cc. • Any other source code or header files your program needs in order to compile. CSCI 466 Assignment 10 2 of 2 Schema for henrybooks For your convenience, I have included information on the schema of henrybooks below: Table Field Type Null Key Default Extra Author AuthorNum decimal(2,0) NO PRI NULL AuthorLast char(12) YES NULL AuthorFirst char(10) YES NULL Book BookCode char(4) NO PRI NULL Title char(40) YES NULL PublisherCode char(3) YES NULL Type char(3) YES NULL Price decimal(8,2) YES NULL Paperback char(1) YES NULL Branch BranchNum decimal(2,0) NO PRI NULL BranchName char(50) YES NULL BranchLocation char(50) YES NULL NumEmployees decimal(2,0) YES NULL Inventory BookCode char(4) NO PRI BranchNum decimal(2,0) NO PRI 0 OnHand decimal(2,0) YES NULL Publisher PublisherCode char(3) NO PRI NULL PublisherName char(25) YES NULL City char(20) YES NULL Wrote BookCode char(4) NO PRI AuthorNum decimal(2,0) NO PRI 0 Sequence decimal(2,0) YES NULL
Introduction For this assignment, you must write an SQL script. An SQL script is a text file that contains a sequence of SQL commands to be run. Yours should include commands for each of the following, in order: 1. Run a statement that will remove all of the tables that will be created below. This will allow the script to be run again without any errors caused by existing tables. 2. Create a table called dogs with a dog id, a breed, and a name. The id of the dog should be the primary key, and should be automatically assigned the next available value when inserting a new row into the table. 3. Put five rows into the dog table with example data. Make up the data yourself. 4. Run the command DESCRIBE dogs; 5. Run the command SELECT * FROM dogs; 6. Create a table called visits that contains a visit id as primary key (this should take the next available key value when a new row is added). It should also have a foreign key that references a row in the dog table, and an attribute to hold the date that the visit took place. 7. Put at least eight new rows into the visits table. Since there are only five dogs, this means that some dogs will have multiple visits. 8. Run the command DESCRIBE visits; 9. Run the command SELECT * FROM visits; 10. Add a column to the visits table to hold the time of the visit. 11. Change the value for the newly-added attribute in several of the existing rows. There are several date/time functions available in SQL, and you can choose to use any that are appropriate. 12. Run the command SELECT * FROM visits; again. If a data type is not specified in this document, anything reasonable should be fine. Use your discretion. You should include comments in the script that identify which commands match which of the requirements. Comments in SQL are written with a — before them. What to turn in? Submit, through Blackboard, the following: 1. A text file containing the script you wrote that accomplishes the above. It should be suitable for running with the . command in the SQL client. 2. Another text file containing the output generated after running the script your wrote. You can use T filename to start logging to a file called filename, and t to stop logging when you are done. It is also possible, though not recommended, to get the output by copying and pasting from the terminal. Make sure to choose names for these two files that make it easy for the TA to determine which is which without having to open the file. 1
Introduction You will be writing another SQL script. The script should work with the BabyName database, so it will need the appropriate USE statement. Test each of the SQL statements required for the below, and then put them into the script, making sure to comment each one. Include your name, zid, and section in each of the files you submit. This is a fairly large database, so it is recommended that you add LIMIT 50 to your SQL statements when testing, to prevent a nigh-neverending cascade of text from needing to be downloaded. Your script must do each of the following: 1. Select the BabyName database. 2. Show all of the tables in the database. 3. Show all of the columns and their types for each table in the database. 4. Show all of the years (once only) where your first name appears. Some people’s names may not be present in the database. If your name is one of those, then use ‘Chad’ if you are male, or ‘Stacy’ if you are female. If you don’t feel you fit into one of those, feel free to use ‘Pat’. 5. Show all of the names from your birth year. (Showing each name only once.) 6. Display the most popular male and female names from the year of your birth. 7. Show all the information available about names similar to your name (or the one you adopted from above), sorted in alphabetical order by name, then, within that, by count, and finally, by the year. 8. Show how many rows there are in the table. 9. Show how many names there were in the year 1972. 10. Show how many names are in the table for each sex from the year 1953. 11. List how many different names there are for each place. What to turn in? Submit, through Blackboard, the following: 1. The SQL script that accomplishes the above task. 2. The output yielded by having MariaDB run the script, in a text file.
Introduction For this assignment, you will be converting the following ER diagram, which models the operational data for a department store, to a set of relations that conform to Third Normal Form (3NF). Use the steps from the slides that we went over in class. Make sure to indicate the primary keys by underlining them. Include a list of which attributes are foreign keys, with the home relation of each foreign key indicated. ER Diagram Store Store_ID Address Manager City City_Name State HeadQuartersAddr Item Item_ID Description Size Color Customer Customer_Name Address Phone_Number Order Order_Num Date located in (1,n) (1,1) stored in (1,n) (1,n) Quantity ordered in (1,n) (1,n) Quantity placed (1,1) (1,n) hold (1,n) (1,n) Quantity Figure 1: This is the ER diagram. There may be some things that may seem weird. These may be errors that were missed during the design phase, or they could be intentional. The purpose of this assignment is to convert it, not to revise it. What to turn in? Submit a PDF file through Blackboard containing the relational schema for the database, including the information required above.
The Goal You are being employed by a company that offers a fitness tracking service. They are working on a phone app that will allow the user to track what they eat, as well as when/how they work out. Another employee will be designing the user interface, but you are responsible for designing the database. Design an ER diagram to fulfill this goal, making sure to meet all of the requirements. All entities must have an appropriate identifier specified. If a surrogate key is used, explain why a natural key was not appropriate. In the interests of saving space, attributes that are not part of an identifier may be omitted from the diagram, but they should be included and explained in that portion of your submission. (See “What to turn in?” below.) Requirements • Every user will have an account, and all of their meals and workouts will be linked to this account. • To track weight loss, the user will update their weight periodically. This data must be retained. • The serving size will be some number of units (grams, ounces, Tbsp, cups, lbs, etc.). There will be information stored for conversion between different unit types. • There will need to be a database of foods/beverages. Each of these will have information on serving size, calories per serving, and grams per serving of each of the macronutrients (protein, fat, carbohydrates). • It should be possible to store information on the quantities of micronutrients or chemicals (i.e. vitamin D, caffeine) that are present in a given food or beverage in significant amounts. Recommended daily values for any of these should be stored, when applicable. • Each time the user eats, a record of who ate how many servings of what and when is stored. • The app needs to allow a user to track their workouts. This includes the type of the workout, its intensity, and its duration. • When a user tracks their workout, a record is created of who did what type of workout, when, and for how long. Note: If you are not sure what needs to be included for foods/drinks/etc., a good place to look for example data would be the nutritional facts on food packaging. Necessary Views The database needs to be able to store its operational data in such a way that the final app will be able to show the following views (at minimum). You don’t need to implement the views, but the necessary data should be modeled. • A graph will be generated of how many calories a user consumed each day of the week. • A similar graph will be generated that shows how many calories were burnt each day of the week through workout. • Show a pie graph of the percentage of the diet made up of each macronutrient during a given time period (day, week, month). • Track the consumption of a given micronutrient/chemical over a specified period of time. If there is a recommended daily value, show a comparison of their consumption with the recommended amount. • Allow the user to search through the food database to find common foods, in order to plan their diets. These same foods will be used to track their eating. • The user will search through a list of workouts to find the one closest to the one they’re going to do, in order to track the activity. • A line graph of user weight over time. CSCI 466 Assignment 1 2 of 2 What to turn in? You should turn in, via Blackboard, the following: • A PDF file containing the diagram you’ve drawn. You can draw this in any graphics program. If you must, you may draw it by hand and scan it, but it must be legible. Credit will not be given for portions of the diagram that cannot be read. • A text file or another PDF that explains all entities and relationships you’ve chosen to use, along with a description of each of their attributes. List any assumptions that you will have made when deciding cardinalities that are not specified explicitly by the requirements.
Project 1 SpecificationChallenge DescriptionFace parsing assigns pixel-wise labels for each semantic components, e.g., eyes, nose, mouth. The goal of this mini challenge is to design and train a face parsing network. We will use the data from the CelebAMask-HQ Dataset [1] (See Figure 1). For this challenge, we prepared a mini-dataset, which consists of 1000 training and 100 validation pairs of images, where both images and annotations have a resolution of 512 x 512.The performance of the network will be evaluated based on the F-measure between the predicted masks and the ground truth of the test set (the ground truth of the test set will not be released).Figure 1. Sample images in CelebAMask-HQAssessment CriteriaWe will evaluate and rank the performance of your network model on our given 100 unseen test images based on the F-measure.The higher the rank of your solution, the higher the score you will receive. In general, scores will be awarded based on the Table below.Percentile in ranking≤ 5%≤ 15%≤ 30%≤ 50%≤ 75%≤ 100%*Scores2018161412100Notes:We will award bonus marks (up to 2 marks) if the solution is interesting or novel. Marks will be deducted for incomplete submissions, such as missing key code components, inconsistencies between predictions and code, significantly poorer results than the baseline, or failure to submit a short report.Submission Guideline Download dataset: this link Train and test your network using our provided training set. [Optional] Evaluate your model on an unseen CodaBench validation set during development, with up to 5 submissions per day and 60 in total. [Required] During test phase, submit your (1) test set predictions, (2) source code, and (3) pretrained models to CodaBench. The test set will be released one week before the deadline, following standard vision challenge practices. You are allowed up to 5 submissions per day, with a total limit of 5.Restrictions To maintain fairness, your model should contain fewer than 1,821,085 trainable parameters, which is 120% of the trainable parameters in SRResNet [2] (your baseline network). You can usesum(p.numel() for p in model.parameters())to compute the number of parameters in your network.No external data and pretrained models are allowed in this mini challenge. You are only allowed to train your models from scratch using the 1000 image pairs in our given training dataset. You should not use an ensemble of models.Step-by-step Submission ProcedureWe host the validation and test sets on CodaBench. Please follow the guidelines to ensure your results to be recorded.The website of the competition is https://www.codabench.org/competitions/5726Register the CodaBench account with your NTU email (ends with @e.ntu.edu.sg), with your matric number as your username. Register for this competition and waits for approval.Submit a file with your prediction results as follows. Include source code and pretrained models in the test phase; not required for the dev phase.IMPORANT NOTE Please refer “Get Started → Submission” on the CodaBench page to for the file structure of your submission. Please adhere to the required file structure. Submissions that do not follow the structure cannot be properly evaluated, which may affect your final marks.If your submission status is “failed”, check the error logs to identify the issue. The evaluation process may take a few minutes.Submit the following (zipped) files to NTU Learn before the deadline. A short PDF report (max five A4 pages, Arial 10 font) detailing your model, loss functions, and any processing steps used to obtain results. Include the Fmeasure on the test set and the total number of model parameters. Name your report as: [YOUR_NAME]_[MATRIC_NO]_[project_1].pdf A screenshot from the CodaLab leaderboard, with your username and best score. We will use the score from CodaLab for marking, but will keep your screenshot here for double-check reference.Computational ResourceYou can use the computational resources assigned by the MSAI course. Alternatively, you can use Google CoLab for computationReferences Cheng-Han Lee, Ziwei Liu, Lingyun Wu, Ping Luo, MaskGAN: Towards Diverse andInteractive Facial Image Manipulation, CVPR 2020Ledig et al., Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network, CVPR 2017
IMPORTANT NOTES:This project is based on the topic of distributed systems security that is covered in Modules 11 and 12. The goal of the project is to gain hands-on experience in implementing secure distributed services. You will develop a simple Secure Shared Store (3S) service that allows for the storage and retrieval of documents created by multiple users who access the documents at their local machines. In the implementation, the system should consist of one or more 3S client nodes and a single server that stores the documents.Users should be able to login to the 3S server through any client by providing their private key as discussed in Module 12. Session tokens would be generated upon successful authentication of the users. They can then check-in, checkout and delete documents as allowed by access control policies defined by the owner of the document.To implement such a distributed system, we will need to make use of certificates to secure communication between clients and the server, and to authenticate sources of requests. You will need to make use of a Certificate Authority (CA) that generates certificates for users, client nodes and the server. All nodes trust the CA. We have provided a Virtual Machine for the project. Links to download the image (.ova file) will be posted on Ed Discussion.The default account on the VM is cs6238 and the password is cs6238. The root password is also cs6238. In an ideal setting, the 3S server and the client would be on separate nodes. For simplicity, we have set up only one VM. The server and client nodes are abstracted as separate folders within the VM. For example, the server folder represents the server and the client1 folder represents the client node.The desktop contains a Project4 folder which has the skeletal implementation of the 3S service. You will be required to complete the implementation to satisfy all the functionalities which will be detailed below. The Project4 folder contains: Fig: Folder structure of Project4 As discussed above, we will need to make use of a Certificate Authority that is trusted by all nodes. This CA would be used to generate certificates for the users, client nodes and the server. One can make use of a library such as OpenSSL for setting up the CA and to generate certificates.For this project, we have created a CA. This CA has been used to generate certificates for the server. You would be required to generate certificates for the client nodes using this CA. The CA (certificate and key) was generated using the password (passphrase) cs6238.Detailed instructions on generating certificates are present in Appendix A.When the client keys and certificates are created, they should be placed in the clientX/certs folder and should be named as clientX.key and clientX.crt After a 3S server starts, a client node can make requests to the server. Let’s assume that client nodes have a discovery service that allows them to find the hostname where 3S runs. The hostname, in this case, is secureshared-store. The certificate for the server contains secure-shared-store as the common name of the server. Whenever the client node makes a request, mutual authentication is performed, and a secure communication channel is established between the client node and the server. Here we make use of nginx to perform mutual authentication (MTLS). Every request from the client node should include the certificate of the client node for authentication.As mentioned before, the 3S service should enable functions such as login, checkin, checkout, grant, delete, and logout. You will have to complete the skeleton code provided for the server and client to achieve these functionalities. Details are as follows: When the Security Flag is set as Confidentiality (to be represented by “1”), the server generates a random AES key for the document, uses it for encryption and stores data in the encrypted form. To decrypt the data at a later time, this key is also encrypted using the server’s public key and stored with document meta-data. When theSecurity Flag is set as Integrity (to be represented by “2”), the server stores the document along with a signed copy. When a request is made for a document stored with Confidentiality as the SecurityFlag, the server locates the encrypted document and its key, decrypts the data and sends it back over the secure channel. Similarly, when a request is made for a document stored with Integrity as the SecurityFlag, the signature of the document must be verified before sending a copy to the client.Additionally, when a request is made to checkin a document that is checked out in the current active session, the client must move (not copy) the document from the “/documents/checkout” folder into the“/documents/checkin” folder. The client implementation must handle the transfer of these files between the folders automatically. ta Grant can only be issued by the owner of the document. b This will change the defined access control policy to allow the target user (TUID) to have authorization for the specified action (R) for the specified document (DID).c AccessRight R can either be:i checkin (which must be represented by input 1) ii checkout (which must be represented by input 2) iii both (which must be represented by input 3)for time duration T (in seconds). If the TargetUser is ALL (TUID=0), the authorization is granted to all the users in the system for this specific document. If there are multiple grants that have been authorized for a particular document and user, the latest grant would be the effective rule. Basically, the latest grant for the tuple (DID, TUID) should persist.Here are a few clarification scenarios for Grant:− If an initial grant for (file1, user1, 2, 100) is successful and then a successful grant request (file1, 0, 1, 50) is made, then file1 should be accessible for checkin only to all users for 50 seconds. User1 loses the checkout access given earlier.− Grant (file1, 0, 3, 100) exists and then a successful grant request (file1, user2, 2, 50), then file1 is accessible to user2 for checkout for 50 seconds and invalidates the previous grant. Since this is a security class, you should use secure coding practices. You are also expected to use static code analysis tools such as Pylint, Pyflakes, etc. and minimize the use of unsafe function calls (justify any such calls you need to make by providing inline comments). The report should list tools used to ensure that your code does not have any vulnerabilities. The report should also discuss the threat model and what threats are handled by your implementation. Fig. Project Flow − How mutual authentication is achieved in the current implementation of 3S.− Details on the cryptographic libraries and functions used to handle secure file storage.− How the user information and metadata related to documents were stored.− Details of how the required functionalities were implemented− List any features that were not implemented or tested (partial points may be awarded).− List the assumptions made, if any.as client.py 4. Requirements Please ensure that you do not zip the files in your submission. Also, please stick to the specified naming conventions since an auto grader would be evaluating your submissions. IMPORTANT: Please ensure that you submit only these 4 files along with the video (See Video Requirements below) that are mentioned and follow the specified naming conventions. Any error in adhering to these guidelines would result in an error with the autograder and would result in a significant loss of points. IMPORTANT: Do not hardcode the public or private key names (eg: user1.key or user1.pub) in your code. Make sure the usernames and keys are all in lowercase only. Halfway through the project, if there are many common doubts, we will consolidate the clarification posts and share it as a note.As mentioned earlier, this project will be graded by an auto grader so please follow the guidelines mentioned in this file. However, there is an alternative solution if the auto grader fails for your submission due to any reason. This video (a screen recording) will be required to be submitted as part of your submission and will be then graded for partial credit (only if the auto grader fails). This must be added as a media comment on your submission and can be of any common video format. If you fail to submit the video, you’ll get a penalty of 10 points. The following steps will be required to be shown as a part of your video:The video should show the file locations and content. Try to show as many details about the functionality of the program as possible. Certificate Generation:The resource below describes how to set up a Certificate Authority (CA) and then how it’s certificate would be used to generate certificates for the nodes.We have already set up a CA. You can find the CA certificates in the CA folder of Project4. We have also generated the server keys and certificate (certname is secure-shared-store) using the CA certificate. Also, the following command was used to extract the public key from the certificate. openssl x509 -pubkey -noout -in secure-shared-store.crt > secure-shared-store.pub You can use the above resources to generate certificates and keys for the client nodes and users.
Purpose: Become more familiar with operating system interaction, threading, and race conditions. Points: 100 50 for program and 50 for write-up Scoring will include functionality, documentation, and coding style Assignment: In recreational number theory, a Narcissistic number1 is a number that is the sum of its own digits each raised to the power of the number of digits. For example, given 8202 which has 4 digits, the sum of each number raised to 4 is 8208. 8208 = 8 4 + 2 4 + 0 4 + 8 4 Write an assembly language program provide a count of the Narcissistic numbers between 0 and some limit. For example, between 1 and 10,000 there are exactly 17 Narcissistic numbers (0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 153, 370, 371, 407, 1634, 8208, 9474). In order to improve performance, the program should use threads to perform computations in parallel. The program should read the thread count and number limit in septenary from the command line in the following format; “./narCounter -t -lm ”. For example: ./narCounter -t 4 -l 564355 The following routines should be created: • A value returning function, getUserArgs(), that reads and verifies the command line arguments. This includes error and range checking based on provided parameters (in the template). The error strings are predefined in the template. The function should call the aSept2int() function. • Value returning function aSept2int() to convert an ASCII/septenary string to integer. If the passed string is a valid ASCII/septenary value the value should be returned (via reference) and the function should return true. If there is an error, the function should return false. Note, the integer being returned is a quad (64-bits) which is different from previous assignments. • A void function, narcissisticNumberCounter(), that will be called as a thread function and determine the count of narcissistic numbers. The thread must perform updates to the global variables in a critical section to avoid race conditions. See below sections for additional detail. 1 For more information, refer to: https://en.wikipedia.org/wiki/Narcissistic_number The Simpsons, Season 17 Episode 22 The first, 8,191 is a Mersenne prime, a prime number that can be divided only by itself and one. And, 8,128, is a perfect number, where the sum of proper divisors add up to it. Then, 8,208 is a narcissistic number, which has four digits and, if you multiply each one by itself four times, the result adds up to 8,208. Thread Function: Create a thread function, narcissisticNumberCounter() that performs the following operations: • Print the thread starting message (predefined). • Each thread will obtain the next bock of numbers to check (via global variable, currentIndex) and increment the global by BLOCK_SIZE (range of numbers to check) using the predefined constant. ◦ Loop while the next number is £ limitValue (globally available) to check each number in the block. ◦ If a number is narcissistic, atomically increment narcissisticCount variable. ◦ When the range of numbers has been processed, a new block should be obtained. ◦ If the limit value is exceeded the function should exit. When obtaining the next block to check and incrementing the global currentIndex variable, these operations must be performed as a critical section2 (i.e., locked and unlocked using the provided spinLock() and spinUnlock() functions). As the numbers are being checked, no locks are needed. Once a number has been determined to be narcissistic, the number should be placed in the narcissisticNumbers[] array and the narcissistic counter should incremented accordingly. These operations must also be performed as a critical section. It is recommended to complete and test the program initially with only the single sequential thread before testing the parallel threads. In order to debug, you may wish to temporarily insert a direct call (non-threaded) to the narcissisticNumberCounter() function. Results Write-Up When the program is working, complete additional timing and testing actions as follows; ● Use the provided script file to execute and time the working program. ● Compute the speed-up3 factor from the base sequential execution and the parallel execution times. Use the following formula to calculate the speed-up factor: SpeedUp = ExecTimesequential ExecTime parallel ● Remove the locking calls (the spinLock() and spinUnlock() calls and the ‘lock’) and re-execute the program using a limit of 50,000,000 (11446644117) for 1, 2, 3, and then 4 threads. A provided timing script will help capture these results. ○ The Unix time command (as per asst #11B) should be used to obtain the execution times. Use the “real” time. ● Create a final write-up including a copy of the program output from the timing script file and an explanation of the results. The explanation must address: ○ the final results for 1, 2, 3, and 4 thread executions, including final results (list of perfect/abundant/deficient numbers) and the execution times for both each. ○ the speed-up factor from 1 thread to 2, 3, and 4 threads (via the provided formula) ○ simple chart plotting the execution time (in seconds) vs thread count (see example below). ○ the difference with and without the locking calls for the parallel execution ■ explain specifically what caused the difference 2 For more information, refer to: https://en.wikipedia.org/wiki/Critical_section 3 For more information, refer to: https://en.wikipedia.org/wiki/Speedup The explanation part of the write-up (not including the output and timing data) should be less than ~300 words. Overly long explanations will be not be scored. Below is an example of the type of chart to include Such graphs are most easily done using a spreadsheet. An optional example spreadsheet is provided for reference. Assignment #12 Timing Script In order to obtain the times for the write-up a timing script is provided. After you download the script file, a12timer, set the execute permission as follows: ed-vm$ chmod +x a12timer ed-vm$ ./a12timer narCounter The script file will expect the name of the executable as an argument (as shown). The script may take a while to execute, but no interaction is required. The script will create an output file, a12times.txt, which will be included in the submission and the data be used create the writeup. 1 2 3 4 0 100 200 300 400 500 600 CS 218 – Assignment #12 Execution Time vs Thread Count Threads Times (seconds) Submission: • All source files must assemble and execute on Ubuntu and assemble with yasm. • Submission three (3) files ◦ Submit Source File ▪ Note, only the functions file (a12procs.asm) will be submitted. ◦ Submit Timing Script Output ▪ Submit the a12times.txt file (created after executing the a12timer script). ◦ Submit Write Up file (write_up.pdf) ▪ Includes system description, speed-up amounts, and results explanation (per above). • Once you submit, the system will score the code part of the project. ◦ If the code does not work, you can (and should) correct and resubmit. ◦ You can re-submit an unlimited number of times before the due date/time (at a maximum rate of 5 submissions per hour). • Late submissions will be accepted for a period of 24 hours after the due date/time for any given assignment. Late submissions will be subject to a ~2% reduction in points per an hour late. If you submit 1 minute – 1 hour late -2%, 1-2 hours late -4%, … , 23-24 hours late -50%. This means after 24 hours late submissions will receive an automatic 0. Program Header Block All source files must include your name, section number, assignment, NSHE number, and program description. The required format is as follows: ; Name: ; NSHE ID: ; Section:
Purpose: Become more familiar with operating system interaction, file input/output operations, and file I/O buffering. Points: 300 (grading will include functionality, documentation, and coding style) Assignment: Write a program that will create a thumbnail1 (small, 300×200 image). The program will read the source file and output file names from the command line. The program must perform basic error checking on the command line file names. For example, given an impage file, image.bmp, to create a thumbnail image named thumb.bmp, the command line might be as follows: ./makeThumb image.bmp thumb.bmp The required functions include the following: ● Function getImageFileNames() to read and verify the file names from the command line. The function should check the file names by attempting to open the files. If both files open, the function should return TRUE and the file descriptors. The function should verify the “.bmp” extension. If there is an error, the function should display an appropriate error message and return FALSE. ● Function setImageInfo() to read (original image file), verify, and write (thumbnail file) the header information. The function must check the signature, color depth, and bitmap size consistency. Refer to the BMP File Format section. If the original image file information is correct, the header should updated to reflect the height and width (passed) and the new smaller thumbnail file size (width * height * 3 + HEADER_SIZE). Once updated, the revised header can be written to the output file. Additionally, the original width and original height should be returned to the main (via reference). If all parameters were correct, the function should return TRUE. If there is an error, the function should display an appropriate error message (predefined) and return FALSE. ● Function readRow() should return one row of pixels. If a row can be returned, the function should return the row and return TRUE. If a row can not be returned (because there is no more data) the function should return FALSE. If there is a read error, the function should dispaly an appropriate error message (predefined) and return FALSE. To ensure overall efficiency, the function must perform buffered input with a buffer size of BUFF_SIZE (originally set to 750,000). Note, this will be changed for part B. 1 For more information, refer to: https://en.wikipedia.org/wiki/Thumbnail ● Void function writeRow() should write one row of pixels to the output file. If successful, the function should return TRUE. If there is a write error, the function should display an appropriate error message (predefined) and return FALSE. This function is not required to buffer the output. As such, each row may be written directly to the output file. Provided Main The provided main program calls the various functions. The provided main program must not be changed in any way. All your functions must be in a separate, independently assembled source file. Note, to help with the initial program testing, it might be best to temporarily skip (comment out) the calls to the image manipulation statements (in the main). Assemble and Linking Instructions You will be provided a main function (makeThumb.cpp) that calls the functions. Your functions should be in a separate file (a11procs.asm). The files will be assembled individually and linked together. When assembling, and linking the files for assignment #10, use the provided makefile to assemble, and link. Note, only the functions file, a11procs.asm, will be submitted. The submitted functions file will be assembled and linked with the provided main. As such, do not alter the provided main. Debugging -> Command Line Arguments When debugging a program that uses command line arguments, the command line arguments must be entered after the debugger has been started. The debugger is started normally (ddd ) and once the debugger comes up, the initial breakpoint can be set. Then, when you are ready to run the program, enter the command line arguments. This can be done either from the menu (Program -> Run) or on the GDB Console Window (at bottom) by typing run at the (gdb) prompt (bottom window). Buffering Since many image files can be very large, it would be difficult to pre-allocate enough memory to hold a complete large image. Further, that memory would be mostly unused for processing smaller images. To address this, the program will perform buffered input. Specifically, the program should read a full buffer of BUFF_SIZE bytes (originally set to 750,000). Note, this will be changed for part B. From that large buffer, the readRow() function would return one row, width × 3 bytes. The next call to the readRow() function would return the next row. As such, the readRow() function must keep track of where it is in the buffer. When the buffer is depleted, the readRow() function must re-fill the buffer by reading BUFF_SIZE bytes from the file. Only after the last row has been returned, should the readRow() function return a FALSE status. 24-bit Color Depth For 24-bit color depth, the pixel color is stored as three bytes, a red value (0-255), a green value (0- 255), and a blue value (0-255). The color values are stored in that order. Thus, each pixel is 3 bytes (or 24-bits). So, for a 800 pixel row, there are 2,400 bytes (800 * 3). The main sets a row value maximum as 5,000 pixels (or 15,000 bytes). The provided main will perform this verification and display an appropriate error message as required. BMP File Format2 The BMP format supports a wide range of different types, including possible compression and 16, 24, and 32 bit color depths. For our purposes, we will only support uncompressed, 24-bit color BMP files. Under these restrictions, the header (or initial information) in the file is a 138 byte block containing a series of data items as follows: Size Description 2 bytes Signature. Must be ”BM”. Note, must ensure is “BM” for this assignment. 4 bytes File size (in bytes). 4 bytes Reserved. 4 bytes Size of header. Varies based on compression and color depth. For an uncompressed, 24-bit color depth, the header is 54 bytes. 4 bytes Offset to start of image data in bytes. May vary based on compression and color depth. 4 bytes Image width (in pixels). 4 bytes Image height (in pixels). 2 bytes Number of planes in image (typically 1). 2 bytes Number of bits per pixel. Typically 16, 24, or 32. Note, must ensure is 24 for this assignment. 4 bytes Compression Type (0=none). Note, must ensure is 0 for this assignment. 4 bytes Size of image in bytes (may vary based on compression types). 100 bytes Miscellaneous (not used for uncompressed, 24-bit color depth). Since the remaining parts of the program will only work for uncompressed, 24-bit color depth, these must be verified before the program can perform image manipulations. Specifically, the following items should be verified: ● File signature is valid (must be “BM” for signature). ● Color depth is 24-bits. ● Image data is not compressed (i.e., compression type must be 0). ● File bitmap block size consistency (file size = size of image in bytes + header size). Appropriate error message strings are provided in the functions template. Once these are verified, the header information should be written to the output file. Example Executions: The following execution, which includes some errors, will read file image0.bmp, create the thumbnail image, and place the output image in a file named tmp.bmp. ed-vm% ./makeThumb Usage: ./makeThumb ed-vm% ed-vm% ./makeThumb img0.bp thm.bmp Error, invalid source file name. Must be ‘.bmp’ file. ed-vm% ./makeThumb none .bmp tmp.thm Error, too many command line arguments. 2 For more information, refer to: http://en.wikipedia.org/wiki/BMP_file_format ed-vm% ed-vm% ./makeThumb none.bmp none. bmp Error, too many command line arguments. ed-vm% ed-vm% ./makeThumb img1.bmp thm.bmpp Error, invalid output file name. Must be ‘.bmp’ file. ed-vm% ed-vm% ./makeThumb tmp.bmp Error, incomplete command line arguments. ed-vm% ed-vm% ./makeThumb img4.bmp tmpImg4.bmp ed-vm% ed-vm% ./makeThumb test/imgBad1.bmp tmp.bmp Error, unsupported color depth. Must be 24-bit color. ed-vm% ed-vm% ./makeThumb test/imgBad2.bmp tmp.bmp Error, only non-compressed images are supported. ed-vm% ed-vm% ./makeThumb test/imgBad3.bmp tmp.bmp Error, bitmap block size inconsistent. ed-vm% ed-vm% ./makeThumb test/imgBad0.bmp tmp.bmp Error, invalid file signature. ed-vm% Note, a series of image files, including img4.bmp, are provided. However, the program will work on any uncompressed, 24-bit BMP format file. Example Ouptut Images The following table provides some examples of the expected final output. Example Output Original Image img4.bmp Original Image Size: 5.8 MB Image thumbnail tmpImg4.bmp Original Image Size: 180.1 KB Submission: • All source files must assemble and execute on Ubuntu with yasm. • Submit source files ◦ Submit a copy of the program source file via the on-line submission. ◦ Note, only the functions file (a11procs.asm) will be submitted. • Once you submit, the system will score the project and provide feedback. ◦ If you do not get full score, you can (and should) correct and resubmit. ◦ You can re-submit an unlimited number of times before the due date/time (at a maximum rate of 5 submissions per hour). • Late submissions will be accepted for a period of 24 hours after the due date/time for any given assignment. Late submissions will be subject to a ~2% reduction in points per an hour late. If you submit 1 minute – 1 hour late -2%, 1-2 hours late -4%, … , 23-24 hours late -50%. This means after 24 hours late submissions will receive an automatic 0. Program Header Block All source files must include your name, section number, assignment, NSHE number, and program description. The required format is as follows: ; Name: ; NSHE ID: ; Section:
Purpose: Become more familiar with data representation, program control instructions, function handling, stacks, floating point operations, and operating system interaction. Points: 200 Assignment: Write a simple assembly language program using OpenGL to calculate and display the output from a series of provided functions. Use the provided main program which includes the appropriate OpenGL initializations. You will need to add your code (two functions) in the provided functions template. The program should read the draw speed, draw color, and picture size from the command line. The required format for the options is as follows: “-sp -cl -sz ” with a valid septenary number for each argument. For example: ed-vm% ./wheels -sp 2 -cl 262022046 -sz 2313 Note, the ed@vm% is the prompt on my machine. The program must verify the format, read the arguments, and ensure the arguments are valid. The program must ensure that the speed, color, and picture size within the provided ranges. If there are any input errors, the program should display an appropriate error message and terminate. Refer to the sample executions for examples. Submission: All functions must follow the standard calling convention as discussed in class. The functions for the command line arguments and drawing function must be in a separate assembly source file from the provided main program. The provided main program should be linked with the functions. Only the functions file will be submitted. As such, the main file can not be altered in any way. Functions The provided main program calls the following routines: ● Function getParams() to process and check the command line arguments for the picture width and picture height. The function should read each value, convert the ASCII/septenary to integer and verify the range. If all is correct, return the values (via reference) and return to the main with a return value of TRUE. If there are any errors, display the appropriate error message and return to the main with a return value of FALSE. ● Function drawWheels() to plot the below functions. The formulas should be iterated and each will generate an (x,y) value pair each must be plotted with the openGL call. The t value should range between 0.0 and 2 p increment by tStep (predefined) each iteration. x1 = cos(t) y 1 = sin (t) x2 = cos(t) 3 + 2cos(2p s) 3 y 2 = sin(t) 3 + 2 sin(2 p s) 3 x3 = 2cos(2 p s) 3 + t cos( 4 p s) 6 p y 3 = 2sin (2p s) 3 − t sin( 4 p s) 6 p x4 = 2cos(2p s) 3 + t cos( 4 p s + 2p 3 ) 6 p y 4 = 2sin(2 p s) 3 − t sin( 4p s + 2p 3 ) 6 p x5 = 2cos(2 p s) 3 + t cos( 4p s − 2 p 3 ) 6p y 5 = 2sin (2p s) 3 − t sin( 4 p s − 2p 3 ) 6p Before leaving the function, the s value should be incremented by sStep which must be set as follows; sStep = speed scale The template already includes some of the applicable openGL initialization calls (which must not be removed). OpenGL Installation For this assignment, we will be using the openGL (graphics library) to provide some basic windowing capabilities. As such, the OpenGL development libraries must be installed. Thius can be done via the command line with the following commands. sudo apt-get update sudo apt-get upgrade sudo apt-get install binutils-gold sudo apt-get install libgl1-mesa-dev sudo apt-get install freeglut3 freeglut3-dev It will take a few minutes to install each. You must be connected to the Internet during the installation. After the installation, a re-install of Virtual Box Guest Additions may be required. OpenGL Calls The basic OpenGL library calls are included in the provided main and functions template. In order to set the draw color and plot the point, the following OpenGL functions will be required. call glColor3ub(red, green, blue); // set color call glVertex2d(x, y); // plot point (float) The red, blue, green variables are unsigned bytes. The x and y variables are double floating point values. These calls follow the standard calling convention. Debugging -> Command Line Arguments When debugging a program that uses command line arguments, the command line arguments must be entered after the debugger has been started. The debugger is started normally (ddd ) and once the debugger comes up, the initial breakpoint can be set. Then, when you are ready to run the program, enter the command line arguments. This can be done either from the menu (Propgram -> Run) or on the GDB Console Window (at bottom) by typing run at the (gdb) prompt (bottom window). Example Executions (with errors): Below are some example executions with errors in the command line. The program should provide an appropriate error message (as shown) and terminate. Note, the ed@vm% is the prompt. ed-vm% ./wheels Usage: ./wheels -sp -cl -sz ed-vm% ed-vm% ./wheels -sp 3 Error, invalid or incomplete command line argument. ed-vm% ed-vm% ./wheels -sp 2 -cl 262022046 -sz 1313 extra Error, invalid or incomplete command line argument. ed-vm% ed-vm% ./wheels -s 2 -cl 262022046 -sz 1313 Error, speed specifier incorrect. ed-vm% ed-vm% ./wheels -sp 200 -cl 262022046 -sz 1313 Error, speed value must be between 1 and 101(7). ed-vm% ed-vm% ./wheels -sp 2 cl 262022046 -sz 1313 Error, color specifier incorrect. ed-vm% ed-vm% ./wheels -sp 2 -cl 2620220547 -sz 2313 Error, color value must be between 0 and 262414110(7). ed-vm% ed-vm% ./wheels -sp 2 -cl 262022046 -szz 1313 Error, size specifier incorrect. ed-vm% ed-vm% ./wheels -sp 2 -cl 262022047 -sz 3313 Error, color value must be between 0 and 262414110(7). ed-vm% openGL Errors Based on the specific hardware configuration and/or virtual box configuration, the following warning message may be displayed. OpenGL Warning: Failed to connect to host. Make sure 3D acceleration is enabled for this VM. This warning message can be ignored. Note, some hardware configurations using virtual box may not be able to use openGL. An openGLtest program is provided to verify correction functional of the openGL installation. Assemble and Linking Instructions You will be provided a main function (wheels.cpp) that calls the functions. Your functions should be in a separate file (a10procs.asm). The files will be assembled individually and linked together. When assembling, and linking the files for assignment #10, use the provided makefile to assemble, and link. Note, only the functions file, a10procs.asm, will be submitted. The submitted functions file will be assembled and linked with the provided main. As such, do not alter the provided main. Example Execution: Below is an example execution showing the resulting image. ed-vm% ./wheels -sp 2 -cl 262022046 -sz 1313 The appropriate speed value will vary on different machines. When functioning, the inner circle will rotate inside outer circle. Submission: • All source files must assemble and execute on Ubuntu with yasm. • Submit source files ◦ Submit a copy of the program source file via the on-line submission. ◦ Only the functions file (ast8procs.asm) will be submitted. • Once you submit, the system will score the project and provide feedback. ◦ If you do not get full score, you can (and should) correct and resubmit. ◦ You can re-submit an unlimited number of times before the due date/time (at a maximum rate of 5 submissions per hour). • Late submissions will be accepted for a period of 24 hours after the due date/time for any given assignment. Late submissions will be subject to a ~2% reduction in points per an hour late. If you submit 1 minute – 1 hour late -2%, 1-2 hours late -4%, … , 23-24 hours late -50%. This means after 24 hours late submissions will receive an automatic 0. Program Header Block All source files must include your name, section number, assignment, NSHE number, and program description. The required format is as follows: ; Name: ; NSHE ID: ; Section:
Purpose: Become more familiar with data representation, program control instructions, function handling, stacks, floating point operations, and operating system interaction. Points: 200 Assignment: Write a simple assembly language program using OpenGL to calculate and display the output from a series of provided functions. Use the provided main program which includes the appropriate OpenGL initializations. You will need to add your code (two functions) in the provided functions template. The program should read the draw speed, draw color, and picture size from the command line. The required format for the options is as follows: “-sp -cl -sz ” with a valid septenary number for each argument. For example: ed-vm% ./wheels -sp 2 -cl 262022046 -sz 2313 Note, the ed@vm% is the prompt on my machine. The program must verify the format, read the arguments, and ensure the arguments are valid. The program must ensure that the speed, color, and picture size within the provided ranges. If there are any input errors, the program should display an appropriate error message and terminate. Refer to the sample executions for examples. Submission: All functions must follow the standard calling convention as discussed in class. The functions for the command line arguments and drawing function must be in a separate assembly source file from the provided main program. The provided main program should be linked with the functions. Only the functions file will be submitted. As such, the main file can not be altered in any way. Functions The provided main program calls the following routines: ● Function getParams() to process and check the command line arguments for the picture width and picture height. The function should read each value, convert the ASCII/septenary to integer and verify the range. If all is correct, return the values (via reference) and return to the main with a return value of TRUE. If there are any errors, display the appropriate error message and return to the main with a return value of FALSE. ● Function drawWheels() to plot the below functions. The formulas should be iterated and each will generate an (x,y) value pair each must be plotted with the openGL call. The t value should range between 0.0 and 2 p increment by tStep (predefined) each iteration. x1 = cos(t) y 1 = sin (t) x2 = cos(t) 3 + 2cos(2p s) 3 y 2 = sin(t) 3 + 2 sin(2 p s) 3 x3 = 2cos(2 p s) 3 + t cos( 4 p s) 6 p y 3 = 2sin (2p s) 3 − t sin( 4 p s) 6 p x4 = 2cos(2p s) 3 + t cos( 4 p s + 2p 3 ) 6 p y 4 = 2sin(2 p s) 3 − t sin( 4p s + 2p 3 ) 6 p x5 = 2cos(2 p s) 3 + t cos( 4p s − 2 p 3 ) 6p y 5 = 2sin (2p s) 3 − t sin( 4 p s − 2p 3 ) 6p Before leaving the function, the s value should be incremented by sStep which must be set as follows; sStep = speed scale The template already includes some of the applicable openGL initialization calls (which must not be removed). OpenGL Installation For this assignment, we will be using the openGL (graphics library) to provide some basic windowing capabilities. As such, the OpenGL development libraries must be installed. Thius can be done via the command line with the following commands. sudo apt-get update sudo apt-get upgrade sudo apt-get install binutils-gold sudo apt-get install libgl1-mesa-dev sudo apt-get install freeglut3 freeglut3-dev It will take a few minutes to install each. You must be connected to the Internet during the installation. After the installation, a re-install of Virtual Box Guest Additions may be required. OpenGL Calls The basic OpenGL library calls are included in the provided main and functions template. In order to set the draw color and plot the point, the following OpenGL functions will be required. call glColor3ub(red, green, blue); // set color call glVertex2d(x, y); // plot point (float) The red, blue, green variables are unsigned bytes. The x and y variables are double floating point values. These calls follow the standard calling convention. Debugging -> Command Line Arguments When debugging a program that uses command line arguments, the command line arguments must be entered after the debugger has been started. The debugger is started normally (ddd ) and once the debugger comes up, the initial breakpoint can be set. Then, when you are ready to run the program, enter the command line arguments. This can be done either from the menu (Propgram -> Run) or on the GDB Console Window (at bottom) by typing run at the (gdb) prompt (bottom window). Example Executions (with errors): Below are some example executions with errors in the command line. The program should provide an appropriate error message (as shown) and terminate. Note, the ed@vm% is the prompt. ed-vm% ./wheels Usage: ./wheels -sp -cl -sz ed-vm% ed-vm% ./wheels -sp 3 Error, invalid or incomplete command line argument. ed-vm% ed-vm% ./wheels -sp 2 -cl 262022046 -sz 1313 extra Error, invalid or incomplete command line argument. ed-vm% ed-vm% ./wheels -s 2 -cl 262022046 -sz 1313 Error, speed specifier incorrect. ed-vm% ed-vm% ./wheels -sp 200 -cl 262022046 -sz 1313 Error, speed value must be between 1 and 101(7). ed-vm% ed-vm% ./wheels -sp 2 cl 262022046 -sz 1313 Error, color specifier incorrect. ed-vm% ed-vm% ./wheels -sp 2 -cl 2620220547 -sz 2313 Error, color value must be between 0 and 262414110(7). ed-vm% ed-vm% ./wheels -sp 2 -cl 262022046 -szz 1313 Error, size specifier incorrect. ed-vm% ed-vm% ./wheels -sp 2 -cl 262022047 -sz 3313 Error, color value must be between 0 and 262414110(7). ed-vm% openGL Errors Based on the specific hardware configuration and/or virtual box configuration, the following warning message may be displayed. OpenGL Warning: Failed to connect to host. Make sure 3D acceleration is enabled for this VM. This warning message can be ignored. Note, some hardware configurations using virtual box may not be able to use openGL. An openGLtest program is provided to verify correction functional of the openGL installation. Assemble and Linking Instructions You will be provided a main function (wheels.cpp) that calls the functions. Your functions should be in a separate file (a10procs.asm). The files will be assembled individually and linked together. When assembling, and linking the files for assignment #10, use the provided makefile to assemble, and link. Note, only the functions file, a10procs.asm, will be submitted. The submitted functions file will be assembled and linked with the provided main. As such, do not alter the provided main. Example Execution: Below is an example execution showing the resulting image. ed-vm% ./wheels -sp 2 -cl 262022046 -sz 1313 The appropriate speed value will vary on different machines. When functioning, the inner circle will rotate inside outer circle. Submission: • All source files must assemble and execute on Ubuntu with yasm. • Submit source files ◦ Submit a copy of the program source file via the on-line submission. ◦ Only the functions file (ast8procs.asm) will be submitted. • Once you submit, the system will score the project and provide feedback. ◦ If you do not get full score, you can (and should) correct and resubmit. ◦ You can re-submit an unlimited number of times before the due date/time (at a maximum rate of 5 submissions per hour). • Late submissions will be accepted for a period of 24 hours after the due date/time for any given assignment. Late submissions will be subject to a ~2% reduction in points per an hour late. If you submit 1 minute – 1 hour late -2%, 1-2 hours late -4%, … , 23-24 hours late -50%. This means after 24 hours late submissions will receive an automatic 0. Program Header Block All source files must include your name, section number, assignment, NSHE number, and program description. The required format is as follows: ; Name: ; NSHE ID: ; Section:
1 Introduction and purpose In this project you will write a small simulation of a UNIX utility xargs, described below. The purpose of the project is to get some experience with process control and basic systems programming concepts. The difficulty in this project is in understanding what to do, not in the amount of code (which is actually relatively small) or complexity of data structures. Note that your program in this project does not have to create any pipes. And note that you may not use the C library function system() anywhere in your code, or you will not receive credit for the project. For reasons described below, this project will only have public tests. Due to the size of the course it is not feasible for us to provide project information or help via email/ELMS messages, so such questions will not be answered. You are welcome to ask any questions verbally during the TAs’ office hours. First we explain the basics of the actual UNIX xargs utility. Then we describe what your program has to do, including some examples. Although the first time you read the project this may be somewhat confusing (and even the second time), Section 4 gives a step by step outline of the things your program has to do. 1.1 Extra credit and number of submissions You can again get extra credit for this project. If you make only one submission that passes all the public tests you will get 3 extra–credit bonus points. And if your single submission is made at least 48 hours before the on–time deadline you will get 3 additional extra credit bonus points, for 6 extra credit points total. (Obviously you can’t get the second 3 extra credit points if you don’t get the first 3, but you can get the first 3 without getting these second 3.) However, as before, if you make too many submissions you may again lose credit. To avoid making submissions that don’t compile or don’t pass the public tests you should compile your code, run the tests, check their output, and fix any problems, all before submitting. (If for some reason your code passes all the public tests on Grace, but doesn’t work when you submit it, so you have to submit more than once– and this is not due to an error on your part or a bug in your code– you can talk with me verbally in office hours about receiving the extra credit despite having more than one submission.) 2 About the UNIX xargs utility UNIX utilities are generally small programs that do one thing, but are designed to easily work with other utilities or programs, so they can be used together to do more complex things when needed. Shell pipes are one way to make programs work together, where the standard output of one program or command is used as the standard input of another program or command, whose standard output in turn can be piped to another one, etc. But shell pipes are not helpful with programs or utilities that operate upon command–line arguments rather than standard input, for example if we want one program’s output to be used as another program’s input, but the second program operates upon command–line arguments rather than standard input. In fact, many UNIX commands/utilities do not read any standard input and operate only upon command–line arguments, for example ls, cp, mv, rm, mkdir, etc. One solution that can be used in cases like these is to write a shell script to run the second program using the output of the first program as command–line arguments, but this is more work than needed in many cases. Another solution that is sometimes appropriate is “backquote command substitution”, where one program is backquoted in the arguments of a second program, for example, ls `program1.x`. This causes the output of the backquoted program (program1.x here) to be used as command–line arguments for the first program (ls here). Although this is very useful at times we do not explain it further here. A third solution is the xargs utility, which is the subject of this project. It converts its standard input to command line arguments for another program. xargs has many capabilities, the majority of which we don’t need to worry about. xargs for this project differs from the real xargs UNIX utility in a number of ways, and in fact it is much simpler than the real xargs utility. To differentiate the UNIX xargs utility from the program that you will write to simulate it we will use xargs to refer to the UNIX utility, and your program will be named yargs.c. We will just use yargs to refer to your program, yargs.c to refer to its source file, and yargs.x to refer to its compiled executable. © 2023 L. Herman; all rights reserved 1 Your code will be in a file yargs.c, which will be a standalone program as in Project #2. In order to run the other program that it executes, xargs has to create a child process that will actually run the other program; this is what your yargs simulation of it will also have to do. 3 Usage of yargs.x yargs.x will be invoked as follows: yargs.x -n target–program target–args where all of the command–line arguments are optional, and they are explained next. -n (a minus sign followed immediately by a lowercase ’n’) is an argument to yargs.x itself. If it is present it has the same effect as the -n option to make, meaning that it causes yargs.x to print the commands that it would execute if -n had not been given, without actually executing them. Like the -n option to make, this allows a user to make sure that they are running yargs.x with the arguments and input that will cause it to perform the commands that they want. (Once they have verified that, they will presumably run yargs.x again with the same arguments and input, but without the -n option.) target–program is an argument to yargs.x. If present it is name of the program that yargs.x should execute, using the standard input to yargs.x itself as command–line arguments for target–program. Note that a target program could be a UNIX utility, or a user–created program. target–args is a list of things that will be used by yargs.x as command–line argument(s) for target–program. When running the target program these arguments will precede any command–line arguments that yargs.x takes from its standard input and supplies as additional arguments to the target program. The purpose of the target arguments is for arguments that will be supplied to the target program every time it is run by yargs.x, and the standard input of yargs.x will become different arguments that are given to the target program in addition to (and following) the target arguments. The standard input to yargs.x consists of zero or more lines, i.e., each ending in a newline. For explanation, let input1, input2, ···, inputn denote the successive lines in the standard input of yargs.x (i.e., n is zero if and only if the input to yargs.x is empty). What yargs.x must do is to run target–program target–args input1, then target–program target–args input2, and so on until it runs target–program target–args inputn. In other words, yargs.x will execute the target program once for each line of its own standard input. The first time the target program is run it will have the target arguments as command–line arguments and the first line of the standard input of yargs.x itself as additional command–line arguments. (To do this the input lines will have to be broken up into separate words, discussed further in Section 4.2 below.) The second time the target program is run its command line arguments will be the target arguments followed by the words of the second line of the standard input of yargs.x, and so on for each line of its own standard input. However, if the -n option was given then yargs.x will just print target–program target–args input1, then target– program target–args input2, up to target–program target–args inputn, each on a separate output line, but without actually executing any of them. 3.1 Examples of yargs usage The public tests have examples of running yargs.x in different ways. Here are some examples with explanation. Some of these examples use the UNIX utility /bin/echo as the target program. echo was briefly demonstrated in class when make and makefiles were covered. It just echoes or prints what is on its command line. For example, try a UNIX command something like /bin/echo I want to win chocolate! to see its operation. Suppose the current directory has a file datafile whose contents are the three lines shown on the left below, and a file datafile2 whose contents are the two lines shown on the right. (In each file, each line ends in a newline.) datafile aa bb cc dd ee ff datafile2 program1.c -o program1.x list-test.c list.c -o list-test.x © 2023 L. Herman; all rights reserved 2 yargs.x -n program.x xx yy < datafile In this invocation program.x is the target program name and xx and yy are the target arguments. Because the -n option was given, yargs.x will not attempt to execute program.x (which is not a UNIX command, but it could be a user–written program). yargs.x will just print program.x and the target arguments once for (and followed by) each line of the file datafile (which, due to input redirection, will be the standard input of yargs.x). So yargs.x will simply print these three lines that it would try to execute if -n had not been given: program.x xx yy aa bb program.x xx yy cc dd program.x xx yy ee ff yargs.x /bin/echo < datafile In this invocation the -n option is not used, /bin/echo is the target program name, and there are no target arguments. This will cause yargs.x to actually run the /bin/echo utility for each line of the file datafile. So echo first runs with the two command–line arguments aa and bb (the first line of datafile), then echo runs with its arguments being the second line of datafile, then with its arguments being the final line of datafile. So three lines of output would be produced: aa bb cc dd ee ff The command cat datafile | yargs.x /bin/echo would have the same effect, because the cat command and the pipe also cause the contents of the file datafile to be the standard input of yargs.x. yargs.x mv < datafile In this invocation mv is the target program name and there are no target arguments. Here yargs.x should attempt to run mv three times, once for each line of datafile. The first run of mv will have the two command–line arguments aa and bb, the second run of mv will have arguments cc and dd, and the last run of mv will have arguments ee and ff. The effects of these commands depend on whether bb, dd, and ff are directories, or whether they’re files (or whether they don’t currently exist). The first invocation will run mv aa bb and if bb is a directory it will move aa to the directory bb. But if bb is a file or if bb doesn’t exist, aa will be renamed to bb. Similarly with the commands mv cc dd and mv ee ff which yargs.x will run for the second and third lines of its input. Of course if any of files aa, cc, and ee don’t exist then there will be an error when yargs.x tries to run the command(s) that use the names of nonexistent things. (How yargs.x should handle performing commands that have errors is discussed in the next section.) yargs.x gcc -Wall -ansi < datafile2 In this invocation gcc is the target program name and -Wall and -ansi are the target arguments that will be passed by yargs.x as the first two argument to gcc. The remainder of the arguments each time that gcc will be run will be the words on the lines of yargs.x’s standard input, meaning the contents of each line of datafile2, and will be different during the two runs of gcc. So in this invocation yargs.x should execute gcc twice, once for each line of the file datafile2. As a result, the following commands will be run: gcc -Wall -ansi program1.c -o program1.x gcc -Wall -ansi list-test.c list.c -o list-test.x Note that the real xargs utility does not work the same as these examples, for reasons not fully explained here. Certain options can cause it to work as these examples do, but since you should be concentrating on what your yargs program has to do, we don’t describe the options that would cause xargs to mimic the behavior of your yargs. (If you are interested in learning more about the real xargs utility though you can just run man xargs on Grace. However, to avoid confusing yourself with what xargs would do compared to your yargs, even if you want to look at this information you might want to wait until after you have finished writing the project to do so.) © 2023 L. Herman; all rights reserved 3 4 What your program must do Extracting the files from the project tarfile will create a directory project09. In that directory you must write your program in a file named yargs.c (spelled and capitalized exactly as shown). Here is an outline of what it has to do: 1. It must include two header files safe-fork.h and split.h containing prototypes for functions safe_fork() and split(), which are described respectively later in this section, and in Section 4.2. 2. yargs.x must initially figure out what the arguments were. The first step is to determine whether the -n argument was used on its command line. Note that if the -n option is given, to have an effect it must be the first argument to yargs.x (otherwise it will appear to be the target program name or part of the target arguments). 3. After seeing if the -n argument appeared on the command line, yargs.x must determine the name of the target program, and see if there were any target arguments. For example, if it is run using the invocation yargs.x ls -t, yargs.x will run ls (the target program name here) with target argument -t (followed by arguments taken from lines of its standard input, as described below). • If there were arguments on its command line, and the first one was not -n, the first one is assumed to be the target program name. If there were no arguments after that then there are no target arguments for this execution, otherwise any arguments after the first one are the target arguments for this execution. • If there were arguments on its command line, and the first one was -n, then the second argument must be the target program name. If there were no arguments after that then there are no target arguments for this execution, otherwise any arguments after the -n and the target program name are the target arguments for this execution. • If yargs.x is run without any command–line arguments at all it won’t have anything to do because there is no target program to run, so it will end up just quitting without doing anything. (Its exit status or return value is discussed below.) • If there is only one argument, but it was -n, then again it means that there was no target program, so yargs.x will also quit without doing anything. 4. Once it has examined the command–line arguments, if the -n argument was not used, yargs.x will use its standard input as command–line argument(s) to the target program, running the target program (in child processes) once for every line of its standard input, until the end of the input is seen, with the (whitespace–separated) words of each input line being used arguments as the command–line arguments for the target program. (As above, each line of the input will end in a newline.) Recall that you read until the end of the input in Project #2. Use the function split() described in Section 4.2 to break each input line up into its whitespace–separated words. 5. However, once it has examined the command–line arguments, if -n was given as the first command–line argument to yargs.x (following the name yargs.x itself, of course), then yargs.x must print the commands without executing them. If there are n lines in its standard input then yargs.x will print n lines of output, where the first one consists of the target program name, followed by the target arguments, followed by the words on the first line of the standard input. The second line of output begins with the target program name and target arguments, followed by the words on the second line of the standard input, etc. Regardless of how much whitespace separated the target arguments or the words of the lines of the standard input, yargs.x must print each command with one single space between (separating) the target arguments and the words of the line of standard input. It is strongly suggested that you get the -n option to work right before trying to get your yargs.x program to handle executing commands. If your program is not accessing arguments correctly, or is not reading the lines of standard input correctly, then executing programs based on the target arguments and lines of standard input is certainly never going to work. 6. As mentioned, yargs.x must create a child process each time it runs the target program. This is usually done using the fork() system call, however, do not call fork() directly. We have provided an object file safe-fork.o, which contains a function safe_fork(), which you should call instead. It prevents the possibility of having a serious fork bomb. Your yargs.c should include safe-fork.h so the prototype of safe_fork() there will be visible. safe_fork() has the same return value as fork() and has no parameter, as you can see by looking at safe-fork.h. Use safe_fork(), instead of directly calling fork(). © 2023 L. Herman; all rights reserved 4 We set things up so your program will not even compile on the submit server if it calls fork() directly– but you could still cause a fork bomb on the Grace machines if you call fork() and do it incorrectly, which could cause dozens of other students to lose whatever work they were working on. Do not call fork() directly. 7. yargs.x can run the target program multiple times (once for each line of the standard input), with different arguments each time. The target program could produce different results each time it is run, because it’s being run with different arguments each time. One invocation of the target program might even use results that were produced by an earlier invocation of the target program. Consequently your yargs.x must ensure that each time the target program is executed it finishes running before the next time that yargs.x runs the target program for the next line of its standard input. 8. After each time it runs the target program in a child process, yargs.x will have to determine the exit status of the child process, to know what status it should itself exit with. It will determine its exit status as follows: • If the target program exits successfully every time that it is run (for every line of the standard input of yargs.x), then yargs.x itself must also quit (after running the target program once for each line of its own standard input) with exit status 0. • If any invocation of the target program exits with any nonzero exit code, your yargs.x must stop running the target program (meaning not run it for any further lines of its standard input), and quit immediately with whatever nonzero exit status the target program had. • If any error ever occurs trying to run the target program, meaning it could not even be run, your program must quit (without trying to run the target program for further lines of its standard input) with exit status 255. For example, suppose your program is run as yargs.x bananas are fun. Although perhaps there should be, there is not actually a UNIX command bananas, so, assuming that there is no program in the current directory or on the UNIX search path (see Section 4.1) that happens to be named bananas, when your program tries to create a child process running bananas it will fail, and yargs.x must exit with status 255. 9. Another case where yargs.x should quit without doing anything is if its standard input is empty. Regardless of what command–line were given to it, yargs.x is supposed to do something for every line of its standard input. If it has no standard input then it will not have anything to do, so it will end up just quitting, with an exit status of 0. 4.1 Notes • yargs.x will not use pipes. Any output produced by the child process created to run the target program will just be printed to your program’s standard output. • The output produced when the -n option is used should go to the program’s standard output. Your program has to read its standard input, and produce output when the -n option is used. Do not use low–level UNIX I/O system calls (that were recently covered) to read input or produce output– this is more difficult, and unnecessary in this project. Just use the C standard I/O library functions covered in Chapter 15 of the Reek text. • Before starting to code, look at the way that yargs.x is run in all of the public test scripts, and look at the files that will be used as standard input for each test. Then look at the expected output files. If you aren’t sure why the arguments and input result in the contents of the expected output files then ask about it in the TAs’ office hours before starting to code, so you are sure you understand what yargs has to do. • Several special cases were mentioned above in which yargs.x should not do anything and just quit (with an exit status of 0). These may not have to be handled as special cases; much of this behavior could very well occur naturally in these situation when you write the program to handle normal cases, so do not create special cases for these situations unless they are necessary. • Recall that to see the exit status (or exit code) of a program run in the tcsh shell on Grace, just use the command echo $status. (It has to be the very next command, because it prints the exit status of the most recent command.) • The -n option is not passed from yargs.x to the target program. (It is not a target argument.) It is an argument to yargs.x, which affects the operation of yargs.x itself. • Your program may assume that no input line will ever have more than 10000 characters before its terminating newline. Other than what’s imposed by the maximum length of a line there is no specific limit to the number of © 2023 L. Herman; all rights reserved 5 arguments that may be on a line, or to the number of input lines that your program may have to read and use as command–line arguments to the target program that it is running. Situations where it’s not known in advance how much data has to be read are cases where dynamic memory allocation is needed. • Notice that when your yargs.x runs the target program, this will need to be done using some form of exec (one of the exec system calls). Your program should use the UNIX search path to find the target program to run. Look at the forms of exec mentioned in class and figure out which one or ones can be used in these situations. Recall that the name of a program to run (the target program here) is given twice in a call to exec! It has to appear as the first and second arguments to an exec–family system call. • When a program is run using input redirection, such as program.x arg1 arg2 arg3 < inputfile, the program does not see the “< inputfile” in its arguments. In this example the program would just see three command–line arguments arg1, arg2, and arg3. This is because when input redirection is used the shell creates a process to run the program in, that process’s input is redirected to come from the input file, and the program is run in that process. The program has no way to know or tell where its standard input is coming from, whether from the keyboard or in this case from the file inputfile– it just reads input. So yargs.c does not need to look for input redirection symbols or input files on its command line, it just needs to read its input, no matter where it is coming from. Similarly, when input is redirected to come from a pipe (as in program1.x | program2.x arg1 arg2 arg3), the pipe symbol and whatever is before it don’t appear in the program’s arguments, so “program1.x |” will not be in the arguments for program2.x here. program2.x will just see the output of program1.x as its standard input when it reads input. • yargs.c will have to free any dynamically–allocated memory when it is no longer needed or it will fail some tests. 4.2 Our function char **split(const char line[]) To facilitate the process of reading input lines and breaking them up into words we are supplying you with a function split(), with prototype above, in the object file split.o (with associated header file split.h). It will take a string and break it up into its components, where each component (except as indicated below) is a word, separated from other words by either whitespace, or by the beginning or end of the string. split() returns a dynamically–allocated array of dynamically–allocated strings, each of which is one word of the line. The array will end with a NULL pointer, so its last element can be detected. split() will ignore (discard) blank spaces, tabs, and newlines before and between words. Its argument string optionally can end with a newline. (Since you will be reading lines from the input and calling split() on them, by default they would end in newlines.) For example, if called on an input line parrots eat carrots (which ends with a null character), split() will return a dynamically–allocated array with four elements, which will be (in order) the strings parrots, eat, carrots, and the fourth element of which is just NULL (the three non–NULL strings, and the array itself, are all dynamically allocated). Note the result would be the same if more than one blank space or tab separated the words. The UNIX shell has many characters that have special meaning, for example backslashes can be used to escape certain special characters, and single quotes have somewhat different effects than double quotes. For simplicity, our split() function does not attempt to emulate these behaviors, other than one, which is that a double–quoted string is treated as a single argument, and whitespace is preserved inside double–quoted strings. For example, if your program is run as yargs.x⊔test.x⊔one⊔”two⊔⊔three⊔ ⊔ ⊔four”⊔five the target program test.x must be run with three command– line arguments, the second of which will be “two⊔⊔three⊔⊔⊔four” (containing spaces as shown), and split() would treat it as a single argument. The array returned by split() is dynamically allocated, and all the strings that the array points to are also dynamically allocated. To avoid memory leaks you will have to free this memory when it is no longer needed. Note that the array argv of command–line arguments to your program is of the same form as the array returned by split(), but the argv array should not be explicitly freed– it is created automatically before a program starts, and automatically freed if need be when the program ends. If a program tries to explicitly free its command–line arguments it will most likely result in a fatal error. Do not try to write your own split() function. We are giving you a correct split() function. If you write your own it might not always work correctly, or might not be consistent with ours in all cases. Just call our split() function to break up lines of the standard input into their constituent words. © 2023 L. Herman; all rights reserved 6 A Development procedure review A.1 Obtaining the project files and compiling Log into the Grace machines and use commands similar to those from before: cd ~/216 tar -zxvf ~/216public/project09/project09.tgz This will create a directory project09 that contains the necessary files for the project, including the public tests and associated data files, safe-fork.h, safe-fork.o, split.h, and split.o. After extracting the files from the tarfile, cd to the project09 directory, create a file named yargs.c (spelled exactly that way) that will #include the header files safe-fork.h and split.h, and write the program as described above. Since your program will be a self–contained standalone program, you don’t need to write a makefile for this project (although you are welcome to if you like; if so it will just be ignored on the submit server). You can compile your program just using the (exact) command: gcc yargs.c safe-fork.o split.o -o yargs.x (Of course if you do write a makefile you need to ensure that the required compilation options are being used.) A.2 Checking your results and submitting The tests of this project will all be UNIX shell scripts, named public01, public02, etc. These public test scripts have no standard input. They will run your program with different arguments, and although the scripts do not read any input, they may run your yargs.x program with input redirected from a file or piped from a command. The public test scripts produce output that can be compared with the expected output, for example: public09 | diff – public09.output The test script (public09 in this example) is running your yargs.x with a target program, a target argument, and some input; you can see the exact commands that each test is using to run yargs.x by looking at the scripts using less. These scripts are expecting the name of your executable program to be yargs.x, so you must use that exact name when compiling. In a recent discussion section your TA showed you the run-tests shell script, which runs a program on all of its tests at once. However, the run-tests script will not work correctly for this project, because run-tests is expecting the project tests to be compiled C programs (executable files) rather than shell scripts. A modified version of run-tests named run-tests2 has been put on Grace that you can run instead for this project– after you compile yargs.x, just running run-tests2 will execute all of the shell scripts (tests) and indicate whether they all passed or any failed. We can’t use our memory checking functions to test your program for memory leaks or problems in the heap, because you’re writing a self–contained program with a main() function. However, some of the tests will use valgrind with the right arguments to check your program for problems of this type. Since our memory checking functions are incompatible with valgrind, do not use our memory checking functions in your program, or it will fail these tests. A.3 Grading criteria Your grade for this project will be based on: public tests 85 points programming style 15 points As you can see, and as mentioned above, there are only public tests for this project. For you to come up with your own tests of your program you might have to think about what UNIX commands to run with what arguments, and you would also need to write your own shell scripts. But while understanding basic UNIX shell script concepts is useful, the course doesn’t really expect you to have to learn how to write shell scripts. To avoid this, all of the tests for this project will just be public tests, and those and style will make up your entire project grade. © 2023 L. Herman; all rights reserved 7 B Project–specific requirements, suggestions, and other notes • Before starting to code, make sure you understand the lecture examples of process control. If you have questions about them, ask about them in the TAs’ office hours in advance. • Of course the program could be written without explicitly using process control (e.g., by using the C library function system()). But since the entire point of the project is to write a program using process control, you will not get any credit for the project if you use system() at all anywhere. You must write the project as described. Also do not use a loop to try to make a process wait for another one to do something. (This is called busy–waiting and it is very inefficient, compared to correct ways to accomplish this.) • If any system call fails your program should print some sort of descriptive message saying what didn’t work right and where. It doesn’t matter what happens after that. The exact wording of the message doesn’t matter and it doesn’t matter what your code does if this occurs, as long as an explanatory message is printed. • Any function that allocates memory should test whether the memory allocation was successful, and handle the situation appropriately if it was not successful. (The appropriate action might just be gracefully exiting the program after printing an error message, but it at least should not be just crashing). • As mentioned above, do not use our memory checking functions for this project. Besides being incompatible with some of the tests that use valgrind, when your code is compiled on the submit server it will not be linked with memory-functions.o, so your program will not even compile on the submit server if you try to use our memory functions. • If you make changes to yargs.c, don’t forget to recompile it to create a new yargs.x before rerunning the public test shell scripts! • You cannot run a shell script using gdb (or gede), or valgrind. If you want to debug, run yargs.x under gdb (compiling of course with -g), then look at the shell script for the test that you want to run the program for, to see what command–line arguments the public test script is running yargs.x with, and what it is using for its input. Then run your yargs.x under gdb with the same arguments and input. And similarly, if you want to run valgrind, look at the shell script that is the test you want to run it on, and run it on the command given in the script. You can run a program under gdb with command–line arguments and input redirection. For example (suppose this is after running gdb yargs.x and setting some breakpoints), say we want to run yargs.x with the argument -gcc -Wall, reading input from the file datafile2. The following command to your running gdb will do this: run gcc -Wall < datafile2 You can also run valgrind on a program with command–line arguments and using input redirection: valgrind yargs.x gcc -Wall < datafile2 Some tests will run yargs.x with its standard input coming from a pipe, for instance, an invocation like the example ls *.c | yargs.x touch. If you want to debug your program in a case like this we recommend redirecting the output of the commands or commands before yargs.x into a file, then running gdb on yargs.x with input redirected from that file. Just be sure to use a backslash to disable aliases for any commands that are being used to generate input for yargs.x (the way the public tests are written takes care of this). In this example, run ls *.c > tmpfile, then run gdb yargs.x, and (as mentioned above) at the gdb prompt run < tmpfile. Comments in the public tests will explain what commands to use if you want to run the debugger on your yargs.x for that test. • If you are debugging a program that uses process control to create child processes, note that gdb will by default continue to trace the execution of the parent process after a fork. However, if you use the gdb command set follow-fork-mode child before the program creates a child, gdb will trace the child instead. (If needed, set follow-fork-mode parent will switch back to tracing the parent when subsequent child processes are created.) If your yargs.x program ends up running child processes with the wrong command–line arguments nothing is going to work right, and it might be difficult to know why. So if things aren’t working right and you’re not sure why, set a breakpoint in gdb right before a child process is created, use follow-fork-mode child, set a breakpoint in the child process right before it runs another program, and use gdb to view the arguments at that point that the other program is about to be run with. If they’re not right, that is the problem. © 2023 L. Herman; all rights reserved 8 • Do not write code using loops (or recursion) that has the same effect as any string library functions. If you need to perform an operation on strings and there is a string library function already written that accomplishes that task, you are expected to use it, otherwise you will lose credit. • You will lose credit if you cast the return value of the memory allocation functions. Besides being completely unnecessary, in some cases this can mask certain errors in code. • You cannot modify anything in the header files safe-fork.h or split.h or add anything to them, because your code will be compiled on the submit server using our version of these files. Your code may not comprise any source (.c) files other than yargs.c, so all your code must be in that file. You cannot write any header files of your own either. • For this project you will lose one point from your final project score for every submission that you make in excess of five submissions, for any reason. • Make sure none of your program lines have length more than 80 by running your Project #2 line length check programs on yargs.c! • Be sure to make frequent backups of your yargs.c source file in a different directory in your course disk space. • Recall that all your projects must work on at least half of the public tests (by the end of the semester) in order for you to be eligible to pass the course. See the project policies handout for full details. • Recent projects said that if you have a problem with your code and have to come to the TAs’ office hours for debugging help you must come with tests you have written yourself that illustrate the problem, not the public tests. However, since you aren’t expected to have to write shell scripts in this project, the TAs will help find bugs with the public tests in this project, even if you have not written your own tests. C Academic integrity Please carefully read the academic honesty section of the syllabus. Any evidence of impermissible cooperation on projects, use of disallowed materials or resources, publicly providing others access to your project code online, or unauthorized use of computer accounts, will be submitted to the Office of Student Conduct, which could result in an XF for the course, or suspension or expulsion from the University. Be sure you understand what you are and what you are not permitted to do in regards to academic integrity when it comes to projects. These policies apply to all students, and the Student Honor Council does not consider lack of knowledge of the policies to be a defense for violating them. More information is in the course syllabus– please review it now. The academic integrity requirements also apply to any test data for projects, which must be your own original work. Exchanging test data or working together to write test cases is also prohibited. © 2023 L. Herman; all rights reserved 9
1 Introduction and purpose In this project you will extend upon your graph implementation from Project #7 in two ways: new functions will be added that allow removing components of graphs as well as an entire graph; and the graph functions will have to free any allocated memory when it is no longer needed. Some tests in this project will check for memory leaks and memory errors resulting in heap corruption (making parts of the heap invalid); Section 4 explains how these tests do this. Your functions in Project #7 could have passed some (or even all) of their tests despite having errors of these types, but in this project you will have to fix this. The purpose of the project is to ensure that you are not only able to create a more complex dynamically allocated data structure, but also able to correctly free memory for such a data structure. You must again have a makefile (still named Makefile) in your submission that will compile your code for all of the public tests. Other than two points mentioned in Appendix B below, the requirements for your makefile are the same as in the previous project, so they are not repeated here. See Project #7 and the notes in Appendix B about makefile dependencies. Note that Project #7 explained in detail how to test your makefile before submitting. Due to the size of the course it is not possible for us to provide project information or help via email/ELMS messages, so we will be unable to answer such questions. But you are welcome to ask them verbally during the TAs’ office hours. 1.1 Fixing problems in your Project #7 code Even if you had trouble with Project #7 and may not get a high score on it, you can still do fine in the course (one project just isn’t worth much toward your grade). But you do have to get Project #7 to work though, so you can reuse it and expand upon it in this project. So if you know that you have bugs in your Project #7 code, fix them right away. (Hint– valgrind is your friend; see the rest of this section.) A few days after this project is assigned we will provide the Project #7 secret tests in ~/216public/project07. You can use them to figure out changes needed in your Project#7 code when reusing it in this project. But even if you aren’t aware of any bugs in your Project #7 code your functions might still, as mentioned above, have some errors in the heap, despite being able to pass the Project #7 tests. But in this case your code will fail many of the tests in this project (the ones that are checking for consistency and correctness of the heap). So it is strongly recommended, before even trying to write the new functions in this project, that you first use valgrind to test for– and fix– any heap problems in your Project #7 code as follows: • Before even extracting the files from the project08.tgz tarfile, make a complete second copy of your entire project07 directory, with a different name. (This is just so you can make changes to your previous code while still keeping a backup of your Project #7 code as it was submitted.) • In the new copy of your project07 directory, add the -g compilation option to your makefile (if it’s not already in your compilation options), run make clean, then rebuild all of the public tests. • Now run valgrind –leak-check=no public01.x and look for any errors reported by valgrind. As the Project #7 assignment mentioned, the Project #7 tests will unavoidably have memory leaks, because there are no Project #7 functions that remove any components of a graph, so the tests have no way to free the memory used by String_graph variables when they finish. But the –leak-check=no option to valgrind prevents it from checking for memory leaks, so it will only check for other types of problems in the heap. valgrind was explained in discussion section and a number of small examples were shown (the tarfile with the examples is in ~/216public on Grace). The examples had different kinds of memory problems that could occur, and allowed you to see how valgrind identified the different kinds of problems. • Fix any memory problems in your string-graph.c that valgrind identified. If you need help understanding what valgrind is saying, after carefully reviewing and rerunning the valgrind examples in ~/216public, you can ask the TAs in office hours to help you understand what valgrind is telling you. • Repeat the same command for the rest of the public tests, fixing any problems reported for any of the later tests as well. © 2023 L. Herman; all rights reserved 1 • Then you can use your fixed string-graph.c in writing this project. Of course this procedure assumes that you were in discussion section when valgrind was demonstrated and explained by the TAs (on Monday, March 27), so you understand it. If you weren’t in discussion then, you’re on your own for figuring valgrind out. Try to ask a friend who was in discussion that day to explain it to you and show you the examples. 1.2 Extra credit and number of submissions You can again get extra credit for this project. If you make only one submission that passes all the public tests you will get 3 extra–credit bonus points. And if your single submission is made at least 48 hours before the on–time deadline you will get 3 additional extra credit bonus points, for 6 extra credit points total. (Obviously you can’t get the second 3 extra credit points if you don’t get the first 3, but you can get the first 3 without getting these second 3.) But as before, if you make too many submissions you may again lose credit. To avoid making submissions that don’t compile or don’t pass all the public tests you should compile the tests, run them, check your output, and fix any problems, all before submitting. And you should write your makefile first, and test it before submitting, as discussed below. (If for some reason your code passes all the public tests on Grace but doesn’t work when you submit it, so you have to submit more than once– and this is not due to an error on your part, a bug in your code, or a problem with your makefile– you can talk with me in my office hours about receiving extra credit despite having more than one submission.) 2 New functions to be written Just copy your string-graph.c and string-graph-datastructure.h files from your project07 directory (or copy the corrected versions from the new copy of your project07 directory if you followed the procedure in Section 1.1 above) to the new project08 directory that will be created by extracting the files from this project’s tarfile. As mentioned above, make whatever changes are needed to fix any bugs in your Project #7 code that was copied to the project08 directory, and add the new functions described below to it. But before starting to code, look at the public tests so you see how the functions are called. Write one function at a time (entirely or even partially), then test it (as soon as that is possible) thoroughly before going on! Write and test helper functions before writing code that uses them, and test them separately first as well. Project #7 said that if a pointer passed into a parameter of any function is NULL the function should have no effect, and (with two exceptions) if it returns an integer value it should return 0. This also applies (with one exception to the return value) to the new functions being added in this project 2.1 void free_vertex_name_list(char **const names) This is a helper or utility function that is not actually part of a graph. It should free all the dynamically–allocated memory of the strings in the array names, then free the array names itself. It should have no effect if the pointer names itself is NULL. If names is non–NULL the function may assume that its last element will always be NULL (i.e., the array will be like those returned by the function get_vertex_list(), and by the new function get_neighbor_names() explained below). So the user of your code can use this function to free the memory returned by get_vertex_list() and get_neighbor_names() when they are no longer needed, to avoid memory leaks. 2.2 char **get_neighbor_names(String_graph *const graph, const char vertex[]) If its parameter graph itself is NULL, or if the graph does not have a vertex with the name vertex, this function should just return NULL. Otherwise it should return a newly–created dynamically–allocated array, say new_arr, of pointers to characters, where the pointers point to copies of the names of the neighbors of the vertex vertex in the parameter graph. (Recall that a neighbor of the parameter vertex is a vertex, say other, such that there is an edge from the parameter vertex to other.) The names pointed to by the pointers in new_arr must be in increasing lexicographic (dictionary) order. Each name must be stored in a newly–created dynamically–allocated character array; that is, the pointers in new_arr should not just be aliases of the names stored somewhere in your graph data structure. © 2023 L. Herman; all rights reserved 2 If vertex has n neighbors, the array returned by the function must be an array with n+1 elements, where the last element must be NULL. (So if the vertex vertex has no neighbors the function must return a dynamically allocated array of one element, which will be NULL.) (The return value of this function for NULL parameters is the exception referred to in the third paragraph in Section 2.) 2.3 int remove_graph_edge(String_graph *const graph, const char source[], const char dest[]) This function should remove the edge going from source to dest in its parameter graph. If either source or dest are not vertices in the graph, or if they are vertices but there is no edge from source to dest, the function should just return 0 without changing anything, otherwise it should remove the edge and return 1. Any dynamically–allocated memory used by the edge must be freed. Recall that your data structure must contain at least one linked list or binary search tree somewhere. If, given your data structure, it’s necessary to remove elements from linked lists or binary search trees in order to remove graph edges, how to do these things was covered in CMSC 132. 2.4 int remove_vertex_from_graph(String_graph *const graph, const char vertex[]) If vertex is not a vertex in its parameter graph this function should just return 0 without changing anything, otherwise it should remove vertex from the graph and return 1. If a vertex is removed then all its outgoing as well as incoming edges must also be removed, because an edge needs both its source and destination vertices to exist. Any dynamically– allocated memory used by the vertex and its associated edges must be freed. 2.5 void graph_delete(String_graph *const graph) This function should deallocate all dynamically–allocated memory that is used in the String_graph variable that its parameter graph points to, releasing the graph and all its data in the process. So when this function returns, the graph data structure that graph previously pointed to effectively no longer exists in memory. See Section 3 for explanation about when this function can– and must– be called. 3 Valid sequences of calls to the functions A sequence of calls to the graph functions (including the ones from Project #7 as well as the new ones added in this project) on a given String_graph variable is valid if and only if the sequence satisfies the following: • graph_init() is the first call in the sequence. • Unless it is the last call in the sequence of calls to the graph functions, a call to graph_delete() must be followed immediately by a call to graph_init(). So following a graph_init() call and until the next graph_delete() call (if any), there can be any sequence of calls of functions other than graph_init() and graph_delete(). And it is invalid to call any other functions on a String_graph variable after graph_delete() is called on it, until graph_init() is called on it. The effect of an invalid sequence of calls is undefined. Ensuring that the sequence of calls of your function is valid is the responsibility of the caller of your functions– which includes your own tests– your code does not have to try to detect violations of these properties, and in fact it has no way to do so. If the user of your functions wants to avoid memory leaks they must make only valid sequences of calls to the graph functions, and they must also ensure that the last function call they make in any sequence of calls to the functions on a String_graph variable must be graph_delete(). They must also call free_vertex_name_list() exactly once on every pointer returned by the two functions get_vertex_list() and get_neighbor_names(), and after that they must not use the pointers again. (Note that a String_graph variable will still be valid even if this is not done.) 4 Memory checking and memory errors We have included an object file memory-functions.o in the project tarfile, which has two functions named setup_memory_checking() and check_heap() that are similar to the functions from Project #6. The header file © 2023 L. Herman; all rights reserved 3 memory-functions.h contains their prototypes. They have no parameters or return value. We use them as follows in some of our tests to detect memory leaks and corruption: • setup_memory_checking() must be called once in a program before any memory is dynamically allocated; it sets up things to check the consistency and correctness of the heap. If this function has been called initially, and the program has any memory errors later (such as overwriting beyond the end of an allocated memory region in the heap) the program will crash (consequently failing a test). • If there is any memory in use in the heap at all when check_heap() is called it prints a message and causes the program to immediately quit. If there is no memory in use in the heap at all then calling it just has no effect. Some tests will initially call setup_memory_checking(), then call check_heap() after functions are called on the different data structures and their memory is cleared. If the functions that are supposed to remove all or part of your graph data structure are not freeing memory properly, your program will report that there is some allocated memory in the heap– meaning a memory leak occurred- causing these tests to fail. (This could also happen if the user calls your functions in an invalid order, for example by calling graph_init() on a nonempty String_graph variable without calling graph_delete() on it first, but this is their fault. If you are using our memory checking functions in your own tests and you make an invalid sequence of calls to these functions, such tests would fail.) The facilities we use to perform this memory checking are not compatible with memcheck (valgrind), because memcheck uses its own versions of the memory management functions, that themselves use some memory in the heap. Consequently, if you run any tests that use our memory checking functions under valgrind, the test will appear to fail with a memory leak, even if your code does not have memory leaks. You can use either our memory checking functions with a program, or valgrind with it, but not both at the same time, or they will give incorrect results. Appendix B describes how to easily use valgrind in this project. The functions discussed in this section are for tests of your code to use. Do not try to call them from your graph functions– this won’t work. And since your code in string-graph.c will not be calling these functions, your string-graph.c should not include memory-functions.h. (Any of your own tests that want to check the correctness of the heap will include memory-functions.h, and will have to be linked with memory-functions.o.) A Development procedure review A.1 Obtaining the project files, compiling, checking your results, and submitting Log into the Grace machines and use commands similar to those from before: cd ~/216 tar -zxvf ~/216public/project08/project08.tgz This will create a directory project08 that contains the necessary files for the project, including the header files string-graph.h, compare-vertex-lists.h, and memory-functions.h, the object files compare-vertex-lists.o and memory-functions.o, and the public tests. As was discussed above, copy both of your string-graph.c and string-graph-datastructure.h files from your project07 directory (or copy the corrected versions from the new copy of your project07 directory if you followed the procedure in Section 1.1 above) to the project08 directory after extracting it from the project tarfile, so you can add the new functions in it. As in Project #7 you will not be compiling your code from the command line; you will use make and your makefile to compile everything. diff can be used as before to check whether your code passes a public test or not, for example: make public01.x public01.x | diff – public01.output (You can also run make from Emacs if desired.) Note that your makefile cannot run each test when building it, because this will probably not work on the submit server. If you want to create a separate target in your makefile that runs the tests this will not cause a problem on the submit server (although a better way to do this will be explained in discussion section soon). Running submit from the project directory will submit your project, but you first must make sure your makefile works and you have passed all the public tests, by compiling them– using your makefile– and running them yourself. Unless you have versions of all required functions that will at least compile, your program will fail to compile at all on the submit server. (Suggestion– create skeleton versions of all functions when starting to code, that just have an appropriate return statement.) © 2023 L. Herman; all rights reserved 4 A.2 Grading criteria Your grade for this project will be based on: public tests 40 points secret tests 45 points programming style 15 points Some secret tests may test your makefile. Section 4 of Project #7 explains the requirements for your makefile, so review that to ensure that you are satisfying them, in case they are tested by secret tests in this project. B Project–specific requirements, suggestions, and other notes • The data structure requirements are the same in this project as in Project #7. Be sure you have followed them– reread them in the Project #7 assignment. • Note that different public tests in this project use different things and include different header files, so different tests will have different dependencies in your makefile, and different files will have to be linked to create the executables for different tests. We emphasize again that you should not wait to write your makefile. Write it first– before you even start writing the graph functions. This will avoid having to type commands for compiling the public tests, but more importantly, using your makefile all during development, every time you compile your code, will allow you to ensure it works correctly before you submit. You can copy your makefile from your project07 directory to the directory for this project as a starting point, but the tests in this project will have different dependencies than they did in Project #7, and different files will have to be linked to create their executables than in Project #7. • Now that the preprocessor has been covered, and the conditional compilation directives that are actually always used in C header files have been explained in class, you must now have correct conditional compilation directives in your header file string-graph-datastructure.h. (Note that part of using conditional compilation correctly is that you must spell the name of the preprocessor symbol used for conditional compilation either as described in class and in the lecture slides, or in the form that the Reek text uses, or you may lose some credit.) • Since one header file in this project includes another one, your makefile must now use one of the approaches recently discussed in class for handling dependencies in this situation, otherwise it may not perform needed recompilations. (If your makefile doesn’t use one of these approaches you might lose a few points due to failing a secret test.) • As mentioned earlier, you cannot use valgrind together with our memory checking functions. We want to make it easy for you to use valgrind if you have memory problems that arise while writing this project, so we wrote the tests to allow you to easily turn off our memory checking, by just compiling with the preprocessor symbol ENABLE_VALGRIND defined. To be able to use valgrind (this assumes your Makefile is correct): – Add a definition of ENABLE_VALGRIND (e.g., -D ENABLE_VALGRIND) to the compiler options in your Makefile. (Recall you also have to add the -g option to be able to use valgrind, as well as gdb.) – Force all the tests to be recompiled using make clean (or touching all source/header files would also work), and recompile everything (our tests and yours). – Now you can use valgrind on any test. When you’ve found the problem and are done using valgrind, don’t forget to remove the definition of ENABLE_VALGRIND from the compilation options in your makefile and recompile everything again, otherwise you will not be running the same versions of the tests as the submit server is, because it will be using our memory checking for some tests. After re–enabling our memory checking by removing -D ENABLE_VALGRIND and recompiling, rerun the tests to confirm that they pass. © 2023 L. Herman; all rights reserved 5 – When running valgrind, add the option –leak-check=no to turn off checking for memory leaks for any test that is not calling graph_delete() on any String_graph variables when they are no longer needed and is also not calling free_vertex_name_list() on any pointers returned by get_vertex_list() and and get_neighbor_names() when they are no longer needed. • Do not write code using loops (or recursion) that has the same effect as any string library functions. If you need to perform an operation on strings and there is a string library function already written that accomplishes that task, you are expected to use it, otherwise you will lose credit. • You will lose credit if you cast the return value of the memory allocation functions. Besides being completely unnecessary, in some cases this can mask certain errors in code. • For this project you will lose one point from your final project score for every submission that you make in excess of four submissions, for any reason. • You cannot modify anything in the header files string-graph.h, compare-vertex-lists.h, or memory-functions.h, or add anything to them, because your submission will be compiled on the submit server using our versions of these files. You cannot write any new header files of your own other than string-graph-datastructure.h. Your code may not comprise any source (.c) files other than string-graph.c, so all your code must be in that file. • Any helper functions that you write in string-graph.c, whose prototypes are not already in string-graph.h, will be “private”, so they should be declared static, and their prototypes should be placed at the top of string-graph.c, not in string-graph.h or string-graph-datastructure.h. • If you have a problem with your code and have to come to the TAs’ office hours for debugging help you must come with tests you have written yourself that illustrate the problem (not the public tests), what cases it occurs in, and what cases it doesn’t occur in. In particular you will need to show the smallest test you were able to write that illustrates the problem, so whatever the cause is can be narrowed down as much as possible before the TAs even start helping you. You must also have used the gdb debugger, explained in discussion section, and be prepared to show the TAs how you attempted to debug your program using it and what results you got. And besides having used gdb, you must have also run memcheck (valgrind) if your code has any bugs, and be able to understand the results it shows. • Recall that the course project policies handout on ELMS says that all your projects must work on at least half of the public tests (by the end of the semester) in order for you to be eligible to get a C− or better in the course. C Academic integrity Please carefully read the academic honesty section of the syllabus. Any evidence of impermissible cooperation on projects, use of disallowed materials or resources, publicly providing others access to your project code online, or unauthorized use of computer accounts, will be submitted to the Office of Student Conduct, which could result in an XF for the course, or suspension or expulsion from the University. Be sure you understand what you are and what you are not permitted to do in regards to academic integrity when it comes to projects. These policies apply to all students, and the Student Honor Council does not consider lack of knowledge of the policies to be a defense for violating them. More information is in the course syllabus – please review it now. The academic integrity requirements also apply to any test data for projects, which must be your own original work. Exchanging test data or working together to write test cases is also prohibited. © 2023 L. Herman; all rights reserved 6
1 Introduction and purpose In this project you will write a concurrent (multithreaded) program to perform a simple task. The purpose of the project is to get some experience with concurrency using Pthreads. The different lecture examples illustrating Pthreads show all the features that you will need to write this project. You just have to figure out how to put the pieces together, so the first step will be to carefully study those lecture examples. This is another smaller project, so there is a shorter time for it to be done in. As has been emphasized many times earlier, the instructional staff cannot answer questions about projects via messages/email. Important: after the semester is over, be sure to clean up your TerpConnect account, as described in Appendix D. 2 Problem to be solved The project tarfile contains a program wc.c, which implements the basic functionality of the UNIX word count utility wc. wc counts and prints the number of lines, words, and characters in its standard input, or in a file whose name appears on its command line. If multiple filenames appear on its command line (or UNIX wildcard characters that describe multiple files), wc will print the number of lines, words, and characters in each file individually, followed by the total number of lines, words, and characters in all of the files combined. For example, if you extract the files from the Project #11 tarfile, cd to the project11 subdirectory, and run wc *, you will get the following output: 14 82 458 public1 12 113 707 public1.inputdata 1 3 15 public1.output 13 80 448 public2 1 3 15 public2.output 13 82 454 public3 0 0 0 public3.inputdata 1 3 15 public3.output 12 59 342 public4 94 1017 5475 public4.inputdata1 109 1141 5959 public4.inputdata2 1 3 16 public4.output 13 71 404 public5 285 33117 186808 public5.inputdata1 252 35535 199498 public5.inputdata2 1 3 18 public5.output 26 166 961 public6 169 8025 56735 public6.inputdata 1 3 16 public6.output 26 139 845 public7 2 10 73 public7.output 73 376 2442 wc.c 1119 80031 461704 total As mentioned, the three numbers on each line are the number of lines, number of words, and number of characters for the file on that line, and the last line is the total number of lines, words, and characters for all files combined. If all we care about is the last line, containing total counts for all files aggregated, we could use concurrent threads to speed up the computation, with a different thread reading and counting the lines, words, and characters of each file. Since I/O is very slow compared to the speed of the CPU and memory, being able to read and process multiple files concurrently could save time. Our wc.c program produces this output, meaning the final three numbers, for all of the files on its command line. If you compile it and run it on the files in the project11 directory it will print a single output line reading 1119 80031 461704, which is the last line of the results above. (If you compile it so its executable is in the project11 directory © 2023 L. Herman; all rights reserved 1 then the size of the executable will change the output, so I compiled it as gcc wc.c -o ../wc.x to put the executable in the parent directory, then I ran it as ../wc.x *, and got the single line with the three numbers above as output.) As far as the results it produces are concerned, our program wc.c is wonderful, does exactly what we want, and works perfectly. The only problem is that we forgot that we had intended it to be multithreaded, as mentioned above. Your task is to rectify this. In particular, you are copy wc.c to another file and modify it so that it creates one thread per filename appearing on its command line, so the number of threads created will equal the number of arguments appearing on the command line that follow the program name itself. Each thread needs to be passed the name of a file to read and count the lines, words, and characters of. Each thread should open the file with the name that was passed to it, read the contents of its file to count these three things, and return the results, meaning the total number of lines, words, and characters in that thread’s file. (If the file does not even exist the thread should return 0 for all three counts.) The code that is currently in the main() function of wc.c, for counting lines, words, and characters of a single file, must be turned into a function that is invoked by each thread. main() will have to create the threads, access their return values, and sum the results returned by the threads to compute the three total numbers. When all the threads have finished, main() must print the total lines, words, and characters of all the files whose names appeared on the command line, which will be the sums of the counts returned by all the threads. For any invalid filenames, i.e., command–line arguments that are not the names of existing files, our original wc.c program just uses a count of zero lines, zero words, and zero characters. Your multithreaded version should do the same. The output of your multithreaded program will be exactly the same as the original single–threaded version of wc.c given to you. The only difference is that it will use multiple threads, one for each file argument. Your modified program must use multiple threads, one for each file argument. You could submit a copy of the original (single–threaded) program that is given to you, and it would pass all the tests. But we will be checking during grading whether your program uses multiple threads. If it does not you will get zero points for the entire project, (consequently, due to the minimum requirements policy for projects, you would not pass the course unless you were to later submit a version by the end of the semester that passes at least half of the public tests and does use threads. Furthermore, just submitting our program as your work for this project could be considered an academic integrity violation, so we would have to involve the Office of Student Conduct to determine that.) Note that besides only printing the total counts for multiple files (but not the counts for each file separately first) there is another way that our wc.c program differs from the behavior of the real UNIX wc utility, which is that we do not attempt to emulate the results that wc produces for word counts when operating upon binary files. But we only care about the counts of lines, words, and characters in text files, not binary files, so this discrepancy does not concern us. Note that all of the project tests will be text files. A Development procedure review A.1 Obtaining the project files and compiling your program Log into the Grace machines and use commands similar to those from before: cd ~/216 tar -zxvf ~/216public/project11/project11.tgz This will create a directory project11 with the necessary files for the project, including our wc.c and the public tests. You need to cd to the project11 directory, copy wc.c to wc-threaded.c, and modify wc-threaded.c to use threads as described above. In most projects this semester you wrote functions that were called from our main programs. A few projects were exceptions, in that your code was the main program. This project is one of those exceptions. Like wc.c, your wc-threaded.c will be a complete, standalone program (although one that uses multiple threads). Because there is only one executable program in the project, consisting of only one file, you don’t have to write a makefile, and you can just compile your program by hand. (You can write a makefile if you want; it will just be ignored on the submit server.) By now you should know how to use the gcc compiler (look in the UNIX tutorial for information if you need to), but don’t forget to add the -lpthread option necessary to compile (actually link) programs using Pthreads. © 2023 L. Herman; all rights reserved 2 A.2 Running your program, checking your results, and submitting The public test inputs are just text files, and your wc-threaded.x program will be run with the names of the files as command–line arguments. However, the public tests consist of small shell scripts, that just run your program with the right arguments. This is because some tests have multiple command–line arguments, and the shell scripts avoid your having to type commands with multiple arguments when you want to run the different tests; the public test scripts do this for you. Run the script public1 to run your program for the first public test, public2 for the second one, etc. The scripts all assume that the executable filename for your compiled program is wc-threaded.x so that is the name you have to use when compiling. As before, use diff to compare your program’s output to the public test outputs that are in the project tarfile, for example public1 | diff – public1.output will test your code’s results on the first public test. You can also run run-tests2 to run your program on all the tests at once. Note: since there is not a required makefile for this project, be sure to recompile your program before running the tests, if you make any changes to it! As has been discussed in class, a shell script is just a file that contains commands to be executed, so if your program is failing a test, look at that shell script to see what command or commands it is running, and run those commands manually to see what’s going on. Running submit from the project directory will submit your project, but before you submit you must make sure you have passed all the public tests, by compiling and running them yourself. A.3 Checking whether your program is concurrent If you run one of the public tests and your program prints the three right numbers as output (diff says there are no differences between the test’s output and the expected output) then your program must be computing its result properly. But your program might be producing the right results, yet not be using threads correctly, which could cause you to lose significant credit during grading, despite your output being right. One error that is sometimes made by students without much experience with concurrency is writing programs that only run one thread at a time, which completely defeats the purpose of even using concurrency. There are various ways that things can be done wrong so that only one thread runs at a time, so it is difficult to explain every kind of mistake that could be made. But since you would lose considerable credit if your program only runs one thread at a time, you should take a few minutes to make sure your program is not doing this. The easy way to check for this is to run your program under gdb, because gdb will print messages when each thread begins and finishes. You do not even need to set any breakpoints in gdb, just run wc-threaded.x under gdb with some filenames as command–line arguments (as an example, run gdb wc-threaded.x, then at the gdb prompt, run run public*). If you get output that looks something like what’s on the left below, and different times you run the program you see different orders of threads starting vs. exiting, your program should be running multiple threads concurrently. (The hexadecimal numbers are the thread’s IDs just in hex, LWP stands for “lightweight process”, and the integer following “LWP” in each line is a number that the kernel uses to refer to threads.) However, if you get output like that on the right below, and even when you run the program different times you always see every thread being created and exiting before the next one is created, then your program is not using threads correctly. [New Thread 0x7ffff77f0700 (LWP 11671)] [New Thread 0x7ffff6fef700 (LWP 11672)] [New Thread 0x7ffff67ee700 (LWP 11673)] [Thread 0x7ffff77f0700 (LWP 11671) exited] [New Thread 0x7ffff5fed700 (LWP 11674)] [Thread 0x7ffff6fef700 (LWP 11672) exited] [New Thread 0x7ffff57ec700 (LWP 11675)] [New Thread 0x7ffff4feb700 (LWP 11676)] [Thread 0x7ffff67ee700 (LWP 11673) exited] [Thread 0x7ffff57ec700 (LWP 11675) exited] [Thread 0x7ffff4feb700 (LWP 11676) exited] [New Thread 0x7ffff77b9700 (LWP 20342)] [Thread 0x7ffff77b9700 (LWP 20342) exited] [New Thread 0x7ffff77b9700 (LWP 20343)] [Thread 0x7ffff77b9700 (LWP 20343) exited] [New Thread 0x7ffff77b9700 (LWP 20344)] [Thread 0x7ffff77b9700 (LWP 20344) exited] [New Thread 0x7ffff77b9700 (LWP 20345)] [Thread 0x7ffff77b9700 (LWP 20345) exited] [New Thread 0x7ffff77b9700 (LWP 20346)] [Thread 0x7ffff77b9700 (LWP 20346) exited] [New Thread 0x7ffff77b9700 (LWP 20347)] By the way, even if you do see that your threads are running and exiting in different orders when you run the program different times, this does not guarantee that you are doing everything correctly with concurrency. You could somehow © 2023 L. Herman; all rights reserved 3 be unnecessarily limiting the concurrent execution of threads inside the code where the thread function is reading from the file. But although this does not guarantee that your code is perfect, it will let you know, before you submit, whether you are making several common types of mistakes, so you can fix them if so. A.4 Grading criteria Your grade for this project will be based on: public tests 65 points secret tests 35 points But note that, as explained above, if you do not use threads as described, your score will be zero. B Project–specific requirements, suggestions, and other notes • If your wc-threaded.c doesn’t use multiple threads– one for each file argument– you will not receive any credit for the project. (The entire purpose of this project is to use concurrency and threads.) • Your threads must return values. There are different ways of returning values from threads in Pthreads, but you cannot just have the threads store their counts in global variables or static local variables or other shared variables. There must be code in your program after creating and running each thread, when the threads finish, that gets their return values and adds them to three cumulative sums, and this must be done outside the threads (after the threads finish). You also may not change any of the existing variables in the wc.c program by making them global or static local variables. You will lose significant credit unless each thread returns the three counts of lines, words, and characters in the file that that thread is reading. • Your program must use the minimum synchronization necessary to achieve correct results, but no more synchronization than that. At the extreme, if you were to only allow one thread at a time to do anything, your program would work fine– but as described you would effectively not even have used concurrency. Minimum synchronization just means that the program’s threads must be able to run concurrently as much as possible, and a thread should only wait for another one to do something when absolutely necessary to ensure correct results (meaning threads should only wait for others in cases where correct results cannot be achieved without that). • Your program code can not call sleep() anywhere. Some lecture examples of concurrency used sleep() in order to cause the results of small concurrent programs to be more random, so that you could see concurrency working. However, a more realistic concurrent program like this one will exhibit different behavior on its own; you do not need to make this happen artificially. Things have been set up so your program will not even compile on the submit server if it calls sleep(). Also do not use a loop to make a thread wait for another one thread to do something. This is called busy–waiting, and it is not efficient. Appropriate ways for threads to wait other threads to do something have been covered in the course. • You can only use the Pthreads features that have been covered in class, or you will lose significant credit. • When your program creates threads it will have to save their IDs. You don’t know in advance, while coding, how many threads your program will have to create, because you can’t see the secret tests. Situations such as this, where you have to store data and don’t know how much data there will be (the program can only tell how much data it needs to store while it is actually running) are situations where memory allocation must be used. So you must use some sort of dynamically–allocated memory or data structure to store the threads’ IDs. (If your program just creates a giant fixed–size array, without allocating memory you may fail secret tests, and you will also lose credit during grading.) • All your code must be in the file wc-threaded.c. No other user–written source (.c) or header files can be added. (Otherwise things probably won’t compile on the submit server.) © 2023 L. Herman; all rights reserved 4 • The project style guide disallows using (in general) global variables in projects. If you want you may use global variables in this project, but (as stated above) you cannot use them to “return” values from thread functions, and you cannot use them to keep track of the cumulative counts of lines, words, and characters. We stress that the project can be written without using any global variables. If you are trying to use global variables you are at the minimum making things more difficult for yourself than they need to be, or you are violating the requirements above in a way that would cause you to lose significant credit. Instead of using any global variables, ask for help in the TAs’ office hours to understand why you don’t need to use them. Keep in mind that you may not change any of the existing variables in the program by making them global or static local variables. • When your program has to allocate any memory, you will lose credit if you cast the return value of any memory allocation functions. Besides being completely unnecessary, in some cases this can mask certain errors in code. • Your program should check whether any memory allocations are successful; if any are not, it should print some sort of explanatory message and quit. It doesn’t matter how you accomplish this. • Your program must free any dynamically–allocated memory once it is no longer in use. One of the public tests tests this, so you will fail that test if you are not freeing allocated memory, and secret tests may also test this. • As Project #9 explained, you cannot run gdb (or valgrind) on a shell script (which the public tests are). As explained more fully in Project #9, if you are failing a test and want to use the debugger, run gdb wc-threaded.x, look at the public test shell script, and run your wc-threaded.x in gdb with the same command–line arguments that the script is running it with. Similarly with valgrind. • You can create you own tests by copying the public test scripts and editing them. You will have to use the chmod command given in the Project #10 assignment to give your scripts executable permission. • For this project you will lose one point from your final project score for every submission that you make in excess of four submissions. You will also lose one point for every submission that does not compile, in excess of two noncompiling submissions. Therefore be sure to compile, run, and test your project’s results before submitting. C Academic integrity Please carefully read the academic honesty section of the syllabus. Any evidence of impermissible cooperation on projects, use of disallowed materials or resources, publicly providing others access to your project code online, or unauthorized use of computer accounts, will be submitted to the Office of Student Conduct, which could result in an XF for the course, or suspension or expulsion from the University. Be sure you understand what you are and what you are not permitted to do in regards to academic integrity when it comes to projects. These policies apply to all students, and the Student Honor Council does not consider lack of knowledge of the policies to be a defense for violating them. More information is in the course syllabus – please review it now. The academic integrity requirements also apply to any test data for projects, which must be your own original work. Exchanging test data or working together to write test cases is also prohibited. D End–of–semester ELMS and TerpConnect account cleanup D.1 Saving information from ELMS (if desired) Soon after the semester the class ELMS space will become inaccessible, so if there are any materials from it (lecture slides, handouts, etc.) that you might possibly want later that you don’t already have, be sure to download and save them after finals. Students often find that they want to go back and review material from this course in later CMSC courses, because this material is used again in multiple later courses. And students often want to review CMSC 216 material in preparing for technical interviews. If you may ever want copies of your coursework or the course materials you have to save them yourself after the semester, because due to the size of the course we will not be able to provide them to you in the future after the ELMS space is gone. If you want to save copies of any PDF materials from ELMS you should also save the PDF password somewhere. (Due to the size of our courses it won’t be possible to provide this in the future to anyone who didn’t save it.) © 2023 L. Herman; all rights reserved 5 Students also sometimes want to go back and refer to their CMSC 216 projects in later courses when they need to use the material in this course there. Also, companies sometimes want to see examples of class projects during internship or job interviews. After a year the CS department removes projects from the submit server, so you won’t be able to download them from it after that. (Note that until then you can still see your projects by going to https://submit.cs.umd.edu and clicking on Older semesters.) D.2 Saving information from Grace (if desired) A couple weeks after the semester ends you will lose the ability to log in to the Grace systems, although you will still be able to log into terpconnect.umd.edu and access your course disk space there, at least for a limited time. However, early the next semester you will lose permission to access your course disk space, including all your projects, as well as everything in ~/216public, because the class space will be automatically deleted then by DIT (the Division of Information Technology). (As mentioned above, after a year, course projects are also removed from the CMSC submit server.) So if you want to save any of your projects or other coursework, or anything that we provided (lecture or discussion examples, secret tests, etc.), you will have to do so yourself over the summer. Sometimes companies want to see examples of class projects during job or internship interviews, and students taking upper–level CMSC courses often want to go back and look at relevant projects and materials from earlier courses. The instructional staff will not be able to provide copies of these in the future, so you’ll need to save them yourself if you might ever want them. Recall that the -r option to the cp command recursively copies a directory and all its contents, including subdirectories, so a command like cp -r ~/216/ ~/216.sp23 would copy everything in your extra course disk space (which the symbolic link 216 in your home directory points to) to a directory in your home directory named “216.sp23” (use a different name if you like). (Of course you have to have enough free disk space in your TerpConnect account to store the files; use the quota command when you are logged into terpconnect.umd.edu to check this.) Although you will lose login permission to the Grace systems after the semester (unless you’re taking another course using them), you will still be able to log into your TerpConnect account as long as you’re associated with the University, via terpconnect.umd.edu. You can also download files to your own computer; if you’re using Windows the left pane of MobaXterm will allow you to do this. (Click on Session, then on SFTP, then log into Grace. Hopefully it’s self–explanatory from there.) On a Mac or Linux system you should be able to just open a terminal, cd on your computer to where you want to copy the files, and use a command like the following, where loginID is your directory ID, and the final period refers to the current directory: scp -r [email protected]:216 . I have known many students who lost all of their coursework and other data, even though they had everything on their computer, because something happened to their computer, for example their laptop was stolen, or their hard drive/SSD died. I suggest that if you want to save your information it’s not sufficient to just have it on your computer, you should have an external backup as well. I have known a few students who had issues with cloud storage of their data, so I don’t completely trust that. I recommend just buying an external USB backup drive, connecting it to your computer once a day, and setting things so it backs up your files daily. (They are not expensive– I can find external USB drives from what a high–quality manufacturer for around $55 for a 1 TB (terrabyte) drive, $70 for a 2 TB drive, and $100 for a 4 TB drive.) D.3 Resetting your Grace (TerpConnect) account After you’ve copied the files from Grace that you want to save, and you have looked at or copied the secret tests (if you want) for the remaining projects when they are provided, the last step you need to do is to undo the changes to your account that you made during an early discussion section as described below, since later courses may use the Grace systems, and the changes you made for this course will probably conflict with changes necessary for them. Here is how to undo the changes to your account (the steps below assume that you are located in your home directory): 1. First just run /usr/glue/scripts/newdefaults (using its full pathname). This is a shell script provided by DIT that will replace any account control files in your home directory that you modified at the beginning of the semester with new unchanged copies (meaning as they were before you modified them). For example, you © 2023 L. Herman; all rights reserved 6 modified the file .path as part of your account setup, so this command will replace your .path file with the original version. Note that this will not lose any information, because before copying new versions of files the newdefaults script will rename your current version with the year, month, and day that you run it. So if you run the command on May 28, your current .path file will be renamed as .path-23-05-28, then the script will copy the original version of the .path file to your home directory. The account control files that you modified when setting up your account were .emacs, .path, and .cshrc.mine. Possibly you might have created aliases of your own in your .aliases file, and possibly you might have added customizations of your own in your .emacs file. (It is unlikely that you modified any other account control files but if you did, using ls -a after running the newdefaults script will show them with a suffix like 23-05-28.) If you want to keep your modified version of one of the account controls files, for example suppose that you want to keep the changes that you made to your .aliases file, just copy it back after running the newdefaults script. For example (given the assumptions above) cp .aliases-23-05-28 .aliases will do this. (But if you want to keep your changes to your .emacs file, comment out the line starting with load that you added to it, because it is referring to a file in 216public that will cease to exist when DIT removes the class files, which might lead to errors. Comments in Emacs control files start with a semicolon, so just put a ; character at the beginning of that line to disable it.) 2. Then remove the symbolic link named 216 that you created from your home directory to your extra disk space, by just removing the symlink, as in rm 216. You could still reach the files in your extra disk space after that if you wanted to (until you lose access to them) by just using the full pathname, for example, instead of cd 216 you could still use a command like: cd /afs/glue/class/spring2023/cmsc/216/0101/student/loginID 3. Lastly, remove the symbolic link named 216public that you created in your home directory, which points to the class public directory, after copying any files you want from there, as in rm 216public Make sure you can log out and log back in successfully, and are able to list and view files in your home directory after that, to ensure that you didn’t make any mistakes performing the changes. © 2023 L. Herman; all rights reserved 7
1 Introduction Your task in this project will be to take three C programs we give you and write equivalent assembly programs. Remember that information about MIPS is on the Administrative and resources page on ELMS. We demonstrated running the QtSpim graphical simulator in class. If your account is set up right you should be able to run it without having to do anything special or different, but you have to be using an X server to run the graphical simulator. Read the requirements in Section 3 carefully before coding, and again after finishing the project. Failure to follow these requirements will result in losing significant credit, so be sure to read them carefully. Ask in advance if you have questions. There are two assembly homeworks: Homework #8 is on basic assembly and is on ELMS now, and Homework #9 will be on functions in assembly and will be on ELMS on Thursday. If you do these homeworks and understand the answers, you will have already gotten practice with every assembly language feature that this project requires. If you come to the TAs’ office hours to ask any questions about writing assembly, you must have already done these homeworks. The TAs will not answer any questions about the project unless you can show them that you have first tried to do the assembly homeworks. (Of course if you need help with the homeworks you can ask in office hours.) Due to the size of the course it is not feasible for us to be able to provide project information or help via email/ELMS messages, so we will be unable to answer such questions. However you are welcome to ask any questions verbally during the TAs’ office hours (or during, before, or after discussion section or lecture if time permits). However, you cannot wait to write projects in a course with this many students. 1.1 Extra credit If you make only one submission that passes all the public tests you will get 3 extra–credit bonus points. And if your single submission is made at least 48 hours before the on–time deadline you will get 3 additional extra credit bonus points, for 6 extra credit points total. (Obviously you can’t get the second 3 extra credit points if you don’t get the first 3, but you can get the first 3 without getting these second 3.) (If for some reason your program passes all the public tests on Grace, but doesn’t work when you submit it, so you have to submit more than once– and this is not due to an error on your part or a bug in your code– talk with me in office hours about receiving extra credit despite having more than one submission.) You will again lose credit for having too many submissions. 2 Project description, and programs to be written The project tarfile contains three C programs prog1.c, prog2.c, and prog3.c that you must translate to MIPS assembly programs. Extract the files from the tarfile using commands similar to those from before: cd ~/216 tar -zxvf ~/216public/project10/project10.tgz For each C program you must write a MIPS assembly language program that works the same way and does the same things, so it will produce the same output as the C program if given the same input (with one exception discussed in Section 2.4). Your MIPS assembly program files must be named prog1.s, prog2.s, and prog3.s. Below we briefly explain what the three C programs do, so we can discuss some things about them, but the definitive description of what they do is the programs themselves, available in the project tarfile. Your assembly programs should do things exactly the same way as the C programs do. Your assembly programs should all terminate via the MIPS system call to terminate execution; abnormal termination should never occur (except in the case of an I/O error, which you are not required to handle). 2.1 Rectangle comparison program (prog1.c) This program reads four integers from the input and stores them into four global variables l1, w1, l2, and w2. The numbers respectively represent the length and width of one rectangle and the length and width of another rectangle. The program just prints −2 if any of the four dimensions were negative. Otherwise the program calculates the areas of both © 2023 L. Herman; all rights reserved 1 rectangles and prints 0 if the rectangles have equal area, −1 if the first rectangle’s area is less than that of the second rectangle, and 1 if the first rectangle’s area is greater than that of the second rectangle. The program prints a newline after whichever number it prints. This program will be the easiest of the three to write because it does not use any functions other than main(), and main() uses only global variables. 2.2 Digit counting program (prog2.c) This program determines the number of digits in any number in any positive base. It reads one integer into a global variable number and another one into a global variable base. It then passes these two numbers into the parameters n and base of a function num_digits(), which computes and returns how many digits its parameter n would have in the base base, using repeated division. When the function returns, its return value is stored in another global variable result in main, which is then printed, followed by a newline. Here is how the function handles various special cases: • A base of zero doesn’t make any mathematical sense, so if its second parameter is zero, the function returns −1 to indicate that this is an error case. • Mathematically it is actually possible to have a negative base. In fact, a few early mainframe computers stored numbers using a negative base system, because this allows positive and negative numbers to be stored using the same representation (unlike modern computers, which use a different representation, twos complement, for negative numbers). We do not explain this more fully here because, because since modern computers do not use negative bases we are uninterested in the number of digits of numbers in negative bases, so the function also considers this to be an error case and returns −1 in such cases as well. • Zero has one digit, which is 0. Zero has to be handled as a special case though, because without it the function would just return 0 for zero, meaning no digits, but zero has one digit (except as noted below). • Negative numbers have a number of digits also. The preceding negative sign is not a digit. For example, −543987 has six digits. The function handles this as a special case using this if statement and the unary prefix negation operator: if (n < 0) n= -n; You may look at the function and think that this case is unnecessary and that the function will work right without it. In fact, if you were to comment it out the function would work correctly– except it wouldn’t return the right result in one specific case. So this if statement is necessary. • Unary notation has only one digit. It doesn’t matter what is used for that digit– logically, using the symbol 0 for the digit would probably be most consistent with other bases– but conventionally 1 is used. In unary, the magnitude of a number is just the number of 1s in its representation. For example, 310 is 1111 (meaning in unary) and 1210 is 1111111111111 . Unary numbers can also be negative, with a preceding negative sign. The number of digits of a number in unary can’t be calculated using repeated division, because that would mean dividing by 1 at each iteration, so the number would not change. So the number of digits in a unary number has to be handled as a special case; it is just the magnitude (absolute value) of the number itself. Note that any number has at least one digit in any positive base, except for zero in unary, which as mentioned just has zero digits. This program (and the next one) will be more work than the first one because they have functions other than main, and the functions have parameters, local variables, and return values. Carefully read and study the PDF lecture examples of functions in assembly. They illustrate everything you have to know and to use. But you have to understand everything that they are doing, before trying to write assembly code yourself for this program. Study them and if there is anything that you don’t understand completely about what they are doing or why, ask in office hours before starting to write this part of the project. Here are suggestions for developing the program one part at a time, assuming you are up to speed on the assembly function examples. (Even someone who is familiar with things can still make mistakes coding in assembly, because it’s very easy to make errors in assembly but very hard to find them.) © 2023 L. Herman; all rights reserved 2 • First just write the main function to read two numbers (your first program also read numbers, so you should understand how to read integers now) and just print them afterwards (you should also know by now how to print an integer). Run the program and make sure this works. You don’t really have to print the numbers after they are read– you can instead see whether the numbers are being read into registers correctly in the QtSpim graphical simulator– but the main function has to print something at its end anyway, and it’s not that many instructions to print a number. • Then just write an empty num_digits function. In other words, try writing a num_digits function that has no local variables or parameters, which doesn’t do anything other than to immediately return (after its stack frame is created), and call it from the main function between reading and printing the two numbers. Run the program and make sure this works. If so, you have some indication that you are creating the stack frame for num_digits correctly and removing it later. • Then just modify the empty num_digits function to return a hardcoded value, like 216, meaning that it won’t actually compute the number of digits in numbers yet, and it won’t even have parameters yet, it will just always return 216, or whatever other number you like. Also modify the main function to print the value returned by the num_digits function after calling it. Run the program and if this works then it seems you are able to return a value from an assembly function. • Then modify both the main function and the num_digits function to add two parameters to num_digits, and pass the two numbers read in main into it. The num_digits function should just ignore its parameters and still just return a fixed value (like 216), but if you test this and it works, so your program is able to call a function with parameters and print the return value afterwards (in the main program), you have some confidence that the code for your function is properly manipulating the runtime stack when a function is called and when it exits, even if the function has parameters. • Then modify the num_digits function to just print both of its parameters, before returning the same hardcoded value. Test this and if it works you have some confidence that you are able to correctly access parameters in an assembly function. • Then have the function just return one of its parameters, meaning one of the two numbers that were read and passed to it, instead of always returning a fixed number (like 216) as above, and still instead of trying to compute the number of digits of numbers yet. Run the program and make sure this works. • Lastly of course, add instructions to the num_digits function to actually compute and return the number of digits in its first parameter in the indicated base (second parameter). 2.3 Recursive digit counting program (prog3.c) This program turns the function in prog2.c into a recursive function. The program’s results will be the same in all cases as the previous one; it just uses recursion instead. This program must use a recursive function. Zero in unary is just no digits at all, meaning just the empty string. Note that second program, prog2.c described above„ has anomalous behavior and is mathematically not correct for this case, in that it prints 1 if it is asked to determine the number of digits of zero in unary. (This program, prog3.c is correct in this regard.) Your assembly programs should agree 100% with what the C programs do, even if the C programs differ for this case. (This should be the only case that they differ for.) Note that you may not have to repeat most of the steps above in this program– if you already wrote the second program then you already have a num_digits function that is being called correctly from the main function and that computes and returns a value. You just have to change it to call itself, which may be a little more tricky to wrap your head around than an iterative function, but of course we have provided examples of recursive assembly functions. 2.4 Reading input All three programs read input, and they just read integer inputs. You may assume that legitimate integer values will be input to all programs, and that the numbers will be small enough to fit into an int. Note that you can’t use the mouse to copy and paste input into the input window in the graphical simulator QtSpim, you just have to type input manually (pressing return after each number entered if a program reads more than one number). © 2023 L. Herman; all rights reserved 3 3 Requirements In this project you are a compiler. In particular, you are a compiler that does not do any optimization. There are many different ways that the three C programs could have been written differently yet produce the same results, but you are not going to modify the C programs when you convert them to assembly, because that is not what a compiler does. A compiler simply converts C programs, as the programmer wrote them, to assembly. What this means is that you must follow the programs 100% accurately in translating them to assembly. Your assembly programs must use the exact same algorithms as the corresponding C programs. They must have instructions that implement the statements that are in the C programs, just converted to assembly. The functions must have the same number of returns as the C programs. They must have the exact same local variables and parameters as the functions in the C programs do. Every function, local variable, global variable, if statement, loop, etc., which is present in each C program needs to likewise be exactly present in the assembly program that you write for it. And there should also not be anything in each assembly program that is not in the corresponding C program. If you don’t follow the programs exactly in translating them, you will lose significant credit. As an extreme example, it would easily be possible to write a program that had the same effect as the second or third programs, without using a separate function at all. However we will detect this in grading and you would lose all credit for that part of the project as a result. (If this caused you to pass fewer than half of the public tests of this project then you would be in danger of not passing the course, regardless of overall grade.) Here are more specifics and requirements: • For the second and third programs that use functions, each function must use registers beginning with $t0. You cannot try to “reserve” different registers for use in different functions. First, if you do this, we will deduct significant credit. Second, and more important, compilers do not do this because it will not work. Trying to do this for even small programs such as the second and third ones in this project would require more registers than the machine has. So just start using registers with $t0, $t1, etc., in each function. • A consequence of the above is that any statement in one of the C programs that has a side effect must immediately cause something (the modified variable) to be stored in memory. The semantics of side effects are that they cause memory to be modified, so you cannot just keep variables only in registers (there are not enough registers to be able to do this). Registers temporarily store operands and results of computations, but operands of computations are first fetched (loaded) from memory, and results of computations are stored back in memory right after they are produced. For example, the first C program has ten side effects– four scanf()s and six assignment statements. So your assembly program must have ten sw instructions. The number of sw instructions in the second and third programs will be more than the number of scanf()s and assignments, because passing arguments to functions and creating a function’s stack frame also involve storing values in memory. • Related to the above: when an assembly function makes any function call– which could either be to another function, or even a recursive call to itself– after the function call it cannot assume that any registers have the same values that they did before the call. Of course if assembly code puts a value in a register and uses that register later, and it has not changed the value in that register or made any function calls in the meantime, the register will still have the same value. But different functions use the same registers– because there are not enough registers for each function to have its “own” registers– so registers will almost certainly not have the same values after a function call as before. So what assembly code must do is to immediately store the result of any statement that has a side effect into memory– which must be a memory location in the runtime stack if the side effect is changing a local variable or parameter– and after any function call, any values that are needed again must be reloaded into registers from memory (from the runtime stack, if the values needed are local variables or parameters). • The second and third programs (that have functions) must pass the same arguments as their C main programs pass to their functions, using the runtime stack, as shown in class. You cannot just use global variables or registers to communicate values from one function (including main) to another function. And because the functions in the C programs only use local variables and parameters– and do not use any global variables– your assembly functions (other than main) can only use local variables and parameters and not use any global variables. © 2023 L. Herman; all rights reserved 4 If you don’t faithfully follow the program and store all function arguments in the runtime stack you will lose significant credit. • Your functions in the second and third programs must pass their return values back via a register, as illustrated in class (not for example using a global variable). • The functions in the second and third assembly programs must have a local variable for every local variable in the corresponding C program. Similar to the previous item, all function local variables must be stored in the runtime stack, not in the data segment with a label (only global and static variables are stored in the data segment). For example, the function in the third program is shown on the left below (just formatted slightly differently in one place). The version to the right of it would work exactly the same, but your assembly code must implement the version on the left (storing the results of special cases and the value returned by the recursive call into the local variable ans, then returning the value of ans, rather than just directly returning the special case values or the result produced by the recursive call), because one thing the third C program is testing for is your code being able to store and access local variables in a recursive function. • As mentioned, the num_digits functions in the second and third C programs only use local variables and parameters; they do not use any global variables. Only the main functions use global variables. Your assembly functions must do the same. If your assembly programs don’t follow these things you will lose significant credit. static int num_digits(int n, int base) { int ans; ans= 0; if (base
1 Introduction and purpose In this project you will write a small simulation of a UNIX utility xargs, described below. The purpose of the project is to get some experience with process control and basic systems programming concepts. The difficulty in this project is in understanding what to do, not in the amount of code (which is actually relatively small) or complexity of data structures. Note that your program in this project does not have to create any pipes. And note that you may not use the C library function system() anywhere in your code, or you will not receive credit for the project. For reasons described below, this project will only have public tests. Due to the size of the course it is not feasible for us to provide project information or help via email/ELMS messages, so such questions will not be answered. You are welcome to ask any questions verbally during the TAs’ office hours. First we explain the basics of the actual UNIX xargs utility. Then we describe what your program has to do, including some examples. Although the first time you read the project this may be somewhat confusing (and even the second time), Section 4 gives a step by step outline of the things your program has to do. 1.1 Extra credit and number of submissions You can again get extra credit for this project. If you make only one submission that passes all the public tests you will get 3 extra–credit bonus points. And if your single submission is made at least 48 hours before the on–time deadline you will get 3 additional extra credit bonus points, for 6 extra credit points total. (Obviously you can’t get the second 3 extra credit points if you don’t get the first 3, but you can get the first 3 without getting these second 3.) However, as before, if you make too many submissions you may again lose credit. To avoid making submissions that don’t compile or don’t pass the public tests you should compile your code, run the tests, check their output, and fix any problems, all before submitting. (If for some reason your code passes all the public tests on Grace, but doesn’t work when you submit it, so you have to submit more than once– and this is not due to an error on your part or a bug in your code– you can talk with me verbally in office hours about receiving the extra credit despite having more than one submission.) 2 About the UNIX xargs utility UNIX utilities are generally small programs that do one thing, but are designed to easily work with other utilities or programs, so they can be used together to do more complex things when needed. Shell pipes are one way to make programs work together, where the standard output of one program or command is used as the standard input of another program or command, whose standard output in turn can be piped to another one, etc. But shell pipes are not helpful with programs or utilities that operate upon command–line arguments rather than standard input, for example if we want one program’s output to be used as another program’s input, but the second program operates upon command–line arguments rather than standard input. In fact, many UNIX commands/utilities do not read any standard input and operate only upon command–line arguments, for example ls, cp, mv, rm, mkdir, etc. One solution that can be used in cases like these is to write a shell script to run the second program using the output of the first program as command–line arguments, but this is more work than needed in many cases. Another solution that is sometimes appropriate is “backquote command substitution”, where one program is backquoted in the arguments of a second program, for example, ls `program1.x`. This causes the output of the backquoted program (program1.x here) to be used as command–line arguments for the first program (ls here). Although this is very useful at times we do not explain it further here. A third solution is the xargs utility, which is the subject of this project. It converts its standard input to command line arguments for another program. xargs has many capabilities, the majority of which we don’t need to worry about. xargs for this project differs from the real xargs UNIX utility in a number of ways, and in fact it is much simpler than the real xargs utility. To differentiate the UNIX xargs utility from the program that you will write to simulate it we will use xargs to refer to the UNIX utility, and your program will be named yargs.c. We will just use yargs to refer to your program, yargs.c to refer to its source file, and yargs.x to refer to its compiled executable. © 2023 L. Herman; all rights reserved 1 Your code will be in a file yargs.c, which will be a standalone program as in Project #2. In order to run the other program that it executes, xargs has to create a child process that will actually run the other program; this is what your yargs simulation of it will also have to do. 3 Usage of yargs.x yargs.x will be invoked as follows: yargs.x -n target–program target–args where all of the command–line arguments are optional, and they are explained next. -n (a minus sign followed immediately by a lowercase ’n’) is an argument to yargs.x itself. If it is present it has the same effect as the -n option to make, meaning that it causes yargs.x to print the commands that it would execute if -n had not been given, without actually executing them. Like the -n option to make, this allows a user to make sure that they are running yargs.x with the arguments and input that will cause it to perform the commands that they want. (Once they have verified that, they will presumably run yargs.x again with the same arguments and input, but without the -n option.) target–program is an argument to yargs.x. If present it is name of the program that yargs.x should execute, using the standard input to yargs.x itself as command–line arguments for target–program. Note that a target program could be a UNIX utility, or a user–created program. target–args is a list of things that will be used by yargs.x as command–line argument(s) for target–program. When running the target program these arguments will precede any command–line arguments that yargs.x takes from its standard input and supplies as additional arguments to the target program. The purpose of the target arguments is for arguments that will be supplied to the target program every time it is run by yargs.x, and the standard input of yargs.x will become different arguments that are given to the target program in addition to (and following) the target arguments. The standard input to yargs.x consists of zero or more lines, i.e., each ending in a newline. For explanation, let input1, input2, ···, inputn denote the successive lines in the standard input of yargs.x (i.e., n is zero if and only if the input to yargs.x is empty). What yargs.x must do is to run target–program target–args input1, then target–program target–args input2, and so on until it runs target–program target–args inputn. In other words, yargs.x will execute the target program once for each line of its own standard input. The first time the target program is run it will have the target arguments as command–line arguments and the first line of the standard input of yargs.x itself as additional command–line arguments. (To do this the input lines will have to be broken up into separate words, discussed further in Section 4.2 below.) The second time the target program is run its command line arguments will be the target arguments followed by the words of the second line of the standard input of yargs.x, and so on for each line of its own standard input. However, if the -n option was given then yargs.x will just print target–program target–args input1, then target– program target–args input2, up to target–program target–args inputn, each on a separate output line, but without actually executing any of them. 3.1 Examples of yargs usage The public tests have examples of running yargs.x in different ways. Here are some examples with explanation. Some of these examples use the UNIX utility /bin/echo as the target program. echo was briefly demonstrated in class when make and makefiles were covered. It just echoes or prints what is on its command line. For example, try a UNIX command something like /bin/echo I want to win chocolate! to see its operation. Suppose the current directory has a file datafile whose contents are the three lines shown on the left below, and a file datafile2 whose contents are the two lines shown on the right. (In each file, each line ends in a newline.) datafile aa bb cc dd ee ff datafile2 program1.c -o program1.x list-test.c list.c -o list-test.x © 2023 L. Herman; all rights reserved 2 yargs.x -n program.x xx yy < datafile In this invocation program.x is the target program name and xx and yy are the target arguments. Because the -n option was given, yargs.x will not attempt to execute program.x (which is not a UNIX command, but it could be a user–written program). yargs.x will just print program.x and the target arguments once for (and followed by) each line of the file datafile (which, due to input redirection, will be the standard input of yargs.x). So yargs.x will simply print these three lines that it would try to execute if -n had not been given: program.x xx yy aa bb program.x xx yy cc dd program.x xx yy ee ff yargs.x /bin/echo < datafile In this invocation the -n option is not used, /bin/echo is the target program name, and there are no target arguments. This will cause yargs.x to actually run the /bin/echo utility for each line of the file datafile. So echo first runs with the two command–line arguments aa and bb (the first line of datafile), then echo runs with its arguments being the second line of datafile, then with its arguments being the final line of datafile. So three lines of output would be produced: aa bb cc dd ee ff The command cat datafile | yargs.x /bin/echo would have the same effect, because the cat command and the pipe also cause the contents of the file datafile to be the standard input of yargs.x. yargs.x mv < datafile In this invocation mv is the target program name and there are no target arguments. Here yargs.x should attempt to run mv three times, once for each line of datafile. The first run of mv will have the two command–line arguments aa and bb, the second run of mv will have arguments cc and dd, and the last run of mv will have arguments ee and ff. The effects of these commands depend on whether bb, dd, and ff are directories, or whether they’re files (or whether they don’t currently exist). The first invocation will run mv aa bb and if bb is a directory it will move aa to the directory bb. But if bb is a file or if bb doesn’t exist, aa will be renamed to bb. Similarly with the commands mv cc dd and mv ee ff which yargs.x will run for the second and third lines of its input. Of course if any of files aa, cc, and ee don’t exist then there will be an error when yargs.x tries to run the command(s) that use the names of nonexistent things. (How yargs.x should handle performing commands that have errors is discussed in the next section.) yargs.x gcc -Wall -ansi < datafile2 In this invocation gcc is the target program name and -Wall and -ansi are the target arguments that will be passed by yargs.x as the first two argument to gcc. The remainder of the arguments each time that gcc will be run will be the words on the lines of yargs.x’s standard input, meaning the contents of each line of datafile2, and will be different during the two runs of gcc. So in this invocation yargs.x should execute gcc twice, once for each line of the file datafile2. As a result, the following commands will be run: gcc -Wall -ansi program1.c -o program1.x gcc -Wall -ansi list-test.c list.c -o list-test.x Note that the real xargs utility does not work the same as these examples, for reasons not fully explained here. Certain options can cause it to work as these examples do, but since you should be concentrating on what your yargs program has to do, we don’t describe the options that would cause xargs to mimic the behavior of your yargs. (If you are interested in learning more about the real xargs utility though you can just run man xargs on Grace. However, to avoid confusing yourself with what xargs would do compared to your yargs, even if you want to look at this information you might want to wait until after you have finished writing the project to do so.) © 2023 L. Herman; all rights reserved 3 4 What your program must do Extracting the files from the project tarfile will create a directory project09. In that directory you must write your program in a file named yargs.c (spelled and capitalized exactly as shown). Here is an outline of what it has to do: 1. It must include two header files safe-fork.h and split.h containing prototypes for functions safe_fork() and split(), which are described respectively later in this section, and in Section 4.2. 2. yargs.x must initially figure out what the arguments were. The first step is to determine whether the -n argument was used on its command line. Note that if the -n option is given, to have an effect it must be the first argument to yargs.x (otherwise it will appear to be the target program name or part of the target arguments). 3. After seeing if the -n argument appeared on the command line, yargs.x must determine the name of the target program, and see if there were any target arguments. For example, if it is run using the invocation yargs.x ls -t, yargs.x will run ls (the target program name here) with target argument -t (followed by arguments taken from lines of its standard input, as described below). • If there were arguments on its command line, and the first one was not -n, the first one is assumed to be the target program name. If there were no arguments after that then there are no target arguments for this execution, otherwise any arguments after the first one are the target arguments for this execution. • If there were arguments on its command line, and the first one was -n, then the second argument must be the target program name. If there were no arguments after that then there are no target arguments for this execution, otherwise any arguments after the -n and the target program name are the target arguments for this execution. • If yargs.x is run without any command–line arguments at all it won’t have anything to do because there is no target program to run, so it will end up just quitting without doing anything. (Its exit status or return value is discussed below.) • If there is only one argument, but it was -n, then again it means that there was no target program, so yargs.x will also quit without doing anything. 4. Once it has examined the command–line arguments, if the -n argument was not used, yargs.x will use its standard input as command–line argument(s) to the target program, running the target program (in child processes) once for every line of its standard input, until the end of the input is seen, with the (whitespace–separated) words of each input line being used arguments as the command–line arguments for the target program. (As above, each line of the input will end in a newline.) Recall that you read until the end of the input in Project #2. Use the function split() described in Section 4.2 to break each input line up into its whitespace–separated words. 5. However, once it has examined the command–line arguments, if -n was given as the first command–line argument to yargs.x (following the name yargs.x itself, of course), then yargs.x must print the commands without executing them. If there are n lines in its standard input then yargs.x will print n lines of output, where the first one consists of the target program name, followed by the target arguments, followed by the words on the first line of the standard input. The second line of output begins with the target program name and target arguments, followed by the words on the second line of the standard input, etc. Regardless of how much whitespace separated the target arguments or the words of the lines of the standard input, yargs.x must print each command with one single space between (separating) the target arguments and the words of the line of standard input. It is strongly suggested that you get the -n option to work right before trying to get your yargs.x program to handle executing commands. If your program is not accessing arguments correctly, or is not reading the lines of standard input correctly, then executing programs based on the target arguments and lines of standard input is certainly never going to work. 6. As mentioned, yargs.x must create a child process each time it runs the target program. This is usually done using the fork() system call, however, do not call fork() directly. We have provided an object file safe-fork.o, which contains a function safe_fork(), which you should call instead. It prevents the possibility of having a serious fork bomb. Your yargs.c should include safe-fork.h so the prototype of safe_fork() there will be visible. safe_fork() has the same return value as fork() and has no parameter, as you can see by looking at safe-fork.h. Use safe_fork(), instead of directly calling fork(). © 2023 L. Herman; all rights reserved 4 We set things up so your program will not even compile on the submit server if it calls fork() directly– but you could still cause a fork bomb on the Grace machines if you call fork() and do it incorrectly, which could cause dozens of other students to lose whatever work they were working on. Do not call fork() directly. 7. yargs.x can run the target program multiple times (once for each line of the standard input), with different arguments each time. The target program could produce different results each time it is run, because it’s being run with different arguments each time. One invocation of the target program might even use results that were produced by an earlier invocation of the target program. Consequently your yargs.x must ensure that each time the target program is executed it finishes running before the next time that yargs.x runs the target program for the next line of its standard input. 8. After each time it runs the target program in a child process, yargs.x will have to determine the exit status of the child process, to know what status it should itself exit with. It will determine its exit status as follows: • If the target program exits successfully every time that it is run (for every line of the standard input of yargs.x), then yargs.x itself must also quit (after running the target program once for each line of its own standard input) with exit status 0. • If any invocation of the target program exits with any nonzero exit code, your yargs.x must stop running the target program (meaning not run it for any further lines of its standard input), and quit immediately with whatever nonzero exit status the target program had. • If any error ever occurs trying to run the target program, meaning it could not even be run, your program must quit (without trying to run the target program for further lines of its standard input) with exit status 255. For example, suppose your program is run as yargs.x bananas are fun. Although perhaps there should be, there is not actually a UNIX command bananas, so, assuming that there is no program in the current directory or on the UNIX search path (see Section 4.1) that happens to be named bananas, when your program tries to create a child process running bananas it will fail, and yargs.x must exit with status 255. 9. Another case where yargs.x should quit without doing anything is if its standard input is empty. Regardless of what command–line were given to it, yargs.x is supposed to do something for every line of its standard input. If it has no standard input then it will not have anything to do, so it will end up just quitting, with an exit status of 0. 4.1 Notes • yargs.x will not use pipes. Any output produced by the child process created to run the target program will just be printed to your program’s standard output. • The output produced when the -n option is used should go to the program’s standard output. Your program has to read its standard input, and produce output when the -n option is used. Do not use low–level UNIX I/O system calls (that were recently covered) to read input or produce output– this is more difficult, and unnecessary in this project. Just use the C standard I/O library functions covered in Chapter 15 of the Reek text. • Before starting to code, look at the way that yargs.x is run in all of the public test scripts, and look at the files that will be used as standard input for each test. Then look at the expected output files. If you aren’t sure why the arguments and input result in the contents of the expected output files then ask about it in the TAs’ office hours before starting to code, so you are sure you understand what yargs has to do. • Several special cases were mentioned above in which yargs.x should not do anything and just quit (with an exit status of 0). These may not have to be handled as special cases; much of this behavior could very well occur naturally in these situation when you write the program to handle normal cases, so do not create special cases for these situations unless they are necessary. • Recall that to see the exit status (or exit code) of a program run in the tcsh shell on Grace, just use the command echo $status. (It has to be the very next command, because it prints the exit status of the most recent command.) • The -n option is not passed from yargs.x to the target program. (It is not a target argument.) It is an argument to yargs.x, which affects the operation of yargs.x itself. • Your program may assume that no input line will ever have more than 10000 characters before its terminating newline. Other than what’s imposed by the maximum length of a line there is no specific limit to the number of © 2023 L. Herman; all rights reserved 5 arguments that may be on a line, or to the number of input lines that your program may have to read and use as command–line arguments to the target program that it is running. Situations where it’s not known in advance how much data has to be read are cases where dynamic memory allocation is needed. • Notice that when your yargs.x runs the target program, this will need to be done using some form of exec (one of the exec system calls). Your program should use the UNIX search path to find the target program to run. Look at the forms of exec mentioned in class and figure out which one or ones can be used in these situations. Recall that the name of a program to run (the target program here) is given twice in a call to exec! It has to appear as the first and second arguments to an exec–family system call. • When a program is run using input redirection, such as program.x arg1 arg2 arg3 < inputfile, the program does not see the “< inputfile” in its arguments. In this example the program would just see three command–line arguments arg1, arg2, and arg3. This is because when input redirection is used the shell creates a process to run the program in, that process’s input is redirected to come from the input file, and the program is run in that process. The program has no way to know or tell where its standard input is coming from, whether from the keyboard or in this case from the file inputfile– it just reads input. So yargs.c does not need to look for input redirection symbols or input files on its command line, it just needs to read its input, no matter where it is coming from. Similarly, when input is redirected to come from a pipe (as in program1.x | program2.x arg1 arg2 arg3), the pipe symbol and whatever is before it don’t appear in the program’s arguments, so “program1.x |” will not be in the arguments for program2.x here. program2.x will just see the output of program1.x as its standard input when it reads input. • yargs.c will have to free any dynamically–allocated memory when it is no longer needed or it will fail some tests. 4.2 Our function char **split(const char line[]) To facilitate the process of reading input lines and breaking them up into words we are supplying you with a function split(), with prototype above, in the object file split.o (with associated header file split.h). It will take a string and break it up into its components, where each component (except as indicated below) is a word, separated from other words by either whitespace, or by the beginning or end of the string. split() returns a dynamically–allocated array of dynamically–allocated strings, each of which is one word of the line. The array will end with a NULL pointer, so its last element can be detected. split() will ignore (discard) blank spaces, tabs, and newlines before and between words. Its argument string optionally can end with a newline. (Since you will be reading lines from the input and calling split() on them, by default they would end in newlines.) For example, if called on an input line parrots eat carrots (which ends with a null character), split() will return a dynamically–allocated array with four elements, which will be (in order) the strings parrots, eat, carrots, and the fourth element of which is just NULL (the three non–NULL strings, and the array itself, are all dynamically allocated). Note the result would be the same if more than one blank space or tab separated the words. The UNIX shell has many characters that have special meaning, for example backslashes can be used to escape certain special characters, and single quotes have somewhat different effects than double quotes. For simplicity, our split() function does not attempt to emulate these behaviors, other than one, which is that a double–quoted string is treated as a single argument, and whitespace is preserved inside double–quoted strings. For example, if your program is run as yargs.x⊔test.x⊔one⊔”two⊔⊔three⊔ ⊔ ⊔four”⊔five the target program test.x must be run with three command– line arguments, the second of which will be “two⊔⊔three⊔⊔⊔four” (containing spaces as shown), and split() would treat it as a single argument. The array returned by split() is dynamically allocated, and all the strings that the array points to are also dynamically allocated. To avoid memory leaks you will have to free this memory when it is no longer needed. Note that the array argv of command–line arguments to your program is of the same form as the array returned by split(), but the argv array should not be explicitly freed– it is created automatically before a program starts, and automatically freed if need be when the program ends. If a program tries to explicitly free its command–line arguments it will most likely result in a fatal error. Do not try to write your own split() function. We are giving you a correct split() function. If you write your own it might not always work correctly, or might not be consistent with ours in all cases. Just call our split() function to break up lines of the standard input into their constituent words. © 2023 L. Herman; all rights reserved 6 A Development procedure review A.1 Obtaining the project files and compiling Log into the Grace machines and use commands similar to those from before: cd ~/216 tar -zxvf ~/216public/project09/project09.tgz This will create a directory project09 that contains the necessary files for the project, including the public tests and associated data files, safe-fork.h, safe-fork.o, split.h, and split.o. After extracting the files from the tarfile, cd to the project09 directory, create a file named yargs.c (spelled exactly that way) that will #include the header files safe-fork.h and split.h, and write the program as described above. Since your program will be a self–contained standalone program, you don’t need to write a makefile for this project (although you are welcome to if you like; if so it will just be ignored on the submit server). You can compile your program just using the (exact) command: gcc yargs.c safe-fork.o split.o -o yargs.x (Of course if you do write a makefile you need to ensure that the required compilation options are being used.) A.2 Checking your results and submitting The tests of this project will all be UNIX shell scripts, named public01, public02, etc. These public test scripts have no standard input. They will run your program with different arguments, and although the scripts do not read any input, they may run your yargs.x program with input redirected from a file or piped from a command. The public test scripts produce output that can be compared with the expected output, for example: public09 | diff – public09.output The test script (public09 in this example) is running your yargs.x with a target program, a target argument, and some input; you can see the exact commands that each test is using to run yargs.x by looking at the scripts using less. These scripts are expecting the name of your executable program to be yargs.x, so you must use that exact name when compiling. In a recent discussion section your TA showed you the run-tests shell script, which runs a program on all of its tests at once. However, the run-tests script will not work correctly for this project, because run-tests is expecting the project tests to be compiled C programs (executable files) rather than shell scripts. A modified version of run-tests named run-tests2 has been put on Grace that you can run instead for this project– after you compile yargs.x, just running run-tests2 will execute all of the shell scripts (tests) and indicate whether they all passed or any failed. We can’t use our memory checking functions to test your program for memory leaks or problems in the heap, because you’re writing a self–contained program with a main() function. However, some of the tests will use valgrind with the right arguments to check your program for problems of this type. Since our memory checking functions are incompatible with valgrind, do not use our memory checking functions in your program, or it will fail these tests. A.3 Grading criteria Your grade for this project will be based on: public tests 85 points programming style 15 points As you can see, and as mentioned above, there are only public tests for this project. For you to come up with your own tests of your program you might have to think about what UNIX commands to run with what arguments, and you would also need to write your own shell scripts. But while understanding basic UNIX shell script concepts is useful, the course doesn’t really expect you to have to learn how to write shell scripts. To avoid this, all of the tests for this project will just be public tests, and those and style will make up your entire project grade. © 2023 L. Herman; all rights reserved 7 B Project–specific requirements, suggestions, and other notes • Before starting to code, make sure you understand the lecture examples of process control. If you have questions about them, ask about them in the TAs’ office hours in advance. • Of course the program could be written without explicitly using process control (e.g., by using the C library function system()). But since the entire point of the project is to write a program using process control, you will not get any credit for the project if you use system() at all anywhere. You must write the project as described. Also do not use a loop to try to make a process wait for another one to do something. (This is called busy–waiting and it is very inefficient, compared to correct ways to accomplish this.) • If any system call fails your program should print some sort of descriptive message saying what didn’t work right and where. It doesn’t matter what happens after that. The exact wording of the message doesn’t matter and it doesn’t matter what your code does if this occurs, as long as an explanatory message is printed. • Any function that allocates memory should test whether the memory allocation was successful, and handle the situation appropriately if it was not successful. (The appropriate action might just be gracefully exiting the program after printing an error message, but it at least should not be just crashing). • As mentioned above, do not use our memory checking functions for this project. Besides being incompatible with some of the tests that use valgrind, when your code is compiled on the submit server it will not be linked with memory-functions.o, so your program will not even compile on the submit server if you try to use our memory functions. • If you make changes to yargs.c, don’t forget to recompile it to create a new yargs.x before rerunning the public test shell scripts! • You cannot run a shell script using gdb (or gede), or valgrind. If you want to debug, run yargs.x under gdb (compiling of course with -g), then look at the shell script for the test that you want to run the program for, to see what command–line arguments the public test script is running yargs.x with, and what it is using for its input. Then run your yargs.x under gdb with the same arguments and input. And similarly, if you want to run valgrind, look at the shell script that is the test you want to run it on, and run it on the command given in the script. You can run a program under gdb with command–line arguments and input redirection. For example (suppose this is after running gdb yargs.x and setting some breakpoints), say we want to run yargs.x with the argument -gcc -Wall, reading input from the file datafile2. The following command to your running gdb will do this: run gcc -Wall < datafile2 You can also run valgrind on a program with command–line arguments and using input redirection: valgrind yargs.x gcc -Wall < datafile2 Some tests will run yargs.x with its standard input coming from a pipe, for instance, an invocation like the example ls *.c | yargs.x touch. If you want to debug your program in a case like this we recommend redirecting the output of the commands or commands before yargs.x into a file, then running gdb on yargs.x with input redirected from that file. Just be sure to use a backslash to disable aliases for any commands that are being used to generate input for yargs.x (the way the public tests are written takes care of this). In this example, run ls *.c > tmpfile, then run gdb yargs.x, and (as mentioned above) at the gdb prompt run < tmpfile. Comments in the public tests will explain what commands to use if you want to run the debugger on your yargs.x for that test. • If you are debugging a program that uses process control to create child processes, note that gdb will by default continue to trace the execution of the parent process after a fork. However, if you use the gdb command set follow-fork-mode child before the program creates a child, gdb will trace the child instead. (If needed, set follow-fork-mode parent will switch back to tracing the parent when subsequent child processes are created.) If your yargs.x program ends up running child processes with the wrong command–line arguments nothing is going to work right, and it might be difficult to know why. So if things aren’t working right and you’re not sure why, set a breakpoint in gdb right before a child process is created, use follow-fork-mode child, set a breakpoint in the child process right before it runs another program, and use gdb to view the arguments at that point that the other program is about to be run with. If they’re not right, that is the problem. © 2023 L. Herman; all rights reserved 8 • Do not write code using loops (or recursion) that has the same effect as any string library functions. If you need to perform an operation on strings and there is a string library function already written that accomplishes that task, you are expected to use it, otherwise you will lose credit. • You will lose credit if you cast the return value of the memory allocation functions. Besides being completely unnecessary, in some cases this can mask certain errors in code. • You cannot modify anything in the header files safe-fork.h or split.h or add anything to them, because your code will be compiled on the submit server using our version of these files. Your code may not comprise any source (.c) files other than yargs.c, so all your code must be in that file. You cannot write any header files of your own either. • For this project you will lose one point from your final project score for every submission that you make in excess of five submissions, for any reason. • Make sure none of your program lines have length more than 80 by running your Project #2 line length check programs on yargs.c! • Be sure to make frequent backups of your yargs.c source file in a different directory in your course disk space. • Recall that all your projects must work on at least half of the public tests (by the end of the semester) in order for you to be eligible to pass the course. See the project policies handout for full details. • Recent projects said that if you have a problem with your code and have to come to the TAs’ office hours for debugging help you must come with tests you have written yourself that illustrate the problem, not the public tests. However, since you aren’t expected to have to write shell scripts in this project, the TAs will help find bugs with the public tests in this project, even if you have not written your own tests. C Academic integrity Please carefully read the academic honesty section of the syllabus. Any evidence of impermissible cooperation on projects, use of disallowed materials or resources, publicly providing others access to your project code online, or unauthorized use of computer accounts, will be submitted to the Office of Student Conduct, which could result in an XF for the course, or suspension or expulsion from the University. Be sure you understand what you are and what you are not permitted to do in regards to academic integrity when it comes to projects. These policies apply to all students, and the Student Honor Council does not consider lack of knowledge of the policies to be a defense for violating them. More information is in the course syllabus– please review it now. The academic integrity requirements also apply to any test data for projects, which must be your own original work. Exchanging test data or working together to write test cases is also prohibited. © 2023 L. Herman; all rights reserved 9
1 Introduction and purpose In this project you will extend upon your graph implementation from Project #7 in two ways: new functions will be added that allow removing components of graphs as well as an entire graph; and the graph functions will have to free any allocated memory when it is no longer needed. Some tests in this project will check for memory leaks and memory errors resulting in heap corruption (making parts of the heap invalid); Section 4 explains how these tests do this. Your functions in Project #7 could have passed some (or even all) of their tests despite having errors of these types, but in this project you will have to fix this. The purpose of the project is to ensure that you are not only able to create a more complex dynamically allocated data structure, but also able to correctly free memory for such a data structure. You must again have a makefile (still named Makefile) in your submission that will compile your code for all of the public tests. Other than two points mentioned in Appendix B below, the requirements for your makefile are the same as in the previous project, so they are not repeated here. See Project #7 and the notes in Appendix B about makefile dependencies. Note that Project #7 explained in detail how to test your makefile before submitting. Due to the size of the course it is not possible for us to provide project information or help via email/ELMS messages, so we will be unable to answer such questions. But you are welcome to ask them verbally during the TAs’ office hours. 1.1 Fixing problems in your Project #7 code Even if you had trouble with Project #7 and may not get a high score on it, you can still do fine in the course (one project just isn’t worth much toward your grade). But you do have to get Project #7 to work though, so you can reuse it and expand upon it in this project. So if you know that you have bugs in your Project #7 code, fix them right away. (Hint– valgrind is your friend; see the rest of this section.) A few days after this project is assigned we will provide the Project #7 secret tests in ~/216public/project07. You can use them to figure out changes needed in your Project#7 code when reusing it in this project. But even if you aren’t aware of any bugs in your Project #7 code your functions might still, as mentioned above, have some errors in the heap, despite being able to pass the Project #7 tests. But in this case your code will fail many of the tests in this project (the ones that are checking for consistency and correctness of the heap). So it is strongly recommended, before even trying to write the new functions in this project, that you first use valgrind to test for– and fix– any heap problems in your Project #7 code as follows: • Before even extracting the files from the project08.tgz tarfile, make a complete second copy of your entire project07 directory, with a different name. (This is just so you can make changes to your previous code while still keeping a backup of your Project #7 code as it was submitted.) • In the new copy of your project07 directory, add the -g compilation option to your makefile (if it’s not already in your compilation options), run make clean, then rebuild all of the public tests. • Now run valgrind –leak-check=no public01.x and look for any errors reported by valgrind. As the Project #7 assignment mentioned, the Project #7 tests will unavoidably have memory leaks, because there are no Project #7 functions that remove any components of a graph, so the tests have no way to free the memory used by String_graph variables when they finish. But the –leak-check=no option to valgrind prevents it from checking for memory leaks, so it will only check for other types of problems in the heap. valgrind was explained in discussion section and a number of small examples were shown (the tarfile with the examples is in ~/216public on Grace). The examples had different kinds of memory problems that could occur, and allowed you to see how valgrind identified the different kinds of problems. • Fix any memory problems in your string-graph.c that valgrind identified. If you need help understanding what valgrind is saying, after carefully reviewing and rerunning the valgrind examples in ~/216public, you can ask the TAs in office hours to help you understand what valgrind is telling you. • Repeat the same command for the rest of the public tests, fixing any problems reported for any of the later tests as well. © 2023 L. Herman; all rights reserved 1 • Then you can use your fixed string-graph.c in writing this project. Of course this procedure assumes that you were in discussion section when valgrind was demonstrated and explained by the TAs (on Monday, March 27), so you understand it. If you weren’t in discussion then, you’re on your own for figuring valgrind out. Try to ask a friend who was in discussion that day to explain it to you and show you the examples. 1.2 Extra credit and number of submissions You can again get extra credit for this project. If you make only one submission that passes all the public tests you will get 3 extra–credit bonus points. And if your single submission is made at least 48 hours before the on–time deadline you will get 3 additional extra credit bonus points, for 6 extra credit points total. (Obviously you can’t get the second 3 extra credit points if you don’t get the first 3, but you can get the first 3 without getting these second 3.) But as before, if you make too many submissions you may again lose credit. To avoid making submissions that don’t compile or don’t pass all the public tests you should compile the tests, run them, check your output, and fix any problems, all before submitting. And you should write your makefile first, and test it before submitting, as discussed below. (If for some reason your code passes all the public tests on Grace but doesn’t work when you submit it, so you have to submit more than once– and this is not due to an error on your part, a bug in your code, or a problem with your makefile– you can talk with me in my office hours about receiving extra credit despite having more than one submission.) 2 New functions to be written Just copy your string-graph.c and string-graph-datastructure.h files from your project07 directory (or copy the corrected versions from the new copy of your project07 directory if you followed the procedure in Section 1.1 above) to the new project08 directory that will be created by extracting the files from this project’s tarfile. As mentioned above, make whatever changes are needed to fix any bugs in your Project #7 code that was copied to the project08 directory, and add the new functions described below to it. But before starting to code, look at the public tests so you see how the functions are called. Write one function at a time (entirely or even partially), then test it (as soon as that is possible) thoroughly before going on! Write and test helper functions before writing code that uses them, and test them separately first as well. Project #7 said that if a pointer passed into a parameter of any function is NULL the function should have no effect, and (with two exceptions) if it returns an integer value it should return 0. This also applies (with one exception to the return value) to the new functions being added in this project 2.1 void free_vertex_name_list(char **const names) This is a helper or utility function that is not actually part of a graph. It should free all the dynamically–allocated memory of the strings in the array names, then free the array names itself. It should have no effect if the pointer names itself is NULL. If names is non–NULL the function may assume that its last element will always be NULL (i.e., the array will be like those returned by the function get_vertex_list(), and by the new function get_neighbor_names() explained below). So the user of your code can use this function to free the memory returned by get_vertex_list() and get_neighbor_names() when they are no longer needed, to avoid memory leaks. 2.2 char **get_neighbor_names(String_graph *const graph, const char vertex[]) If its parameter graph itself is NULL, or if the graph does not have a vertex with the name vertex, this function should just return NULL. Otherwise it should return a newly–created dynamically–allocated array, say new_arr, of pointers to characters, where the pointers point to copies of the names of the neighbors of the vertex vertex in the parameter graph. (Recall that a neighbor of the parameter vertex is a vertex, say other, such that there is an edge from the parameter vertex to other.) The names pointed to by the pointers in new_arr must be in increasing lexicographic (dictionary) order. Each name must be stored in a newly–created dynamically–allocated character array; that is, the pointers in new_arr should not just be aliases of the names stored somewhere in your graph data structure. © 2023 L. Herman; all rights reserved 2 If vertex has n neighbors, the array returned by the function must be an array with n+1 elements, where the last element must be NULL. (So if the vertex vertex has no neighbors the function must return a dynamically allocated array of one element, which will be NULL.) (The return value of this function for NULL parameters is the exception referred to in the third paragraph in Section 2.) 2.3 int remove_graph_edge(String_graph *const graph, const char source[], const char dest[]) This function should remove the edge going from source to dest in its parameter graph. If either source or dest are not vertices in the graph, or if they are vertices but there is no edge from source to dest, the function should just return 0 without changing anything, otherwise it should remove the edge and return 1. Any dynamically–allocated memory used by the edge must be freed. Recall that your data structure must contain at least one linked list or binary search tree somewhere. If, given your data structure, it’s necessary to remove elements from linked lists or binary search trees in order to remove graph edges, how to do these things was covered in CMSC 132. 2.4 int remove_vertex_from_graph(String_graph *const graph, const char vertex[]) If vertex is not a vertex in its parameter graph this function should just return 0 without changing anything, otherwise it should remove vertex from the graph and return 1. If a vertex is removed then all its outgoing as well as incoming edges must also be removed, because an edge needs both its source and destination vertices to exist. Any dynamically– allocated memory used by the vertex and its associated edges must be freed. 2.5 void graph_delete(String_graph *const graph) This function should deallocate all dynamically–allocated memory that is used in the String_graph variable that its parameter graph points to, releasing the graph and all its data in the process. So when this function returns, the graph data structure that graph previously pointed to effectively no longer exists in memory. See Section 3 for explanation about when this function can– and must– be called. 3 Valid sequences of calls to the functions A sequence of calls to the graph functions (including the ones from Project #7 as well as the new ones added in this project) on a given String_graph variable is valid if and only if the sequence satisfies the following: • graph_init() is the first call in the sequence. • Unless it is the last call in the sequence of calls to the graph functions, a call to graph_delete() must be followed immediately by a call to graph_init(). So following a graph_init() call and until the next graph_delete() call (if any), there can be any sequence of calls of functions other than graph_init() and graph_delete(). And it is invalid to call any other functions on a String_graph variable after graph_delete() is called on it, until graph_init() is called on it. The effect of an invalid sequence of calls is undefined. Ensuring that the sequence of calls of your function is valid is the responsibility of the caller of your functions– which includes your own tests– your code does not have to try to detect violations of these properties, and in fact it has no way to do so. If the user of your functions wants to avoid memory leaks they must make only valid sequences of calls to the graph functions, and they must also ensure that the last function call they make in any sequence of calls to the functions on a String_graph variable must be graph_delete(). They must also call free_vertex_name_list() exactly once on every pointer returned by the two functions get_vertex_list() and get_neighbor_names(), and after that they must not use the pointers again. (Note that a String_graph variable will still be valid even if this is not done.) 4 Memory checking and memory errors We have included an object file memory-functions.o in the project tarfile, which has two functions named setup_memory_checking() and check_heap() that are similar to the functions from Project #6. The header file © 2023 L. Herman; all rights reserved 3 memory-functions.h contains their prototypes. They have no parameters or return value. We use them as follows in some of our tests to detect memory leaks and corruption: • setup_memory_checking() must be called once in a program before any memory is dynamically allocated; it sets up things to check the consistency and correctness of the heap. If this function has been called initially, and the program has any memory errors later (such as overwriting beyond the end of an allocated memory region in the heap) the program will crash (consequently failing a test). • If there is any memory in use in the heap at all when check_heap() is called it prints a message and causes the program to immediately quit. If there is no memory in use in the heap at all then calling it just has no effect. Some tests will initially call setup_memory_checking(), then call check_heap() after functions are called on the different data structures and their memory is cleared. If the functions that are supposed to remove all or part of your graph data structure are not freeing memory properly, your program will report that there is some allocated memory in the heap– meaning a memory leak occurred- causing these tests to fail. (This could also happen if the user calls your functions in an invalid order, for example by calling graph_init() on a nonempty String_graph variable without calling graph_delete() on it first, but this is their fault. If you are using our memory checking functions in your own tests and you make an invalid sequence of calls to these functions, such tests would fail.) The facilities we use to perform this memory checking are not compatible with memcheck (valgrind), because memcheck uses its own versions of the memory management functions, that themselves use some memory in the heap. Consequently, if you run any tests that use our memory checking functions under valgrind, the test will appear to fail with a memory leak, even if your code does not have memory leaks. You can use either our memory checking functions with a program, or valgrind with it, but not both at the same time, or they will give incorrect results. Appendix B describes how to easily use valgrind in this project. The functions discussed in this section are for tests of your code to use. Do not try to call them from your graph functions– this won’t work. And since your code in string-graph.c will not be calling these functions, your string-graph.c should not include memory-functions.h. (Any of your own tests that want to check the correctness of the heap will include memory-functions.h, and will have to be linked with memory-functions.o.) A Development procedure review A.1 Obtaining the project files, compiling, checking your results, and submitting Log into the Grace machines and use commands similar to those from before: cd ~/216 tar -zxvf ~/216public/project08/project08.tgz This will create a directory project08 that contains the necessary files for the project, including the header files string-graph.h, compare-vertex-lists.h, and memory-functions.h, the object files compare-vertex-lists.o and memory-functions.o, and the public tests. As was discussed above, copy both of your string-graph.c and string-graph-datastructure.h files from your project07 directory (or copy the corrected versions from the new copy of your project07 directory if you followed the procedure in Section 1.1 above) to the project08 directory after extracting it from the project tarfile, so you can add the new functions in it. As in Project #7 you will not be compiling your code from the command line; you will use make and your makefile to compile everything. diff can be used as before to check whether your code passes a public test or not, for example: make public01.x public01.x | diff – public01.output (You can also run make from Emacs if desired.) Note that your makefile cannot run each test when building it, because this will probably not work on the submit server. If you want to create a separate target in your makefile that runs the tests this will not cause a problem on the submit server (although a better way to do this will be explained in discussion section soon). Running submit from the project directory will submit your project, but you first must make sure your makefile works and you have passed all the public tests, by compiling them– using your makefile– and running them yourself. Unless you have versions of all required functions that will at least compile, your program will fail to compile at all on the submit server. (Suggestion– create skeleton versions of all functions when starting to code, that just have an appropriate return statement.) © 2023 L. Herman; all rights reserved 4 A.2 Grading criteria Your grade for this project will be based on: public tests 40 points secret tests 45 points programming style 15 points Some secret tests may test your makefile. Section 4 of Project #7 explains the requirements for your makefile, so review that to ensure that you are satisfying them, in case they are tested by secret tests in this project. B Project–specific requirements, suggestions, and other notes • The data structure requirements are the same in this project as in Project #7. Be sure you have followed them– reread them in the Project #7 assignment. • Note that different public tests in this project use different things and include different header files, so different tests will have different dependencies in your makefile, and different files will have to be linked to create the executables for different tests. We emphasize again that you should not wait to write your makefile. Write it first– before you even start writing the graph functions. This will avoid having to type commands for compiling the public tests, but more importantly, using your makefile all during development, every time you compile your code, will allow you to ensure it works correctly before you submit. You can copy your makefile from your project07 directory to the directory for this project as a starting point, but the tests in this project will have different dependencies than they did in Project #7, and different files will have to be linked to create their executables than in Project #7. • Now that the preprocessor has been covered, and the conditional compilation directives that are actually always used in C header files have been explained in class, you must now have correct conditional compilation directives in your header file string-graph-datastructure.h. (Note that part of using conditional compilation correctly is that you must spell the name of the preprocessor symbol used for conditional compilation either as described in class and in the lecture slides, or in the form that the Reek text uses, or you may lose some credit.) • Since one header file in this project includes another one, your makefile must now use one of the approaches recently discussed in class for handling dependencies in this situation, otherwise it may not perform needed recompilations. (If your makefile doesn’t use one of these approaches you might lose a few points due to failing a secret test.) • As mentioned earlier, you cannot use valgrind together with our memory checking functions. We want to make it easy for you to use valgrind if you have memory problems that arise while writing this project, so we wrote the tests to allow you to easily turn off our memory checking, by just compiling with the preprocessor symbol ENABLE_VALGRIND defined. To be able to use valgrind (this assumes your Makefile is correct): – Add a definition of ENABLE_VALGRIND (e.g., -D ENABLE_VALGRIND) to the compiler options in your Makefile. (Recall you also have to add the -g option to be able to use valgrind, as well as gdb.) – Force all the tests to be recompiled using make clean (or touching all source/header files would also work), and recompile everything (our tests and yours). – Now you can use valgrind on any test. When you’ve found the problem and are done using valgrind, don’t forget to remove the definition of ENABLE_VALGRIND from the compilation options in your makefile and recompile everything again, otherwise you will not be running the same versions of the tests as the submit server is, because it will be using our memory checking for some tests. After re–enabling our memory checking by removing -D ENABLE_VALGRIND and recompiling, rerun the tests to confirm that they pass. © 2023 L. Herman; all rights reserved 5 – When running valgrind, add the option –leak-check=no to turn off checking for memory leaks for any test that is not calling graph_delete() on any String_graph variables when they are no longer needed and is also not calling free_vertex_name_list() on any pointers returned by get_vertex_list() and and get_neighbor_names() when they are no longer needed. • Do not write code using loops (or recursion) that has the same effect as any string library functions. If you need to perform an operation on strings and there is a string library function already written that accomplishes that task, you are expected to use it, otherwise you will lose credit. • You will lose credit if you cast the return value of the memory allocation functions. Besides being completely unnecessary, in some cases this can mask certain errors in code. • For this project you will lose one point from your final project score for every submission that you make in excess of four submissions, for any reason. • You cannot modify anything in the header files string-graph.h, compare-vertex-lists.h, or memory-functions.h, or add anything to them, because your submission will be compiled on the submit server using our versions of these files. You cannot write any new header files of your own other than string-graph-datastructure.h. Your code may not comprise any source (.c) files other than string-graph.c, so all your code must be in that file. • Any helper functions that you write in string-graph.c, whose prototypes are not already in string-graph.h, will be “private”, so they should be declared static, and their prototypes should be placed at the top of string-graph.c, not in string-graph.h or string-graph-datastructure.h. • If you have a problem with your code and have to come to the TAs’ office hours for debugging help you must come with tests you have written yourself that illustrate the problem (not the public tests), what cases it occurs in, and what cases it doesn’t occur in. In particular you will need to show the smallest test you were able to write that illustrates the problem, so whatever the cause is can be narrowed down as much as possible before the TAs even start helping you. You must also have used the gdb debugger, explained in discussion section, and be prepared to show the TAs how you attempted to debug your program using it and what results you got. And besides having used gdb, you must have also run memcheck (valgrind) if your code has any bugs, and be able to understand the results it shows. • Recall that the course project policies handout on ELMS says that all your projects must work on at least half of the public tests (by the end of the semester) in order for you to be eligible to get a C− or better in the course. C Academic integrity Please carefully read the academic honesty section of the syllabus. Any evidence of impermissible cooperation on projects, use of disallowed materials or resources, publicly providing others access to your project code online, or unauthorized use of computer accounts, will be submitted to the Office of Student Conduct, which could result in an XF for the course, or suspension or expulsion from the University. Be sure you understand what you are and what you are not permitted to do in regards to academic integrity when it comes to projects. These policies apply to all students, and the Student Honor Council does not consider lack of knowledge of the policies to be a defense for violating them. More information is in the course syllabus – please review it now. The academic integrity requirements also apply to any test data for projects, which must be your own original work. Exchanging test data or working together to write test cases is also prohibited. © 2023 L. Herman; all rights reserved 6