CS6264 Assignment 7: Project Seven plus Bonus 2025

“…and we aim to show stronger improvements starting next fiscal quarter.”

This all-hands call is taking forever, you think to yourself while daydreaming about where you should go for your holiday in two weeks.

“…will be taking the lead with our next client. We expect great things from them.”

Wait. Did your boss just say your name?

“The client is an IDS vendor whose product uses machine learning models to identify malware. However, they have noticed that their models are frequently evaded and hope that we can find out why.”

You don’t remember being told about this, but at this point, you guess you’re used to it. You’re just thankful your coworker was taking notes during the meeting and gave you some tutorials on MLSploit, a framework you are expected to use. Later, you also found some information about the attack.

Assignment

The purpose of this assignment is to gain experience with training machine learning (ML) and deep learning (DL) models that classify Windows portable executable (PE) malware into families. Specifically, the models will be given two different datasets: benign PE files and malicious PE files from multiple families. After training the DL models, you will attack them using an evasion attack called the mimicry attack. Then, you will be tasked with improving the models that were attacked. Finally, you will train an ML model using different features and see whether the mimicry attack still works. You will write a report about your experiences and observations.

There are 5 tasks and a bonus task you will need to complete for this assignment. They include:

Training DL Models (10%): Train LSTM, CNN, and RNN models on API call sequences.
Attacking DL Models (10%): Attack the models via the mimicry attack.
Detecting Attack (20%): Train a model based on static features to detect the attack samples.
Training ML Models (10%): Train classical ML models on API call existence, frequency, and arguments.
Transferring Attack (10%): Run the mimicry attacks in a controlled environment and evaluate the ML models.
Attack ML Model using RL (10% bonus): Train an ML model using EMBER and train an RL model to evade the ML model.

You will also need to compile a report (40%) that contains screenshots of your findings and an explanation of the results each screenshot shows. For example, if a screenshot compares how well different models detected the attack in Task 2, include an explanation of why the results differed.

To complete the tasks, you will also need these files: the Supplementary Material (Lab 7_Supplementary_Material.pdf) and the Task 3 Template.

Deliverables

Compress the deliverables for each task into a .tar.gz file called [GT Username]_cs6264_lab07.tar.gz with the following directory layout:

task1/
    pe.model.zip
    prediction.zip
    *.log.txt files
task2/
    attack-exe.zip: the attack samples generated from MLSploit to evade your models
    attack-feature.zip
    attack-prediction.zip
    attack.cfg.zip: configuration file from MLSploit
    *.log.txt files
task3.a/
    detection1.py: source code (preferably in Python) that trains a new model to detect the attack from the previous task
    detection2.py
    detection3.py
    (others here if you wish)
task3.b/
    model1.zip
    model2.zip
    model3.zip
    (others here if you wish)
task4/
    pe.model.zip
    prediction.zip
    *.log.txt files
task5/
    task1 model/
        prediction.zip
        *.log.txt files
    task4 model/
        prediction.zip
        *.log.txt files
report.pdf: the report described above, containing screenshots of your findings and an explanation of why the results in each screenshot occurred
bonus/
    model.zip
    *.log.txt files
    ember-attack.zip
    *.log.txt files

Warning: The malware binary we provide you (and the malware produced by MLSploit) is real malware. Do not, under any circumstances, execute this malware. It is a compiled form of the rbot malware family, and antivirus companies are well aware of its existence (https://github.com/ytisf/theZoo). We have not applied any static obfuscation to the binaries, so they should be easily detectable by AV companies. Use these binaries responsibly by only reading their byte contents (e.g., using tools like https://github.com/erocarrera/pefile).

Lab 7: Supplementary Information
CS 6264-OCY

Overview

Introduction
Training Deep Learning Models
Attacking Deep Learning Models
Defending DL Models with ML
Cat and Mouse
Further Reading

Introduction

A single malware family may have several variants, making it hard to detect all of them comprehensively. However, these variants will generally all share the same behaviour. One way to detect an entire family of malware is to train a machine learning model on the behaviour of that family so that, if the model sees a new variant, it will still be able to classify the malware as belonging to a specific strain. This is not only faster than classifying each malware sample by hand but also lets you scale up your malware detection. Just like the pattern-recognition host-based IDS you created in lab 3, these models classify and identify syscall sequences (among other parameters).

MLSploit

MLSploit was developed by Georgia Tech PhD students with oversight from Georgia Tech professors and Intel research scientists. It helps you train and test different machine learning solutions against machine learning attacks, and it has a simple GUI for usability. See the tutorials in the lab files for more information on how to operate MLSploit.

Training a DL Model

To explore the world of machine learning
security, we will first construct a few models to test later. We want to test LSTM, CNN, and RNN models. You can change the window size (i.e., the length of the syscall sequence that the model compares at a time) on the right side of the UI for more accurate models. A full-length tutorial can be found in Canvas called Lab07_PE_Module_Tutorial.pdf.

Attacking a DL Model

Now that our models are created, let’s see how easily they can be attacked (i.e., tricked into thinking a malicious file is benign). In MLSploit, we will create a new pipeline that first performs a “mimicry attack” against a benign application to figure out what a benign application might do to evade the machine learning model. Next in the pipeline, we transform the benign application by injecting 10 different shellcode chunks into it, creating 10 samples that might evade the machine learning model by making it think they are benign. These steps are also detailed in the same tutorial.

Identifying Malware Meant to Trick DL with ML

As we have learned in this course, combining static and dynamic analysis makes for much more robust malware identification. To incorporate static analysis into our process, we will train an ML model on static features of normal programs to identify differences in a program that was injected with a mimicry attack. We have provided static features of benign programs for you, so first we will extract features of the malicious programs you created in the previous step. We will be using EMBER. To extract features the way EMBER does, check out how PEFeatureExtractor is used by scripts/ember_init.py. You may also find a fork of EMBER with some useful scripts in the mlsploit-pe repository listed under Further Reading.

After you have static features extracted for both, let us move on to training a model to identify malware based on static features.

First, we must prepare the data for the ML algorithm. Note that we are using a classification algorithm, as this is a classification problem.
○ We have some labels (malicious vs.
benign) and we want to put them on a set of THREE features.
○ First, we make a table with features and a label for each set of features.
Next, we will split this table into the table of features and the table of labels.
● You can do this with hsplit() or numpy.split(axis=1):
    x, y = numpy.split(dataset, [-1], axis=1)
Now, you can create training and testing sets with scikit-learn’s train_test_split() method. This will output a training and a testing array for both features and labels for the ML algorithm to use.

Now, we can finally train a model. Initialize a classifier like the DecisionTreeClassifier. Then, use the class’s fit() method to train your model. To figure out the accuracy of your model, first use the model to predict labels for the test array of features, and then get the accuracy score using scikit-learn’s accuracy_score() method:
    y_pred = dt.predict(x_test)
    print(accuracy_score(y_test, y_pred))

Cat and Mouse

You can do something similar with MLSploit. Follow the tutorial included with the project and create traditional ML models. You can also try to run the same attack from before against these models.

Further Reading

Useful Example Code
mlsploit-pe (has a useful fork of EMBER): evandowning/mlsploit-pe on GitHub, the MLSploit PE module
scikit-learn documentation: API Reference, scikit-learn 0.24.1 documentation

ML Evasion
Mimicry attacks on host-based intrusion detection systems. Proceedings of the 9th ACM Conference on Computer and Communications Security.
[1804.04637] EMBER: An Open Dataset for Training Static PE Malware Machine Learning Models (arxiv.org)
[1801.08917] Learning to Evade Static PE Machine Learning Malware Models via Reinforcement Learning (arxiv.org)
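
The scikit-learn steps from “Identifying Malware Meant to Trick DL with ML” can be sketched end to end as follows. This is a minimal sketch, not the course's reference solution: the dataset is synthetic (randomly generated three-feature rows with a 0/1 label), and the test_size, random_state, and stratify settings are assumptions chosen for illustration. In practice the table would hold the static features extracted from your benign and attack samples.

```python
import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (an assumption for illustration only): three numeric
# features per sample plus a final label column (0 = benign, 1 = malicious).
rng = np.random.default_rng(0)
benign = np.hstack([rng.normal(0.0, 1.0, size=(100, 3)), np.zeros((100, 1))])
malicious = np.hstack([rng.normal(3.0, 1.0, size=(100, 3)), np.ones((100, 1))])
dataset = np.vstack([benign, malicious])

# Split the table into the feature columns and the last (label) column.
x, y = np.split(dataset, [-1], axis=1)
y = y.ravel()  # flatten the single label column into a 1-D array

# Hold out a quarter of the rows for testing; stratify keeps both classes
# represented in the same proportion in the training and testing sets.
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.25, random_state=0, stratify=y
)

# Train a decision tree on the training rows.
dt = DecisionTreeClassifier(random_state=0)
dt.fit(x_train, y_train)

# Predict labels for the held-out rows and score them against the truth.
y_pred = dt.predict(x_test)
accuracy = accuracy_score(y_test, y_pred)
print(f"accuracy: {accuracy:.2f}")
```

Because numpy.split() returns the labels as a single-column 2-D array, the sketch flattens them with ravel() before fitting, which is the 1-D shape scikit-learn classifiers expect.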