I'm currently in the process of collecting malware samples, and I have two ways of getting them. The source matters because I'm doing academic research, and as of now there is NO consensus on an "official" malware dataset of the kind that research in Intrusion Detection Systems (IDS) enjoys. They have the KDD "something" and MIT Lincoln 1999 datasets for comparison.
Fair enough. As for me, I just signed a Non-Disclosure Agreement (NDA) with CyberSecurityMalaysia (CSM) in order to get their samples. I could actually collect samples on my own using my honeypot, but since I don't want any dispute over the samples I use, to be safe I'll just use the CSM samples for academic literature.
I read the ClamAV website, looking for information about ClamAV's core engine and signatures. I had never spent much time reading about ClamAV before, but since ClamAV is the only free and open source software that thoroughly describes its architecture, it's a pleasure to do so (I mean, reading the docs).
Now, for my research proposal, I need to extract the malicious features (strings) from the malware. I could use the "strings" command, or XOR parts of the encrypted malware, but there may be a better way to do it, since other researchers have done this before. I need to email them to ask if they can help. The last time I emailed one of them, the person never replied, although he did reply prior to that. I'm not sure why; maybe it is their "trade secret", or they don't want to discuss it in detail.
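To make the two options above concrete, here is a minimal Python sketch of what I have in mind. It is only my own illustration, not the method the other researchers used. The function names (extract_strings, xor_bytes), the minimum length of 4 and the key 0x5A are all assumptions I picked for the example, and the input is a harmless made-up byte string, not a real sample. The first function roughly mimics what the "strings" command does for printable ASCII; the second undoes a single-byte XOR, which is only the simplest kind of obfuscation.

```python
import re
from typing import List


def extract_strings(data: bytes, min_len: int = 4) -> List[str]:
    """Return runs of printable ASCII (space to '~') at least min_len long."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [m.group().decode("ascii") for m in pattern.finditer(data)]


def xor_bytes(data: bytes, key: int) -> bytes:
    """XOR every byte with a single-byte key (hypothetical, simplest case)."""
    return bytes(b ^ key for b in data)


# Harmless stand-in for a binary: printable runs mixed with non-printable bytes.
sample = b"\x00\x01hello world\x00\xffab\x00cmd.exe\x90"
print(extract_strings(sample))  # ['hello world', 'cmd.exe']

# Pretend part of the file was XOR-ed with an assumed key of 0x5A.
hidden = xor_bytes(b"http://example.test", 0x5A)
print(extract_strings(xor_bytes(hidden, 0x5A)))  # ['http://example.test']
```

On a real sample the key is not known in advance, so one would have to try all 256 single-byte keys (or something smarter), and packed or properly encrypted malware will not give up its strings this easily.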