Tuesday, December 4, 2012

Manually Decoding Keylogged Data Files: One Man's Technique

Have you ever found yourself in a time crunch when your boss needs your analysis yesterday? I think it's safe to assume the vast majority of us have been in that situation. Well, the other day was no different for me, except that I was looking at a custom-packed keylogger over my morning java before heading into work, hence my time crunch.

As it turned out, the keylogger encoded its output. I really wanted to see what it was logging, but I didn't have time to load the binary into my debugger, unpack it, step through it until I found the key, and then decode the output. So I ended up just "eyeballing" the output file until I found the key. I've also done this with keylogged data files I extracted from compromised hard drives when the keylogger itself was no longer present. With that in mind, I'd like to share this simple technique, as it may come in handy for anyone who is pressed for time, or who conducts behavioral malware analysis but isn't proficient with debuggers or disassemblers.

The first thing I do is run the binary dynamically to see what footprint it leaves and what network traffic it spawns. I also check for keylogging. To that end, I open Notepad and start typing "trash talk" in case there's a keylogger, but I always hit the Return key at least eight times in a row, which creates 16 bytes (each Return produces two). I also throw in a series of backspaces for good measure. This way, if it is keylogging and the output is obfuscated, I should still be able to spot those returns and backspaces. Since the Return key generates \x0D \x0A (carriage return, line feed), I look for a repeating double-byte combination in my hex editor. So when I see something like 98 9F 98 9F 98 9F, I assume those might be my XOR'd return bytes. To find the key, I XOR one encoded byte with what I believe it should decode to; in this case I XOR \x98 with \x0D, and the result (\x95) is my probable key. As a sanity check, \x9F XOR \x0A gives the same \x95. I then XOR the entire keylogged data file with the probable key, and if I can read the file, the key is correct. In the example below, the double-byte pattern can be seen at offset 80, and the XOR key in this case was \xA2.
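If you'd rather let Python do the XOR work, here's a minimal sketch of that step. It assumes a single-byte XOR key and that the repeated pair really is an encoded Return (\x0D \x0A); the helper names `xor_key_from_pair` and `xor_decode` are hypothetical, not taken from any tool.

```python
CRLF = b"\r\n"  # assumption: the keylogger records Return as CR LF

def xor_key_from_pair(pair):
    """Return the single-byte XOR key implied by an encoded CR LF pair,
    or None if the two bytes imply different keys (not single-byte XOR)."""
    k_cr = pair[0] ^ CRLF[0]
    k_lf = pair[1] ^ CRLF[1]
    return k_cr if k_cr == k_lf else None

def xor_decode(data, key):
    """XOR every byte with the same one-byte key. XOR is its own inverse,
    so this same function both encodes and decodes."""
    return bytes(b ^ key for b in data)

if __name__ == "__main__":
    key = xor_key_from_pair(bytes([0x98, 0x9F]))
    print(hex(key))                       # 0x95
    encoded = xor_decode(b"hi\r\n", key)  # fake log entry, encoded
    print(encoded.hex())                  # fdfc989f
    print(xor_decode(encoded, key))       # b'hi\r\n'
```

On a real capture you would read the file with `open(path, "rb").read()` and pass the bytes to `xor_decode`. Returning `None` when the two bytes disagree is deliberate: it is the quick signal that single-byte XOR is the wrong hypothesis.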


Now back to the morning I mentioned earlier. The encoded keylog data wasn't as straightforward as the example above. I did see a double-byte pattern, \x88 \x85, but I quickly found that a single-byte XOR key wasn't used (\x88 XOR \x0D gives \x85, but \x85 XOR \x0A gives \x8F, so no single key fits both bytes). I then figured it might be a rolling XOR key, but that wasn't it either. After looking at it a bit further, I thought it might be a byte-for-byte substitution key, but as luck would have it, I was out of time and had to get ready for my day job. While I was getting ready, however, it hit me: it wasn't using a substitution key, it was using addition to encode its output. So I finished getting ready, ran downstairs, and subtracted \x0D from \x88, which gave me \x7B. I then subtracted \x7B from \x85 and, sure enough, got \x0A. So my keylogger turned out to add \x7B to every keylogged byte. I then let Python do the math for me and decoded the keylogged output file. After that, I finished my coffee, logged off, and went to work.
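The additive case can be sketched the same way. This assumes the keylogger adds one fixed byte to every plaintext byte and wraps around at 256; the wrap-around is my assumption, since nothing here says what happens to bytes that overflow past \xFF. The helper names are again hypothetical.

```python
CRLF = b"\r\n"  # assumption: Return is logged as CR LF

def add_key_from_pair(pair):
    """Key implied by an encoded CR LF pair under byte-wise addition
    modulo 256, or None if the two bytes imply different keys."""
    k_cr = (pair[0] - CRLF[0]) % 256
    k_lf = (pair[1] - CRLF[1]) % 256
    return k_cr if k_cr == k_lf else None

def add_encode(data, key):
    """Add the key to every byte, wrapping at 256 (assumed encoder behavior)."""
    return bytes((b + key) % 256 for b in data)

def add_decode(data, key):
    """Undo add_encode by subtracting the key, wrapping at 256."""
    return bytes((b - key) % 256 for b in data)

if __name__ == "__main__":
    key = add_key_from_pair(bytes([0x88, 0x85]))
    print(hex(key))                              # 0x7b
    print(add_decode(bytes([0x88, 0x85]), key))  # b'\r\n'
```

Unlike XOR, encoding and decoding are different operations here, so accidentally adding \x7B a second time instead of subtracting it is an easy mistake to make under time pressure.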

Below is the encoded and decoded keylog data, which is from a Bifrost variant. The double-byte patterns can be seen in multiple places, but they stand out at offsets 0, 81 and C2.
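The "eyeballing" itself can be partly automated. This sketch scans a buffer for back-to-back repeats of the same two-byte pair and, for each run, checks which of the two hypotheses discussed here (single-byte XOR or single-byte addition) maps the pair to \x0D \x0A. The `min_repeats` threshold of 3, the rule that skips pairs of identical bytes such as 00 00 padding, and the function names are all my own hypothetical choices.

```python
def find_pair_runs(data, min_repeats=3):
    """Yield (offset, pair, count) for each run in which the same two-byte
    pair repeats back to back at least min_repeats times."""
    i, n = 0, len(data)
    while i + 1 < n:
        pair = data[i:i + 2]
        count, j = 1, i + 2
        while j + 1 < n and data[j:j + 2] == pair:
            count += 1
            j += 2
        # Skip runs of one repeated byte (e.g. 00 00 00 00 padding): an
        # encoded CR LF should be two different bytes.
        if count >= min_repeats and pair[0] != pair[1]:
            yield i, pair, count
            i = j  # jump past the run so it isn't reported again
        else:
            i += 1

def candidate_keys(pair):
    """Return {'xor': key} and/or {'add': key} for every hypothesis under
    which the pair decodes to CR LF (0x0D 0x0A)."""
    keys = {}
    if (pair[0] ^ 0x0D) == (pair[1] ^ 0x0A):
        keys["xor"] = pair[0] ^ 0x0D
    if (pair[0] - 0x0D) % 256 == (pair[1] - 0x0A) % 256:
        keys["add"] = (pair[0] - 0x0D) % 256
    return keys

if __name__ == "__main__":
    # Synthetic data (not the Bifrost sample): "ab", four Returns, "cd",
    # encoded by adding 0x7B to each byte.
    plain = b"ab" + b"\r\n" * 4 + b"cd"
    encoded = bytes((b + 0x7B) % 256 for b in plain)
    for offset, pair, count in find_pair_runs(encoded):
        print(f"offset {offset:#x}: {pair.hex()} x{count} -> {candidate_keys(pair)}")
```

Both keys can come back at once when the key's low four bits are zero (0x20, for example): adding such a key to \x0D or \x0A never carries, so it matches XOR on those two bytes. In that case, decode with each and see which one reads.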


1 comment:

  1. Hello. Thank you for your explanation. can you share out the python script? thank you.
