
                          Mutation Engine Report (2)


            Copyright (c) 1992 by  VDS Advanced Research Group
                           All Rights Reserved

                             P.O. Box 9393
                       Baltimore, MD 21228, U.S.A.
                            (410) 247-7117

                              prepared by

                             Tarkan Yetiser


      This report is provided to satisfy the curiosity of the
      public. We were approached by some third parties to perform an
      analysis on MtE. We would like to share the results of our
      analysis with everyone. If you find an error or inaccuracy in
      this report, please feel free to contact us. All constructive
      criticism is welcome. We thank all those who took the time to
      read and bring inaccurate or ambiguous parts of this report
      to our attention.


                               TABLE of CONTENTS

              I.   Mutation Engine and Viruses
              II.  How to Catch Viruses and MtE-based Viruses
              III. Mutation Types and Detection Algorithms
                      A. Sample Decryptor Disassembly
              IV.  Live Tests and Results
                      A. Comments on Test Results
              V.   A Simple Message

------------------------------------------------------------------------

                  I. Mutation Engine and Viruses

      We have analyzed the so-called MtE (Mutation Engine by a "Dark
Avenger" from Bulgaria), and sample viruses based on it; namely,
Pogue and Dedicated. We have also conducted tests to examine what
kind of a potential this miscreant has, and collected empirical
data on how popular scanners deal with the MtE. We have also implemented a
little program (CatchMtE) that can recognize MtE-based code using an
algorithmic technique. The program in executable form is available free of
charge as a service to the public. Due to possible misuse, the source code
and a more detailed (at the opcode and bit-mask level) analysis with
decryptor samples and algorithms necessary to detect MtE will be made
available in a limited fashion. Under no circumstances, actual virus
samples will be provided; except the missed samples can be sent to known
anti-viral product developers who wish to enhance their programs.

      For those who are not familiar with the MtE, some preliminary
information will be presented first:

      MtE is NOT a virus per se, but an object module that can be
linked into a virus to give it polymorphic capabilities. MtE
expects to be called as a routine that can encrypt a certain
portion of code and can generate a suitable decryption routine. It
uses a random number generator to vary each mutation so that it
will not be possible to recognize the new variant by using simple
scanning techniques. The random number generator is not part of the
MtE object module. A sample pseudo-random number generator is
included with the archive Dark Avenger distributes. A virus writer
could also supply his own random number generator.
      
      It's not trivial to take an existing virus and link it with MtE
and turn it into a polymorphic virus. The concern is that having
MtE available greatly reduces the effort needed to craft such a virus.
It's more like an "off-the-shelf" component that other virus writers
can use. Many of those virus writers would probably not be able to code
a polymorphic virus from scratch. Analysis of many viruses indicate
that their authors are not that skilled at all. It is this majority
that could benefit from something like MtE.
     
     Though all this may sound ordinary, MtE got so much attention
not because it is just another encryptive virus but because it can
provide even simple viruses with a feature that makes it difficult
to scan for them. MtE is just like a library routine that you link
into your virus and call when needed. It is a little over 2K in an
object module named MTE.OBJ. A person who calls himself "Dark
Avenger" claims to have developed MtE, and distributes it by
uploading to BBSes in Bulgaria. The archive contains a fairly
detailed documentation on how to use MtE, and even includes a
demonstration virus, a non-resident COM infector known as "Dedicated".
Shortly after MtE made its appearance, a modified copy of this virus
called "Fear" is also seen. We are informed that a few other MtE-based
viruses have been recently reported.
     
     Why this person is engaged in such potentially harmful activity,
or how he/she gets away with it is not something we know about. Curious
individuals who would like to learn more about the history of virus
production in Bulgaria and other social as well as technical issues
are invited to read an excellent paper written by anti-virus researcher
Mr. Vesselin Bontchev of Virus Testing Center, University of Hamburg.
The paper is titled "Bulgarian Virus Factory", and it is available via
anonymous FTP. It provides insight into some of the cultural aspects of
the virus underground in Bulgaria. Mr. Bontchev's contribution to
anti-virus research is much appreciated; otherwise, we probably would
have never known what goes on inside the Bulgarian virus factories.

            II. How to Catch Viruses and MtE-based Viruses

      Scanning for many  known viruses is usually a trivial task.
You disassemble a sample, extract a sequence of bytes that would
exist in each infected executable object, put it into a pattern
matching engine, and then look for that pattern in executable
objects that that virus is known to target. This method proved to
be quite useful in fighting many viruses seen in the wild. Assuming
a carefully chosen scan string, you can find the virus easily
without too many false positives. Not so for polymorphic viruses.
      
      These viruses try to defeat common scanning methods. They keep
their body encrypted to defy analysis, and encrypt the new copy
inserted into an executable object using a different key so that it
will "look" as if a different virus infection has occurred. However, 
even these viruses require a plaintext code that will decrypt the rest
of the virus. Scanners can use strings extracted from the plaintext
portion of the virus to identify them. It is usually necessary to
include wildcard bytes (don't-care bytes) to be able to deal with the
varying parts of the decryption routine. Naturally, false alarms are
more likely to occur. MtE-based viruses are more advanced than such 
viruses seen before.
      
      We would like to emphasize that the contents of each mutation
and the corresponding decryption routine MtE generates is far too
variable to extract a simple (or even wildcard) scan string. It is
necessary to analyze the MtE itself as well as many sample mutations.
After that, certain characteristics of the code MtE generates can be
used as telltale signs to detect its presence. Avoiding false positives
while maintaining 100% detection ratio is quite difficult.
      
      Armed with an 80x86 instruction set guide (we used Turbo
Assembler 3.0 Quick Reference Guide), and a good disassembler (we
used Mr. Zandt's DIS86 available via anonymous FTP), and a few
known viruses based on MtE (Pogue and Dedicated with payload
removed), we analyzed the MtE code, and the mutations generated.
Tests were conducted on a 40Mhz 386 with a 100 meg HD and MS-DOS
5.0, and a 4.77Mhz IBM/XT with a 30 meg HD and PC-DOS 3.3 installed.
A simple program that generated decoys (small, fully functional programs)
was used to create a large number of samples.
      
      In the case of Pogue, the virus was allowed to remain resident and
infect each decoy program as it is created. Since the Dedicated
virus is not resident, it was necessary to create decoys first and
then infect them by running the virus (infects in the current
directory). After the tests, we archived the samples and stored
them on floppy diskettes, and removed them from the hard drives of
the test machines.
      
      In the Intel 80x86 architecture, it is possible to express a
computation in very dissimilar ways. This is possible because
certain registers can be substituted in place of another one and
still achieve the same result. For example, you can index an array
by using SI, DI, BP or BX registers. Or you could XOR a certain
value at a given memory location by loading that value in AX, BX,
CX or DX first, and performing the XOR on that register, and then
putting the result back into memory, etc. Even other possibilities
exist. When stepping through elements in an array, you can
increment the index register by ADDing to it, INCing it, or ADDing
and then SUBtracting from it. It should be clear that such
flexibility helps MtE significantly. Of course, variability is
something string scanners do not handle too well, since there are
many combinations to search for.
      
      MtE goes even further than that. The size of the decryption
routine is also variable, making it infeasible to assume certain
things that would hold for many polymorphic viruses. It also sets
up a lengthy sequence of redundant instructions before the decryptor
enters the decryption loop.
      
      For over 90% of the mutations, MtE generates a convoluted
16-bit XOR-type encryption; however, in many cases it uses indirect
ways to apply the XOR mask to a memory value. For example, it
computes the mask, and then gets the value to be decrypted into a
register, applies the mask and puts the result back into that memory
location. Besides, memory access is done using many different
instructions such as MOV and XCHG. Sometimes, it uses a SUB and NEG
instruction sequence instead of XOR. There are also many redundant
instructions peppered freely throughout the decryptor.
      
      In some cases (5.5%), MtE generates a decryptor with a null
effect. The decryptor does not actually decrypt anything, and the
virus code is in plaintext. The frequency of such cases seems to
depend on the random number generator. It is funny to note that some
popular scanners misidentify such extreme cases where the virus
is not even encrypted. To handle these mutations, it is sufficient
to extract a signature from the MtE itself. It is also possible to
extract one from known MtE-based viruses and identify the virus
directly. At any rate, a scan string from MtE itself should be used
in case a future virus creates a plaintext variant.
      
      We must also mention that even these plaintext mutations
contained a fully working copy of MtE. They successfully propagated
and generated encrypted mutations in future generations. MtE appears
to generate correct code in all cases. The deviation between new
generations started using plaintext parents and new generations started
using encrypted parents was negligible.

                       III. Mutation Types and Detection Algorithms

      MtE generates 3 "types" of mutations. They are as follows:

      1.  Double-reference (detectable using Method-1) ( ~ 93.0% )
      2.  Single-reference (detectable using Method-2) ( ~  1.5% )
      3.  Plaintext or no-reference  ( ~  5.5% )

      By implementing two algorithms and one scan string for the
plain mutations, it is possible to recognize MtE-based viruses
while keeping false positives to an acceptable level. We have one
such program that achieved 100% hit rate during our tests. Some
others also claim 100% hit rate; and we have tested them as well.
      
      Double-reference mutations contain two memory accesses. The
first one usually (but not always) fetches a word to be decrypted.
The second one updates the same memory location after the word is
decrypted. We must mention that neither fetching nor updating is
always by simple MOVs; in fact, XCHG as well as SUB and NEG
instructions are used sometimes.
     
     Single-reference mutations contain only one memory reference.
The instruction that decrypts using one memory access is XOR. It
simply applies an XOR-mask, computed in a very convoluted way, to
a memory location. There are some irregularities in this type of
mutations. Popular scanners we tested had difficulty in dealing with
this type of samples.
     
     Our CatchMtE implements extra checks to reduce false alarms while
maintaining 100% hit rate on this type of mutations. Many self-modifying
programs (especially the compressed ones) trigger a false alarm, since
they may contain similar memory references while they are modifying
themselves. However, there is one characteristic of MtE that easily
gives it away and simplifies positive identification. Once you verify
the presence of that characteristic, you can achieve 100% hit rate even
if MtE is enhanced! We will not go into further detail here for security
reasons.

      In the case of plaintext mutations, there are two subcategories:

          1. Null-effect decryptor with no memory reference
          2. Null-effect decryptor with memory reference but with a 0-key

      The mutations in the first category contain nothing more than a
series of MOVs, ROLs, ANDs and so on. Their overall effect is null since
no reference to the code to be decrypted (in plaintext in this case) is
present. The body of the virus as well as the MtE itself are both in
plaintext and can be disassembled directly.
      
      The mutations in the second category are a little more interesting
in that the decryptor actually fetches from and stores words to memory. In
binary arithmetic, if you XOR a value with 0, that value does not change;
just like if you multiply a number by 1, you get the same number. Again,
the overall effect of the decryptor is null since it uses a 0-key. The
virus code as well as the MtE itself are both in plaintext.
      
      A more detailed analysis of mutation types is not made public
due to possible misuse of such information. We will only show here
a sample decryptor and its commented disassembly.

                  A. Sample Decryptor Disassembly

    Following is a sample decryption routine MtE generated during a live
Pogue test. The rest of the disassembly is omitted for security reasons.
This sample was missed by SCAN 91, and F-PROT 2.04; though F-PROT 2.04a
detects it. Following code is NOT a virus.

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
; Virus      : Pogue mutation, encrypted, resident COM infector, 21h,1Ch
; Analyst    : -
; Date       : June 1992
; Type       : Single-reference
; File name  : PMUT1220.COM, in PDF
; File size  : 3608
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
                ***** START of DISASSEMBLY *****
COM_PROG_ENTRY:

; In determining the program entry point, it is necessary to use a
; "sliding-window" approach since the first instruction is not a JMP
; or CALL. Second byte is a JMP (e9) and the offset follows it.

0000:0100      4d                  dec            bp
0000:0101      e9 fc 01            jmp            DECRYPTOR_ENTRY (0300)

; junk deleted

; In this mutation, BX register is used to step thru the code word at
; a time. Note the way XOR-mask is computed on the first iteration,
; DI will be 7DCCh the first time, and it will change as BX is incremented.
; We are looking at a piece of code that uses BASED-addressing.
; Since the XOR instruction at 0000:030c includes a constant offset 0f18h,
; we must add that value to the initial value of BX, which is set to f420h
; before the loop. Now if you add 0f18h and f420h, you will get 0338h! Why?
; Because BX is a 16-bit register, and it will wrap around after 64K with
; the carry flag set during the process. CF is not important here.
; In other words, the block that is being decrypted is at 0000:0338h.
; The constant 0dee is XORed with the current value of index, and another
; constant 7c02 is subtracted to obtain the final XOR-mask to be applied
; to the word at 0f18+BX.

DECRYPTOR_ENTRY:

0000:0300      bb 20 f4            mov            bx,f420

DEC_LOOP_TOP:

0000:0303      bf ee 0d            mov            di,0dee
0000:0306      33 fb               xor            di,bx
0000:0308      81 ef 02 7c         sub            di,7c02
0000:030c      31 bf 18 0f         xor            [0f18+bx],di

; Following is an example of redundant instructions peppered throughout the
; decryptors. After these two instructions, BX will not change, and SI
; is not used anyway. In a non-virus program, you would not often see such a
; sequence.

0000:0310      8b f3               mov            si,bx
0000:0312      8b de               mov            bx,si

; Now BX will be incremented by 2 to step word at a time
; This is the simple case; in some mutations, the two INC instructions
; are not used at all.

0000:0314      43                  inc            bx
0000:0315      43                  inc            bx

0000:0316      75 eb               jnz            DEC_LOOP_TOP (0303)

; Now how do we calculate the number of iterations? Not too difficult.
; Above you see an implicit terminating condition for the loop. Again,
; it uses the fact that BX is a 16-bit register, and when it hits 64K
; it will wrap around to 0, in which case the loop will be done.
; Since the initial BX is F420h, and we are stepping 2 bytes at a time,
; the number of iterations is: (64K - F420h + 1) / 2 = 1520 in decimal
; In other words, the size of the block being decrypted is 3040 bytes.
; Let's check to see if this makes sense numerically.
; The file size on this sample was 3608 bytes. Adjusting for 100h ORG,
; we can see that 0218h bytes are above the loop end.
;
; 3608 - 536 (0218h) = 3072 bytes from loop end to end of file
; 3608 - 568 (0238h) = 3040 bytes in encrypted block
;
; There is a discrepancy of 32 bytes? Not really, they are padding bytes
; as documented by Dark Avenger in MtE! 32 (MAX_ADD_LEN) bytes, as it says.
; This is only one of the ways it achieves variable length.
; Following are the padding bytes, which are all valid instructions!

0000:0318      bd 22 b9            mov            bp,b922
0000:031b      b1 03               mov            cl,03
0000:031d      d3 c5               rol            bp,cl
0000:031f      8b fd               mov            di,bp
0000:0321      bd ae 4f            mov            bp,4fae
0000:0324      81 e5 25 e7         and            bp,e725
0000:0328      8b dd               mov            bx,bp
0000:032a      8b ee               mov            bp,si
0000:032c      81 f5 60 fb         xor            bp,fb60
0000:0330      8a cb               mov            cl,bl
0000:0332      d3 c5               rol            bp,cl
0000:0334      8b cf               mov            cx,di
0000:0336      d3 c5               rol            bp,cl

; from this point on, the code is encrypted
0000:0338
                ***** END of DISASSEMBLY *****
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
                  IV.   Live Tests and Results

      Test #1     Base Virus Name: Dedicated

                             SCAN 91     F-PROT 2.04       CatchMTE 1.0

      by Name (1)             67                69          60
      as MtE  (2)             933               931         940
      misidentified           -0                -0          N/A
      missed                  -0                -0          -0
      Hit Rate                100%              100%        100%

(1)  SCAN91 --> [Mut],  F-PROT 2.04 --> Dedicated, CatchMTE --> Dedicated
(2)  SCAN91 --> [DAME], F-PROT 2.04 --> MtE,       CatchMTE --> MtE-based

      Test #2     Base Virus Name: Pogue

                             SCAN 91     F-PROT 2.04       CatchMTE 1.0

      by Name (1)             0               0                 56
      as MtE  (2)             935           936                944
      misidentified (3)       -65           -61                N/A
      missed                  -0             -3                 -0
      Hit Rate                93.5%          93.6%             100%

(1)  SCAN91 --> N/A,  F-PROT 2.04 --> N/A,    CatchMTE --> Pogue
(2)  SCAN91 --> [DAME], F-PROT 2.04 --> MtE,  CatchMTE --> MtE-based
(3)  SCAN91 --> [7S], F-PROT 2.04 --> Gotcha, CatchMTE --> N/A

      Test #3     Base Virus Name: Pogue

                             SCAN 91     F-PROT 2.04a       CatchMTE 1.0

      by Name (1)             0               0                141
      as MtE  (2)            2399          2398               2259
      misidentified (3)        ?              ?                N/A
      missed                  -1             -2                 -0
      Hit Rate              99.96%       99.92%               100%

(1)  SCAN91 --> N/A,  F-PROT 2.04 --> N/A,    CatchMTE --> Pogue
(2)  SCAN91 --> [DAME], F-PROT 2.04 --> MtE,  CatchMTE --> MtE-based
(3)  Not counted

                      A.    Comments on Test Results

      The amount of time it took each program to complete the tests
are not published not to distract from the main purpose of these
tests: to determine if they can achieve 100% hit rate.
      
      It seems that both F-PROT 2.04 and SCAN 91 misidentify some
Pogue mutations that are in plaintext. F-PROT "quickscan" missed
ALL mutations. You are advised to use SECURE scan mode of this
product. The extra speed comes with 0% hit rate on MtE-based viruses!
      
      In test #2, F-PROT 2.04 missed three encrypted Pogue mutations.
We examined these samples and found them to be of Single-reference type,
and detectable using Method-2. The samples worked as expected. We can
only speculate that F-PROT lacks Method-2 detection algorithm and uses
a heuristic in such cases.
      
      One of those three that were missed was called "suspicious" and
guessed to be a variant of the Gotcha virus.  Surprisingly, Virx 2.3
missed one of these same mutations. We did not include Virx 2.3 in our
full test suite.
      
      Further testing was done with F-PROT 2.04a. This new release
caught all the ones that the previous one missed in test #2; but still
some others, of single-reference type again, were missed in test #3.
SCAN 91 also missed 1 mutation completely. This mutation was NOT one
of the two that F-PROT 2.04a missed; they were identified as [DAME].
      
      It should be noted that misidentification of 6% of Pogue
mutations is a little alarming. All these misidentified mutations
were found to be working and capable of generating new mutations.
Some people may take an MtE-based infestation more seriously than an
infection by "Seventh Son" virus. This leaves a chance for an unrecognized
mutation to escape detection. We must remember that it takes only one
virus to continue the infection unless other precautions such as
integrity checking are in place.
      
      Someone brought to our attention that TBSCAN 3.3 can use MtE AVR
module in VIRSCAN.DAT (revision 920617) to detect MtE-based viruses. A
copy was downloaded from an FTP-site, and it was tested against the
sample set in test #3. It detected no viruses at all!
      
      One of our researchers mentioned that this product is highly
optimized, and that it may not be checking for MtE if the file size
is less than a certain number of bytes. Samples in test #3 were of
varying sizes: as small as 3500 bytes and never larger than 4000 bytes.
When we tested this scanner with a few larger samples (11K), TBSCAN 3.3
seemed to be able to detect MtE. We did not conduct comprehensive tests
using TBSCAN.
      
      This reminded us legendary algorist and computer scientist
Dr. Donald Knuth's famous saying: "Premature optimization is the root
of much programming evil". Improving the performance of one's scanner
is always desirable, as long as one does not try to "cut corners" and
sacrifice reliability. F-PROT quickscan also missed all MtE mutations,
but at least F-PROT has a secure mode that catches most of them. TBSCAN
did not have such a mode, in fact, it has an even faster mode that missed
the samples faster!
      
      It is not acceptable for a scanner not to find viruses that it
has scan strings for (or algorithms in this case). The primary function
of a virus scanner is to find viruses known to that scanner. If it performs
this task in a speedy manner, that's nice indeed; and it may even give it
a competitive edge. Failing to perform its primary function, on the other
hand, defeats the purpose of such a tool.

                    V. A Simple Message

      It is dangerous to assume that scanning is adequate since
there are some products that can detect MtE-based viruses 100% of
the time. We identified at least two ways to make MtE less
predictable. Of course, such information will not be disseminated.
However, considering the availability of MtE to the hackers all
around the world, and the "glory" Dark Avenger will enjoy due to
media hype, it's only a matter of time such improvements will be
discovered by irresponsible individuals. Besides, this may start a
new trend among virus writers to create things like MtE. Keeping up
with new virus signatures was hard enough (though manageable), but
keeping up with many mutation engines is not going to be trivial.
Unfortunately, locking up these "mutant engineers" is not a practical
solution, and not even legally possible in many parts of the world.
      
      The message is clear. The first line of defense against
viruses is NOT using scanners. Although they proved to be very
useful, you are highly encouraged to consider other approaches such
as integrity checkers as a first line of defense. Even the less
sophisticated integrity checkers have a better chance to catch
mutating viruses, long before their developers get a chance to
analyze the virus samples. The reason is that viruses have a tendency
to modify existing code to propagate in most cases. Their spread
can be controlled using a non-virus-specific solution that
concentrates on the main characteristic of most viruses. Such an
approach is not only more cost-effective but also more secure. If your 
company still relies only on a virus scanner to protect its PC-based 
computing resources against viruses, you are walking on thin ice.

