The reasoning is due to resource and timing issues. As a physical system, all the resources of the well defined (and disclosed) system is given over to being a prime client. In the past, a VM's clocks could drift, especially on systems that have many VMs running or are under heavy load. Ordinarily, this drift isn't an issue, but because the prime client is the time keeper for the entire run, the possibility of such time drift was considered unacceptable.
Keep in mind this rule is in place for public submissions. Any run where the prime client was run within a VM would be considered non-complaint as well as it would not be comparable to other VMmark 2 results. It might however work if you were trying to debug an environment ( I wouldn't recommend it) or for internal only studies.