We choose different nodes within the same batch for the simple reason that it gives slightly better results. This is the rationale for virtually any choice in a neural network. As to why it gives better results ...
If you have many batches per epoch, you will notice very little difference, if any. The reason we have dropout at all is that models sometimes exhibit superstitious learning (in the statistical sense): if several examples in an early batch just happen to share a strong correlation of some sort, the model will learn that correlation early on (primacy effect) and will take a long time to unlearn it. For instance, students doing the canonical dogs-vs-cats exercise will often use their own data set. Some students find that the model learns to identify anything on a soft chair as a cat, and anything on a lawn as a dog -- because those are common places to photograph each type of pet.
Now, imagine that your shuffling algorithm brings up several such photos in the first three batches. Your model will learn this correlation. It will take quite a few counter-examples (cat in the yard, dog on the furniture) to counteract the original assumption. Dropout disables one or another of the "learned" nodes, allowing others to be more strongly trained.
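To make "disables one or another of the nodes" concrete, here is a minimal sketch of a dropout layer's forward pass in plain Python. It assumes the common "inverted dropout" convention (scale the surviving units during training so nothing needs rescaling at inference); the function name `dropout` and its parameters are illustrative, not any particular library's API.

```python
import random
from typing import List, Sequence


def dropout(activations: Sequence[float], p_drop: float,
            rng: random.Random, training: bool = True) -> List[float]:
    """Inverted dropout on one example's hidden-layer activations.

    During training, each unit is zeroed independently with probability
    p_drop, and surviving units are scaled by 1 / (1 - p_drop) so the
    expected activation matches what the next layer sees at inference.
    At inference time (training=False) the activations pass through unchanged.
    """
    if not 0.0 <= p_drop < 1.0:
        raise ValueError("p_drop must be in [0, 1)")
    if not training or p_drop == 0.0:
        return list(activations)
    keep = 1.0 - p_drop
    # rng.random() is uniform on [0, 1), so each unit survives with probability `keep`.
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


rng = random.Random(42)
hidden = [0.9, 0.1, 0.7, 0.3]
print(dropout(hidden, p_drop=0.5, rng=rng))                  # each unit is either zeroed or doubled
print(dropout(hidden, p_drop=0.5, rng=rng, training=False))  # unchanged at inference
```

A node zeroed this way contributes nothing to the output for that example, so the rest of the network cannot lean on it and has to carry the signal itself.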
The broad concept is that a valid correlation will be re-learned easily; an invalid one is likely to disappear. This is the same concept as repeating other experiments. For instance, if a particular experiment shows significance at p < .05 (a typical standard of scientific acceptance), then, if there were no real functional connection, a result that strong would turn up by chance less than 5% of the time. This is considered confident enough to move forward.
However, it's not certain enough to justify large, sweeping changes in thousands of lives -- say, with a new drug. In those cases, we repeat the experiment enough times to reach the level of confidence society demands. If we achieve the same result in a second, independent experiment, then instead of a 1/20 chance of being mistaken, we have a 1/400 chance.
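The 1/400 figure is just 1/20 squared. A small sketch checks it both exactly and by simulation, assuming the two experiments are fully independent and that, when no real effect exists, each one comes out "significant" with probability exactly 0.05. The function names are hypothetical, chosen for illustration.

```python
import random
from fractions import Fraction

ALPHA = Fraction(1, 20)  # p < .05 threshold, written as an exact fraction


def chance_all_false_positives(alpha: Fraction, n_experiments: int) -> Fraction:
    """Exact chance that n independent experiments are ALL 'significant'
    when no real effect exists (each passes by luck with probability alpha)."""
    return alpha ** n_experiments


def simulate_all_false_positives(alpha: float, n_experiments: int,
                                 trials: int, seed: int = 0) -> float:
    """Monte Carlo check of the same quantity. Under the null hypothesis
    (continuous test statistic) a p-value is uniform on [0, 1), so each
    experiment 'passes' when a uniform draw falls below alpha."""
    rng = random.Random(seed)
    hits = sum(
        all(rng.random() < alpha for _ in range(n_experiments))
        for _ in range(trials)
    )
    return hits / trials


print(chance_all_false_positives(ALPHA, 1))  # 1/20
print(chance_all_false_positives(ALPHA, 2))  # 1/400
print(simulate_all_false_positives(0.05, 2, trials=200_000))  # close to 0.0025
```

If the experiments share a flaw (same lab, same biased sample), they are not independent and the real chance of a double fluke is higher than 1/400.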
The same idea applies to training a model: dropout reduces the chance that we've learned something that isn't really true.
Back to the original question: why drop out based on each example, rather than on each iteration? Statistically, we do a little better if we drop out more frequently, but for shorter spans. If we drop the same nodes for several iterations, that slightly increases the chance that the active nodes will learn a spurious correlation while the dropped nodes are inactive.
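The difference between "per example" and "per iteration" is only in how the random keep/drop mask is shaped. The sketch below (plain Python; all names are hypothetical, not from any framework) draws both kinds of mask for one batch and reports which hidden units get any training signal. It assumes a single hidden layer of 8 units and a batch of 32 examples, values picked only for illustration.

```python
import random
from typing import List, Set

Mask = List[List[int]]  # mask[example][unit] is 1 (keep) or 0 (drop)


def per_example_masks(batch_size: int, n_units: int, p_drop: float,
                      rng: random.Random) -> Mask:
    """Draw a fresh keep/drop pattern for every example in the batch."""
    return [[0 if rng.random() < p_drop else 1 for _ in range(n_units)]
            for _ in range(batch_size)]


def shared_mask(batch_size: int, n_units: int, p_drop: float,
                rng: random.Random) -> Mask:
    """Draw one keep/drop pattern and reuse it for every example in the batch."""
    row = [0 if rng.random() < p_drop else 1 for _ in range(n_units)]
    return [list(row) for _ in range(batch_size)]


def units_trained(mask: Mask) -> Set[int]:
    """Units that are active for at least one example, i.e. the units that
    receive any gradient signal from this batch."""
    return {u for row in mask for u, keep in enumerate(row) if keep}


rng = random.Random(1)
pe = per_example_masks(batch_size=32, n_units=8, p_drop=0.5, rng=rng)
sh = shared_mask(batch_size=32, n_units=8, p_drop=0.5, rng=rng)
print(len({tuple(r) for r in pe}), sorted(units_trained(pe)))  # many patterns; almost surely all 8 units
print(len({tuple(r) for r in sh}), sorted(units_trained(sh)))  # 1 pattern; only the kept units
```

With a shared mask, the dropped units sit out the entire batch; with per-example masks, each unit is dropped for some examples and active for others, so the "off" spans are short and spread across the whole batch.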
Do note that this is mostly an ex post facto rationale: it wasn't predicted in advance. Instead, the explanation came only after the false-training effect was discovered, a solution was found, and the mechanics of that solution were studied.