In the Deep Learning course, Prof. Ng mentioned how to implement dropout.

Implementation by Prof. Ng:

  1. The first image shows the implementation of dropout: generating the d3 matrix, i.e. the dropout matrix for the third layer, with shape (#nodes, #training examples).

  1. As per my understanding, D3 would look like this for a single iteration and keeps changing with every iteration (here I've taken 5 nodes and 10 training examples).
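For concreteness, here is a minimal sketch of the inverted-dropout step described above, assuming NumPy; the keep probability of 0.8, the random seed, and the stand-in activations are illustrative choices, not values from the course:

```python
import numpy as np

np.random.seed(0)        # seed is an assumption, just for reproducibility

keep_prob = 0.8          # probability of keeping a node (illustrative value)
n_nodes, m = 5, 10       # 5 nodes, 10 training examples, as in the question

# Inverted dropout: one independent Boolean per (node, example) entry,
# so each training example (column) can drop a different subset of nodes.
D3 = np.random.rand(n_nodes, m) < keep_prob

A3 = np.random.rand(n_nodes, m)   # stand-in activations for layer 3
A3 = A3 * D3                      # zero out the dropped nodes
A3 = A3 / keep_prob               # rescale so the expected activation is unchanged
```

Each column of `D3` is drawn independently, which is exactly why different examples pass through different nodes.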

Query: One thing I didn't get is why we need to drop different nodes for each training example. Instead, why can't we keep the dropped nodes the same for all examples and randomly drop again in the next iteration? For example, in the second image, the 2nd training example passes through 4 nodes while the first one passes through all nodes. Why not the same nodes for all examples?

  • I have trouble parsing the English to determine exactly what you want. Can you give an example with a single layer, perhaps, or link to a published paper or picture that shows what you mean? I can see several possible meanings behind this, and want to make sure I answer the correct one for you. – Prune May 23 at 23:03
  • Please check the question now. :) – Abhishek Singla May 24 at 1:54

We choose different nodes within the same batch for the simple reason that it gives slightly better results. This is the rationale for virtually any choice in a neural network. As to why it gives better results ...

If you have many batches per epoch, you will notice very little difference, if any. The reason we have any dropout is that models sometimes exhibit superstitious learning (in the statistical sense): if, in an early batch of training examples, several happen to share a particularly strong correlation of some sort, then the model will learn that correlation early on (primacy effect) and will take a long time to un-learn it. For instance, students doing the canonical dogs-vs-cats exercise will often use their own data set. Some students find that the model learns to identify anything on a soft chair as a cat, and anything on a lawn as a dog -- because those are common places to photograph each type of pet.

Now, imagine that your shuffling algorithm brings up several such photos in the first three batches. Your model will learn this correlation. It will take quite a few counter-examples (cat in the yard, dog on the furniture) to counteract the original assumption. Dropout disables one or another of the "learned" nodes, allowing others to be more strongly trained.

The broad concept is that a valid correlation will be re-learned easily; an invalid one is likely to disappear. This is the same concept as in repeating other experiments. For instance, if a particular experiment shows significance at p < .05 (a typical standard of scientific acceptance), then there is no more than a 5% chance that the correlation could have happened by chance, rather than the functional connection we hope to find. This is considered to be confident enough to move forward.

However, it's not certain enough to make large, sweeping changes in thousands of lives -- say, with a new drug. In those cases, we repeat the experiment enough times to achieve the social confidence desired. If we achieve the same result in a second, independent experiment, then instead of 1/20 chance of being mistaken, we have a 1/400 chance.
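The 1/400 figure above is just the two independent error probabilities multiplied together; a quick check with exact rationals (using Python's `fractions` module, chosen here to avoid floating-point noise):

```python
from fractions import Fraction

p = Fraction(1, 20)   # p < .05: a 1-in-20 chance of a spurious result
p_two = p * p         # both independent experiments spurious: 1/400
```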

The same idea applies to training a model: dropout reduces the chance that we've learned something that isn't really true.


Back to the original question: why drop out per example rather than per iteration. Statistically, we do a little better if we drop out more frequently but for shorter spans. If we drop the same nodes for several iterations, that slightly increases the chance that the active nodes will learn a mistake while those nodes are inactive.
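The contrast between the two schemes can be sketched in code (a hypothetical illustration assuming NumPy; the seed, keep probability, and sizes are arbitrary). A per-example mask draws an independent Boolean for every (node, example) entry, while the per-iteration alternative draws one Boolean per node and broadcasts it across the whole batch:

```python
import numpy as np

rng = np.random.default_rng(1)   # seed is arbitrary
keep_prob, n_nodes, m = 0.8, 5, 10

# Per-example dropout (the course's scheme): an independent draw for
# every (node, example) entry, shape (n_nodes, m).
D_per_example = rng.random((n_nodes, m)) < keep_prob

# Per-iteration dropout (the questioner's alternative): one draw per
# node, shape (n_nodes, 1), broadcast across the batch so the SAME
# nodes are dropped for every example in this iteration.
D_per_iteration = rng.random((n_nodes, 1)) < keep_prob
D_broadcast = np.broadcast_to(D_per_iteration, (n_nodes, m))
```

In the broadcast mask every column is identical, so a dropped node stays silent for the entire batch; in the per-example mask the columns generally differ, which is the "more frequent, shorter spans" behavior described above.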

Do note that this is mostly ex post facto rationale: it wasn't predicted in advance; instead, the explanation came only after the false-training effect was discovered, a solution found, and the mechanics of the solution studied.
