It is true that machine learning is meaningless without a well-sized network and a large amount of data. But it is also true that we do not need everything right away. In fact, there is a lot we need to do before we can fruitfully use all that we have.
It is very useful to start with a small setup, just to get a feel for how things move. The basic structure of the network can be discovered with a small amount of data. Doing this gives us an initial stepping stone at relatively little computational cost. With that in place, we can start adding more data and gradually enrich our model.
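As a concrete sketch of this idea, the example below (hypothetical, using NumPy and a synthetic dataset standing in for "all the data we have") trains a deliberately tiny logistic-regression model on a small slice of the data. The point is not the model itself but the workflow: a few hundred points and a simple model are enough to check that the basic setup learns anything at all, cheaply.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for the full dataset: 10,000 labelled points.
X_full = rng.normal(size=(10_000, 5))
true_w = np.array([1.5, -2.0, 0.5, 0.0, 1.0])
y_full = (X_full @ true_w + rng.normal(scale=0.5, size=10_000) > 0).astype(float)

# Start small: a 500-point subset is enough to discover the basic structure.
X_small, y_small = X_full[:500], y_full[:500]

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# A deliberately tiny model: plain logistic regression, batch gradient descent.
w = np.zeros(5)
for _ in range(2_000):
    grad = X_small.T @ (sigmoid(X_small @ w) - y_small) / len(y_small)
    w -= 0.5 * grad

# Sanity check on the rest of the data: does the cheap setup learn at all?
accuracy = np.mean((sigmoid(X_full @ w) > 0.5) == y_full)
print(f"small-model accuracy on the full data: {accuracy:.2f}")
```

If this cheap run fails to beat chance, something is wrong with the pipeline itself, and we have found that out before spending anything on the full-scale training.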
This approach has several advantages.
When we start with a small network and a simpler model, we naturally reduce the chances of overfitting. Moreover, when we train on only a small subset of the data, plenty of data is left over for generous dev and test sets. All of this reduces the chances of overfitting.
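A minimal sketch of such a split (the sizes here are illustrative, not a recommendation): because the small network only needs a small training subset, most of the labelled data can go to the dev and test sets.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10_000                      # pretend this is all the labelled data we have
idx = rng.permutation(n)        # shuffle once so the splits are unbiased

# With a small training subset, we can afford generous dev and test sets.
train_idx = idx[:1_000]         # small subset used for the small network
dev_idx   = idx[1_000:5_500]    # plenty left over for the dev set
test_idx  = idx[5_500:]         # ...and for the test set

print(len(train_idx), len(dev_idx), len(test_idx))  # 1000 4500 4500
```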
Once we have an underfitting model, we have a baseline. With this baseline in place, we can start training the bigger model on the real data set. But now we have an advantage: measured against the baseline, we can be sure we have actually improved the model rather than lost ground to overfitting.
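The comparison itself can be as simple as a guarded check on dev-set accuracy. The function and margin below are illustrative assumptions, not a prescribed recipe: the idea is only that the bigger model must beat the baseline by more than noise before we accept it.

```python
def improves_on_baseline(candidate_dev_acc, baseline_dev_acc, margin=0.01):
    """Accept the bigger model only if it clearly beats the baseline on dev data.

    `margin` guards against accepting noise-level gains, which may just be
    the bigger model starting to overfit. (Threshold chosen for illustration.)
    """
    return candidate_dev_acc >= baseline_dev_acc + margin

baseline_dev_acc = 0.78     # the small, underfitting model's dev accuracy
candidate_dev_acc = 0.85    # the bigger model trained on the full data

print(improves_on_baseline(candidate_dev_acc, baseline_dev_acc))  # True
```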
No model is trained in a single attempt. Many iterations are required to reach a stable design that can be improved with further training, and each iteration can be very expensive. The finer aspects of such training require a lot of data, but some of the grosser aspects really do not. If we start with a small model and a subset of the data, we can quickly fix those gross aspects cheaply, and then scale up to handle the finer ones. As a result, we need far fewer iterations with the whole data.