From OSTEP,
"Let’s start out with a linear page table. As you might recall1 , linear page tables get pretty big. Assume again a 32-bit address space ($2^{32}$ bytes), with 4 KB ($2^{12}$ byte) pages and a 4-byte page-table entry. An address space thus has roughly one million virtual pages in it ( $2^{32}/ 2^{12}$ ); multiply by the page-table entry size and you see that our page table is 4MB in size. Recall also: we usually have one page table for every process in the system! With a hundred active processes (not uncommon on a modern system), we will be allocating hundreds of megabytes of memory just for page tables!"
Continued:
"The basic idea behind a multi-level page table is simple. First, chop up the page table into page-sized units; then, if an entire page of page-table entries (PTEs) is invalid, don’t allocate that page of the page table at all. To track whether a page of the page table is valid (and if valid, where it is in memory), use a new structure, called the page directory. The page directory thus either can be used to tell you where a page of the page table is, or that the entire page of the page table contains no valid pages"