
The Best Books for Getting Started With Deep Learning

By Peter Bell, September 24, 2019

If you want to get up to speed with deep learning, which books should you read? Consider starting with one or more of these three!


While the idea of artificial neural networks was first proposed in the 1940s, and the first perceptron was built in 1958, the field of deep learning really started to take off over the last decade with substantial breakthroughs in successfully applying neural networks to computer vision, natural language processing, and a number of other common machine learning tasks.

But deep learning is a large and fast-growing field. What’s the best way to get up to speed? No book will include all of the latest techniques in the field; to keep up with those, you’re going to need to start reading academic papers. Before you do, though, you’ll need a grounding in the field to be able to follow them. Here are three books that will provide you with the context you’ll need to understand the latest papers related to deep learning.

“Deep Learning” by Ian Goodfellow, Yoshua Bengio and Aaron Courville

This is a comprehensive but accessible introduction to the field. The Amazon page makes it clear just how influential this book is, with reviews from two renowned data scientists, Geoff Hinton and Yann LeCun, and even Elon Musk! This is perhaps the best-known book on deep learning and, if you’re serious about becoming a data scientist, one you should really take the time to absorb.

After a brief introduction that provides some context, “Deep Learning” is broken down into three parts. Part 1 introduces just enough math to get you through the rest of the book: linear algebra, probability and information theory, numerical computation, and machine learning basics. If you have no background in these topics at all, you’re probably going to want to pick up some more in-depth books on each one to fill in any gaps in your understanding, but there is a good summary of each field with enough information to get you through the deep learning content later in the book.

Part 2 is the heart of the book, taking you through common elements of deep learning systems. It starts off with “deep feedforward networks” (sometimes called multilayer perceptrons) which are at the heart of all deep learning systems. It then introduces regularization (avoiding overfitting to training data), optimization (techniques for efficiently training networks), convolutional networks (optimized for data with a known grid-like structure) and sequence modeling using recurrent and recursive nets (for sequential data).
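To make the idea of a deep feedforward network concrete, here is a minimal sketch of a forward pass in plain Python. It is not taken from the book (which contains no source code); the layer sizes, the random weights, and the function names are all assumptions chosen for illustration, and no training step is shown.

```python
import math
import random

def relu(values):
    """Rectified linear unit applied element-wise."""
    return [max(0.0, v) for v in values]

def sigmoid(value):
    """Squash a real number into the range 0..1."""
    return 1.0 / (1.0 + math.exp(-value))

def dense(inputs, weights, biases):
    """One fully connected layer: out[j] = sum_i inputs[i] * weights[i][j] + biases[j]."""
    return [
        sum(x * weights[i][j] for i, x in enumerate(inputs)) + biases[j]
        for j in range(len(biases))
    ]

def init_layer(n_in, n_out, rng):
    """Hypothetical initializer: small random weights, zero biases."""
    weights = [[rng.uniform(-1.0, 1.0) for _ in range(n_out)] for _ in range(n_in)]
    biases = [0.0] * n_out
    return weights, biases

def forward(inputs, layers):
    """Pass inputs through ReLU hidden layers, then a single sigmoid output unit."""
    activations = inputs
    for weights, biases in layers[:-1]:
        activations = relu(dense(activations, weights, biases))
    weights, biases = layers[-1]
    logit = dense(activations, weights, biases)[0]
    return sigmoid(logit)

rng = random.Random(0)  # fixed seed so repeated runs give the same weights
# Assumed architecture: 3 inputs -> 4 hidden -> 4 hidden -> 1 output.
layers = [init_layer(3, 4, rng), init_layer(4, 4, rng), init_layer(4, 1, rng)]
probability = forward([0.5, -1.2, 3.0], layers)
print(probability)
```

Information flows in one direction only, from the inputs through each layer to the output, which is what "feedforward" refers to. In practice a framework such as TensorFlow or PyTorch would handle these layers and also learn the weights from data.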

Part 3 then introduces some key areas of deep learning research, from linear factor models and autoencoders to Monte Carlo methods and deep generative models.

This is not a lightweight book, and it can be heavy reading because it’s focused entirely on theory (there is no source code in the book; it’s a conceptual introduction to the field). That said, it’s a great way to start to build an understanding of the key topics you’ll be dealing with once you do start to put fingers to keyboard and create models using TensorFlow or PyTorch. It’s also available for free online.

“Hands-On Machine Learning with Scikit-Learn & TensorFlow” by Aurélien Géron

This book is very different to Goodfellow’s tome. In just over 500 pages, Géron introduces both classical machine learning and deep learning. Each section starts with code followed by the explanations you’ll need to understand what’s going on.

If you have a background in programming, you should definitely give this book a try. By page 77, in addition to getting an overview of the machine learning process, you’ll have implemented your first machine learning project from end to end! You will have experienced the entire data science workflow, from framing the problem and cleaning and visualizing the data, all the way through selecting and training a model and then fine-tuning the model using a grid search and ensemble methods. You then get an introduction to classification, training models, support vector machines, decision trees, additional ensemble methods, and dimensionality reduction using Principal Component Analysis.
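For readers who have not met grid search before, the sketch below shows the idea in plain Python: try every combination of hyperparameter values, score each on held-out data, and keep the best. It is not code from Géron’s book. The toy data set, the k-nearest-neighbours model, and all function names are assumptions made up for this illustration; scikit-learn’s GridSearchCV class offers the same kind of exhaustive search, using cross-validation rather than a single validation split.

```python
from itertools import product

def distance(a, b, metric):
    """Distance between two points under the chosen metric."""
    if metric == "manhattan":
        return sum(abs(x - y) for x, y in zip(a, b))
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5  # euclidean

def knn_predict(train, point, k, metric):
    """Majority vote among the k training points nearest to `point`."""
    nearest = sorted(train, key=lambda row: distance(row[0], point, metric))[:k]
    votes = [label for _, label in nearest]
    return max(set(votes), key=votes.count)

def accuracy(train, validation, k, metric):
    """Fraction of validation points classified correctly."""
    hits = sum(knn_predict(train, x, k, metric) == y for x, y in validation)
    return hits / len(validation)

# Made-up toy data: two clusters, plus one deliberately mislabelled point.
train = [((1.0, 1.0), 0), ((1.5, 2.0), 0), ((2.0, 1.0), 0),
         ((1.3, 1.5), 1),  # label noise sitting inside the class-0 cluster
         ((6.0, 6.0), 1), ((6.5, 7.0), 1), ((7.0, 6.0), 1)]
validation = [((1.2, 1.4), 0), ((6.8, 6.2), 1), ((2.2, 1.8), 0), ((5.9, 6.5), 1)]

# The grid: every combination of these values is evaluated.
grid = {"k": [1, 3, 5], "metric": ["manhattan", "euclidean"]}

results = []
for k, metric in product(grid["k"], grid["metric"]):
    results.append(((k, metric), accuracy(train, validation, k, metric)))

# max() keeps the first combination that reaches the top score.
best_params, best_score = max(results, key=lambda item: item[1])
print(best_params, best_score)
```

With k=1 the noisy training point misleads the model on one validation example, while larger values of k outvote it. Exposing that kind of difference between settings is the purpose of the search.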

If you’re primarily interested in deep learning, the good stuff kicks off on page 229 when you set up TensorFlow and create your first graph. Again there is a practical, code-first approach with just enough theory sprinkled around the code so you know what’s going on. It covers key theoretical concepts for deep learning including Convolutional Neural Networks, Recurrent Neural Networks, Autoencoders, and even briefly touches on Reinforcement Learning. At the same time it also covers key practical concerns such as distributing TensorFlow processing across devices and servers (essential for timely training with most non-trivial data sets).

Be aware that there is a second edition that should be coming out in October 2019. Given the rate of change in deep learning since the book was first published just over two years ago, it might be worth waiting for the new edition.

“Deep Learning with Python” by François Chollet

This book provides a hands-on, condensed introduction to the field of machine learning that fits somewhere between the authority and depth of “Deep Learning” and the pragmatic conciseness of “Hands-On Machine Learning with Scikit-Learn & TensorFlow.”

It starts by defining artificial intelligence, machine learning, and deep learning. From there, it provides a brief history of machine learning, showing how traditional machine learning models like decision trees relate to neural networks. It also provides just enough context on linear algebra and calculus using small code snippets to set the scene for the rest of the book.

From there, it introduces Keras, a sophisticated framework for quickly building neural networks, and uses the library to implement binary and multi-class classifiers and then a regression using the standard Boston Housing Price dataset.

It then runs you through an overview of some key concepts such as data pre-processing, feature engineering and feature learning, overfitting and underfitting, and the workflow for machine learning problems. From there it runs through practical worked examples for solving computer vision and natural language processing problems, introducing concepts such as Recurrent Neural Networks (RNNs) and even Long Short-Term Memory networks (LSTMs) along the way.

If you’re determined to build sophisticated models ASAP and are only willing to read one book, providing you’re already comfortable with programming in Python, this might just be the book you’ve been looking for. 

Pick your own adventure

If you just want to get some hands-on experience with machine learning, “Hands-On Machine Learning with Scikit-Learn & TensorFlow” is an excellent starting point. It’ll give you practice building both classical and deep learning solutions in just over 500 pages.

If you’re ready for a more in-depth deep learning experience, “Deep Learning with Python” is ideal. By using Keras rather than a lower-level framework like TensorFlow, it allows you to build sophisticated models quickly. And because it’s focused only on deep learning, it introduces you to a wider range of deep learning concepts than Géron’s more general guide.

If you are serious about understanding deep learning, you should take the time to work through “Deep Learning.” It’ll provide you with a strong grounding, and then you should pick one of the other two books to start to apply the theory. And of course, each book has some nuggets the others don’t, so if you have the time and inclination, why not work through all three? The effort will be repaid in the depth and breadth of your understanding of the field.


Peter Bell

Head of Data Science

Peter is a veteran technologist, CTO, entrepreneur, and longtime educator, having taught digital literacy at Columbia and authored numerous programming books.
