Berkeley Deep Unsupervised Learning

CS294-158 Deep Unsupervised Learning Spring 2019

Lectures, papers and assignments for Berkeley's Deep Unsupervised Learning course are now available here:

The course deals with two areas of deep learning, namely Deep Generative Models and Self-supervised Learning.

Topics are:

  • Generative adversarial networks
  • Variational autoencoders
  • Autoregressive models
  • Flow models
  • Energy-based models
  • Compression
  • Self-supervised learning
  • Semi-supervised learning

The course is currently ongoing, so not all lectures are available yet.

Deep Drive Dataset available

The large dataset for teaching your algorithms to drive can be downloaded from

It contains over 100,000 HD video sequences that make up over a thousand hours of footage. The data includes over 100,000 images annotated for object detection across these classes: bus, traffic light, traffic sign, person, bike, truck, motor, car, train, and rider. It also includes annotations for segmentation, drivable area, lane markings, and more.

I love how data is released to the public for the greater good.

The final Deep Learning Specialization course is now out

After a long wait, the final and much-anticipated course in the Coursera Deep Learning Specialization series taught by Andrew Ng, called Sequence Models, has now been released.

The first week covers Recurrent Neural Networks, the second week addresses Natural Language Processing & Word Embeddings, and the final week covers Sequence Models & Attention Mechanism.

The 2018 course has been released, and it’s truly awesome

Last year's 2017 course, which taught state-of-the-art deep learning to coders, was amazing. There are so many goodies in the blog post about the 2018 launch, which is available now. This year they taught the course using PyTorch instead of Keras and wrote their own library to speed up development. They were the first to add implementations of several techniques from papers to the library, such as the Learning Rate Finder (Smith 2015) and Stochastic Gradient Descent with Restarts (SGDR). With one line of code, you can also get the images that the classifier gets wrong.

17 of the top 20 participants in a Kaggle competition were students in the preview course.
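To give a feel for what SGDR does to the learning rate, here is a minimal sketch of a cosine schedule with warm restarts. It is not the course library's implementation: the function name, the parameter names and the default values are all made up for illustration, and the formula is the cosine-annealing form commonly associated with SGDR.

```python
import math

def sgdr_learning_rate(step, cycle_len, lr_max=0.1, lr_min=0.0, cycle_mult=1):
    """Cosine-annealed learning rate with warm restarts (illustrative only).

    step:       zero-based training step
    cycle_len:  number of steps in the first cycle
    cycle_mult: factor by which each cycle is longer than the previous one
    """
    # Find the cycle that contains `step` and the position inside that cycle.
    t_cur, t_i = step, cycle_len
    while t_cur >= t_i:
        t_cur -= t_i
        t_i *= cycle_mult
    # Decay from lr_max at the start of the cycle toward lr_min at its end.
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * t_cur / t_i))

# Two cycles of four steps each: the rate decays, then jumps back up at step 4.
schedule = [sgdr_learning_rate(s, cycle_len=4) for s in range(8)]
```

The jump back to the maximum rate at the start of each cycle is the "restart". Setting cycle_mult above 1 makes each cycle longer than the one before it.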

I recommend reading the blog post and taking the course.

Why they switched from Keras and TensorFlow to PyTorch and built their own framework on top of it

In the new course they will be using PyTorch instead of TensorFlow, and they have built a framework on top of it to make it even easier to use than Keras.

PyTorch is a dynamic rather than static deep learning library, and Jeremy writes that nearly all of the top 10 Kaggle competition winners have now been using PyTorch.

In part 2 of the course, the focus was on enabling students to read and implement recent research papers, and PyTorch made this easier due to its flexibility. It allowed them to try out things that could not be done as easily with TensorFlow. It also makes it easier to understand what is going on in the algorithms, since with TensorFlow the computation becomes a black box once you send it to the GPU.

Most models train faster on PyTorch than on TensorFlow and are easier to debug, which contributes to faster development iterations.

The reason they built a framework on top of PyTorch is that PyTorch comes with fewer defaults than Keras. They want the first part of the course to be accessible to students with little or no experience in machine learning. They also wanted to help students avoid common pitfalls (such as not shuffling the data when it is needed, or shuffling it when it is not) and get going much faster, improving where Keras was lacking. They also built in many best practices that Keras was lacking. Jeremy writes that:

“We built models that are faster, more accurate, and more complex than those using Keras, yet were written with much less code.” – Jeremy Howard

The approach is to encapsulate all important data choices, such as preprocessing, data augmentation, test/training/validation sets, multiclass/singleclass classification, regression and so on, into object-oriented classes.

“Suddenly, we were dramatically more productive, and made far less errors, because everything that could be automated, was automated.” – Jeremy Howard
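As a rough illustration of what "encapsulating data choices in a class" can look like, here is a small sketch. The class, its fields and its defaults are hypothetical and are not taken from the course library. The only point is that the split and the shuffling decision live in one place, so they cannot be forgotten.

```python
import random
from dataclasses import dataclass, field

@dataclass
class DataBundle:
    """Hypothetical container that owns the data choices for one experiment."""
    samples: list
    valid_fraction: float = 0.2
    shuffle: bool = True   # set to False for ordered data such as time series
    seed: int = 0
    train: list = field(init=False)
    valid: list = field(init=False)

    def __post_init__(self):
        items = list(self.samples)
        if self.shuffle:
            # Seeded shuffle so the split is reproducible between runs.
            random.Random(self.seed).shuffle(items)
        n_valid = int(len(items) * self.valid_fraction)
        # The last n_valid items become the validation set.
        self.train = items[:len(items) - n_valid]
        self.valid = items[len(items) - n_valid:]

bundle = DataBundle(samples=list(range(10)))
ordered = DataBundle(samples=list(range(10)), shuffle=False)
```

With shuffle=False the validation set is simply the tail of the data, which is usually what you want when the order carries meaning.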

Jeremy thinks that deep learning will see the same kind of library/framework explosion that front-end developers have become used to over the last few years. So the library you learn today will probably be obsolete in a year or two.

99.3% accuracy on dogs and cats with 3 lines of code is not bad:

Stanford CS231n 2017 – Convolutional Neural Networks for Visual Recognition

The video lectures for Stanford's very popular CS231n (Convolutional Neural Networks for Visual Recognition), held in Spring 2017, were released this month. (According to their Twitter page, the CS231n website gets over 10,000 views per day. The reading material on their page is really good at explaining CNNs.)

Here are the video lectures:


These are the assignments for the course:


Also make sure to check out last year's student reports. Note: one is about improving the state of the art in detecting the Higgs boson.

Dark Knowledge – Geoffrey Hinton

Abstract: A simple way to improve classification performance is to average the predictions of a large ensemble of different classifiers. This is great for winning competitions but requires too much computation at test time for practical applications such as speech recognition. In a widely ignored paper in 2006, Caruana and his collaborators showed that the knowledge in the ensemble could be transferred to a single, efficient model by training the single model to mimic the log probabilities of the ensemble average. This technique works because most of the knowledge in the learned ensemble is in the relative probabilities of extremely improbable wrong answers. For example, the ensemble may give a BMW a probability of one in a billion of being a garbage truck but this is still far greater (in the log domain) than its probability of being a carrot. This “dark knowledge”, which is practically invisible in the class probabilities, defines a similarity metric over the classes that makes it much easier to learn a good classifier. I will describe a new variation of this technique called “distillation” and will show some surprising examples in which good classifiers over all of the classes can be learned from data in which some of the classes are entirely absent, provided the targets come from an ensemble that has been trained on all of the classes. I will also show how this technique can be used to improve a state-of-the-art acoustic model and will discuss its application to learning large sets of specialist models without overfitting. This is joint work with Oriol Vinyals and Jeff Dean.
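To make the "relative probabilities of improbable wrong answers" point concrete, here is a minimal sketch. The abstract does not say how the soft targets are produced. Dividing the logits by a temperature before the softmax is one technique used in the distillation literature, so treat that choice, the class names and the made-up logits below as assumptions.

```python
import math

def softmax(logits, temperature=1.0):
    """Softmax over logits; a higher temperature gives a softer distribution."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)                            # subtract the max for stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

classes = ["bmw", "garbage_truck", "carrot"]
teacher_logits = [20.0, 2.0, -10.0]            # made-up values for illustration

hard = softmax(teacher_logits)                  # ordinary class probabilities
soft = softmax(teacher_logits, temperature=5.0) # softened targets for a student

for name, p_hard, p_soft in zip(classes, hard, soft):
    print(f"{name:14s} T=1: {p_hard:.2e}   T=5: {p_soft:.2e}")
```

At temperature 1 the wrong classes are practically invisible, yet "garbage_truck" is still many orders of magnitude more likely than "carrot". Raising the temperature brings that ordering into a range a student model can actually learn from.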

Lecture notes: