Google Dataset Search

This is quite cool. Google has released a search tool for finding datasets! https://toolbox.google.com/datasetsearch

You can, for instance, find world surface temperature data, a real-time assessment of hybridization between wolves and dogs, lots of X-ray datasets, or data from breast cancer screenings.

Much of the data seems to come from research projects that have applied different machine learning techniques to analyse it.

We now have much better means of using machine learning, easy access to a lot of related data, and dramatically more compute power, so we may well see quite a few improvements to older research results. I welcome this initiative and believe that the world will become a better place as we collectively solve the world's many problems using AI.

 

Train ImageNet in 18 minutes

Jeremy Howard et al. at fast.ai have achieved what one might consider a huge breakthrough in training deep learning models quickly.

They managed to train ImageNet in 18 minutes using publicly available resources that cost them only $40 to run!

This was their method:

  • fast.ai’s progressive resizing for classification, and rectangular image validation
  • NVIDIA’s NCCL with PyTorch’s all-reduce
  • Tencent’s weight decay tuning; a variant of Google Brain’s dynamic batch sizes, gradual learning rate warm-up (Goyal et al 2018, and Leslie Smith 2018).
  • ResNet-50 architecture
  • SGD with momentum.
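To make two of the ingredients above a bit more concrete, here is a minimal sketch in plain Python of a gradual learning rate warm-up combined with an SGD-with-momentum update. This is not fast.ai's actual code: the function names (`warmup_lr`, `sgd_momentum_step`), the linear shape of the warm-up and the default momentum of 0.9 are all assumptions made for illustration. A real training run would use PyTorch's optimizers and schedulers instead.

```python
def warmup_lr(step, warmup_steps, base_lr, start_lr=0.0):
    """Hypothetical helper: ramp the learning rate linearly from start_lr
    to base_lr over warmup_steps, then hold it at base_lr."""
    if warmup_steps <= 0 or step >= warmup_steps:
        return base_lr
    frac = step / warmup_steps
    return start_lr + frac * (base_lr - start_lr)


def sgd_momentum_step(weights, grads, velocity, lr, momentum=0.9):
    """Hypothetical helper: one SGD-with-momentum update on plain lists.

    velocity accumulates a decaying sum of past gradients, and the
    weights move against that accumulated direction.
    """
    new_velocity = [momentum * v + g for v, g in zip(velocity, grads)]
    new_weights = [w - lr * v for w, v in zip(weights, new_velocity)]
    return new_weights, new_velocity


if __name__ == "__main__":
    # Toy problem (an assumption for the demo): minimise f(w) = w^2,
    # whose gradient is 2w, starting from w = 5.
    weights, velocity = [5.0], [0.0]
    for step in range(200):
        lr = warmup_lr(step, warmup_steps=20, base_lr=0.05)
        grads = [2.0 * w for w in weights]
        weights, velocity = sgd_momentum_step(weights, grads, velocity, lr)
    print(weights)
```

The idea behind the warm-up is that the first updates are taken with a small learning rate, which tends to matter most when the batch size is very large, as it is in distributed training.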

http://www.fast.ai/2018/08/10/fastai-diu-imagenet/

Deep Drive Dataset available

The large dataset for teaching your algorithms to drive can be downloaded from http://bdd-data.berkeley.edu/.

It contains over 100,000 HD video sequences that make up over a thousand hours of footage. The data includes over 100,000 annotated images for object detection of bus, traffic light, traffic sign, person, bike, truck, motor, car, train, and rider. It also includes segmentation, drivable area, lane markings etc.

I love how data is released to the public for the greater good.

The new Fast.ai 2 Videos available

The Fast and the Furious 2 of machine learning is now available for your pleasure.

http://course.fast.ai/part2.html

Fast.ai is the very best way to learn practical Deep Learning. Period.

The first iterations of courses 1 and 2 used Keras, while the new versions use fast.ai's own library built on top of PyTorch. The new library is awesome and has a lot of useful best-practice functions.

Harvard Data Science Course 2015

If you are interested in learning more about Data Science, you can check out the course page for the CS109 Data Science Course at Harvard University.
The topics covered include, among others:

  • Pandas
  • Python
  • Web Scraping
  • Regular Expressions
  • Data Reshaping
  • Data Cleanup
  • Probability
  • Distributions
  • Frequentist Statistics
  • Bias and Regression
  • SVM, Decision Trees, Random Forests
  • Ensemble Methods
  • MapReduce
  • Spark
  • Bayes Theorem and Bayesian Methods
  • Interactive Visualization
  • Deep Networks