October 29, 2017




















October 18, 2017
The Google Brain team has just released a new paper (https://arxiv.org/abs/1710.05941) that demonstrates the superiority of a new activation function called Swish on a number of different neural network architectures.
This is interesting because people often ask me, “which activation function should I use?”
These days, it is common to just use the ReLU by default.
To refresh your memory, the ReLU looks like this:
And it is defined by the equation:
$$ f(x) = \max(0, x) $$
One major problem with the ReLU is that its derivative is 0 for half the values of the input \( x \). Because we use “gradient descent” as our parameter update algorithm, if the gradient is 0 for a parameter, then that parameter will not be updated!
In other words, when I do:
$$ \theta = \theta - \alpha \frac{\partial J}{\partial \theta } $$
And:
$$ \frac{\partial J}{\partial \theta } = 0 $$
Then my update is just:
$$ \theta = \theta $$
Which just assigns the parameter back to itself.
This leads to the problem of “dead neurons”. Experiments have shown that neural networks trained with ReLUs can have up to 40% dead neurons!
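To make the zero-gradient update concrete, here is a minimal sketch in plain Python. The one-neuron model, the squared-error loss, and all of the function names are assumptions chosen for illustration; they are not taken from the paper.

```python
def relu(x):
    return max(0.0, x)


def relu_grad(x):
    # Derivative of max(0, x): 1 for x > 0, 0 for x < 0 (we pick 0 at x == 0).
    return 1.0 if x > 0 else 0.0


def grad_w(w, x, target):
    # Hypothetical model: prediction = relu(w * x), J = 0.5 * (prediction - target)^2.
    z = w * x
    return (relu(z) - target) * relu_grad(z) * x


def sgd_step(w, x, target, alpha=0.1):
    # theta = theta - alpha * dJ/dtheta
    return w - alpha * grad_w(w, x, target)


# Pre-activation is negative, so the gradient is 0 and the weight does not move.
w_dead = sgd_step(-1.0, x=2.0, target=1.0)
# Pre-activation is positive, so the weight is updated as usual.
w_live = sgd_step(0.25, x=2.0, target=1.0)

print(w_dead, w_live)
```

If the pre-activation stays negative for every training sample, the weight never changes, which is what "dead" means here.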
There have been some proposed alternatives to this, such as the leaky ReLU, the ELU, and the SELU.
Interestingly, none of these have seemed to catch on and it’s still ReLU by default.
So how does the Swish activation function work?
The function itself is very simple:
$$ f(x) = x \sigma(x) $$
Where \( \sigma(x) \) is the usual sigmoid activation function.
$$ \sigma(x) = (1 + e^{-x})^{-1} $$
It looks like this:
What’s interesting about this is that, unlike most commonly used activation functions, it is not monotonically increasing: it dips slightly below zero for negative inputs before rising back toward zero. Does it matter? It seems the answer is no!
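Here is a minimal sketch of Swish in plain Python, using the usual sigmoid \( 1 / (1 + e^{-x}) \). The function names are my own choice for illustration, not an official API. Printing a few values shows the non-monotonic dip on the negative side.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def swish(x):
    # f(x) = x * sigmoid(x)
    return x * sigmoid(x)


for x in (-5.0, -1.0, 0.0, 1.0, 5.0):
    print(x, round(swish(x), 4))
```

Note that the value at -1 is lower than the value at -5, even though -1 is the larger input.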
The derivative looks like this:
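For reference, the product rule together with \( \sigma'(x) = \sigma(x)(1 - \sigma(x)) \) gives the derivative in closed form:

$$ f'(x) = \sigma(x) + x \sigma(x) (1 - \sigma(x)) = f(x) + \sigma(x) (1 - f(x)) $$

The sketch below checks that formula against a central finite difference. The function names and the step size `h` are assumptions for illustration only.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def swish(x):
    return x * sigmoid(x)


def swish_grad(x):
    # Closed form from the product rule: sigma(x) + x * sigma(x) * (1 - sigma(x))
    s = sigmoid(x)
    return s + x * s * (1.0 - s)


def numeric_grad(f, x, h=1e-5):
    # Central finite difference, used only as a sanity check.
    return (f(x + h) - f(x - h)) / (2.0 * h)


for x in (-3.0, -1.0, 0.0, 2.0):
    print(x, swish_grad(x), numeric_grad(swish, x))
```

The derivative is negative at x = -3, which is consistent with Swish not being monotonic.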
One interesting thing we can do is reparameterize the Swish, in order to “stretch out” the sigmoid:
$$ f(x) = 2x \sigma(\beta x) $$
We can see that if \( \beta = 0 \), then \( \sigma(0) = 1/2 \) and we get the identity activation \( f(x) = x \). If \( \beta \rightarrow \infty \), the sigmoid converges to the unit step, and multiplying that by \( 2x \) gives \( f(x) = 2 \max(0, x) \), which is just the ReLU multiplied by a constant factor.
So including \( \beta \) is a way for us to nonlinearly interpolate between identity and ReLU.
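The two limits can be checked numerically. This is a small sketch under my own naming (`swish_beta` is not from the paper), with \( \beta = 50 \) standing in for "very large".

```python
import math


def sigmoid(x):
    # Numerically stable form: never exponentiates a large positive number.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def swish_beta(x, beta):
    # f(x) = 2 * x * sigmoid(beta * x)
    return 2.0 * x * sigmoid(beta * x)


for x in (-3.0, -0.5, 0.5, 2.0):
    identity_like = swish_beta(x, beta=0.0)   # equals x
    relu_like = swish_beta(x, beta=50.0)      # close to 2 * max(0, x)
    print(x, identity_like, relu_like)
```

Intermediate values of \( \beta \) give curves between these two extremes.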
The title of the paper is “A Self-Gated Activation Function”, which might make you wonder, “Why is it self-gated?”
This should remind you of the LSTM, where we have “gates” in the form of sigmoids that control how much of a vector gets passed on to the next stage, by multiplying it by the output of the sigmoid, which is a number between 0 and 1.
So “self-gated” means that the gate is just the sigmoid of the activation itself.
Gate: \( \sigma(x) \)
Value to pass through: \( x \)
But that’s enough theory. For most of us, we want to know: “Does it work?”
And more practically, “Can I just use this by default instead of the ReLU?”
The best thing to do is just to try it for yourself and see how robust it is to different settings of hyperparameters (learning rate, architecture, etc.) but let’s look at some results so we can be confident when it comes to using Swish:
To compare Swish with baseline, a statistical test called the one-sided paired sign test was used.
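If you have not seen a sign test before, the idea is simple: for each paired comparison, record only whether Swish won or lost (ties are dropped), then ask how likely that many wins would be if wins and losses were equally likely. The sketch below shows the general technique. The counts are hypothetical and are NOT taken from the paper, and the function name is my own.

```python
from math import comb


def sign_test_p_value(wins, losses):
    # One-sided p-value: P(X >= wins) where X ~ Binomial(wins + losses, 0.5).
    n = wins + losses
    return sum(comb(n, k) for k in range(wins, n + 1)) / 2 ** n


# Hypothetical counts for illustration only: 8 wins and 1 loss over 9 paired runs.
p = sign_test_p_value(wins=8, losses=1)
print(p)
```

A small p-value means that so many wins would be unlikely under the "no difference" hypothesis.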
Conclusion: Try Swish for yourself!
October 17, 2017
This is a short post to help those of you who need help translating code from Python 2 to Python 3.
Python 2 is the most popular Python version (at least at the time of writing, and certainly when my courses were created), which is why the courses use it.
It comes preinstalled with Mac OS and Ubuntu, so typing “python” at the command line gives you Python 2.
This list is not exhaustive. It shows only code that appears commonly in my machine learning scripts, to assist the students taking my machine learning courses (https://deeplearningcourses.com).
Integer Division
OLD:
a / b
NEW:
a // b
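A quick check of the difference in Python 3, where `/` is always true division and `//` is floor division. The variable names are arbitrary.

```python
a, b = 7, 2

true_quotient = a / b      # true division, gives a float
floor_quotient = a // b    # floor division, gives an int for int operands
negative_floor = -7 // 2   # floors toward negative infinity, not toward zero

print(true_quotient, floor_quotient, negative_floor)
```

The last line is worth remembering: `//` rounds down, so negative results are not simply truncated.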
For Loops
OLD:
for i in xrange(n):
NEW:
for i in range(n):
Printing
OLD:
print "hello world"
NEW:
print("hello world")
Dictionary iteration
OLD:
for k, v in d.iteritems():
NEW:
for k, v in d.items():
COMPATIBLE WITH BOTH:
from future.utils import iteritems
for k, v in iteritems(d):
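Putting the loop, print, and dictionary changes together, a short Python 3 version might look like the sketch below. The dictionary contents and variable names are made up for the example.

```python
d = {"a": 1, "b": 2}

squares = []
for i in range(3):        # range is lazy in Python 3, like xrange in Python 2
    squares.append(i * i)

pairs = []
for k, v in d.items():    # items() returns a view in Python 3, not a list
    pairs.append((k, v))

print("squares:", squares)   # print is a function in Python 3
print("pairs:", sorted(pairs))
```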
October 7, 2017
“Hey Lazy Programmer, when is your next course coming out?”
I’ve been really busy adding tons of free updates to my existing courses! You can scroll down to the very bottom to see what they are. But in the meantime we are going to do another HUGE sale. ALL courses on Udemy are now $12. Take this opportunity to grab as many courses as you can because you never know when the next sale is going to be!
As usual, I’m providing $12 coupons for all my courses in the links below. Please use these links and share them with your friends!
You can also just type in the coupon code “OCT123” (except Deep Learning pt 1 because I messed it up =), for that use “OCT123A”).
The promo goes until October 10. Don’t wait!
At the end of this post, I’m going to provide you with some additional links to get machine learning prerequisites (calculus, linear algebra, Python, etc…) for $12 too!
But that’s not all… I’m the Lazy Programmer, not just the Lazy Data Scientist – I’ve got $12 coupons for iOS development, Android development, Ruby on Rails, Python, Big Data / Hadoop / Spark, React.js, Angular, and MORE. All important skillsets on ANY engineering team. Got any friends or coworkers in mobile / backend / big data development? Let them know!
If you don’t know what order to take the courses in, please check here: https://deeplearningcourses.com/course_order
Here are the links for my courses:
Deep Learning Prerequisites: Linear Regression in Python
https://www.udemy.com/datasciencelinearregressioninpython/?couponCode=OCT123
Deep Learning Prerequisites: Logistic Regression in Python
https://www.udemy.com/datasciencelogisticregressioninpython/?couponCode=OCT123
Deep Learning in Python
https://www.udemy.com/datasciencedeeplearninginpython/?couponCode=OCT123A
Practical Deep Learning in Theano and TensorFlow
https://www.udemy.com/datasciencedeeplearningintheanotensorflow/?couponCode=OCT123
Deep Learning: Convolutional Neural Networks in Python
https://www.udemy.com/deeplearningconvolutionalneuralnetworkstheanotensorflow/?couponCode=OCT123
Unsupervised Deep Learning in Python
https://www.udemy.com/unsuperviseddeeplearninginpython/?couponCode=OCT123
Deep Learning: Recurrent Neural Networks in Python
https://www.udemy.com/deeplearningrecurrentneuralnetworksinpython/?couponCode=OCT123
Advanced Natural Language Processing: Deep Learning in Python
https://www.udemy.com/naturallanguageprocessingwithdeeplearninginpython/?couponCode=OCT123
Advanced AI: Deep Reinforcement Learning in Python
https://www.udemy.com/deepreinforcementlearninginpython/?couponCode=OCT123
Deep Learning: GANs and Variational Autoencoders
https://www.udemy.com/deeplearninggansandvariationalautoencoders/?couponCode=OCT123
Easy Natural Language Processing in Python
https://www.udemy.com/datasciencenaturallanguageprocessinginpython/?couponCode=OCT123
Cluster Analysis and Unsupervised Machine Learning in Python
https://www.udemy.com/clusteranalysisunsupervisedmachinelearningpython/?couponCode=OCT123
Unsupervised Machine Learning: Hidden Markov Models in Python
https://www.udemy.com/unsupervisedmachinelearninghiddenmarkovmodelsinpython/?couponCode=OCT123
Data Science: Supervised Machine Learning in Python
https://www.udemy.com/datasciencesupervisedmachinelearninginpython/?couponCode=OCT123
Bayesian Machine Learning in Python: A/B Testing
https://www.udemy.com/bayesianmachinelearninginpythonabtesting/?couponCode=OCT123
Ensemble Machine Learning in Python: Random Forest and AdaBoost
https://www.udemy.com/machinelearninginpythonrandomforestadaboost/?couponCode=OCT123
Artificial Intelligence: Reinforcement Learning in Python
https://www.udemy.com/artificialintelligencereinforcementlearninginpython/?couponCode=OCT123
SQL for Newbs and Marketers
https://www.udemy.com/sqlformarketersdataanalyticsdatasciencebigdata/?couponCode=OCT123
PREREQUISITE COURSE COUPONS
And just as important, $12 coupons for some helpful prerequisite courses. You NEED to know this stuff before you study machine learning:
General (sitewide): http://bit.ly/2oCY14Z
Python http://bit.ly/2pbXxXz
Calc 1 http://bit.ly/2okPUib
Calc 2 http://bit.ly/2oXnhpX
Calc 3 http://bit.ly/2pVU0gQ
Linalg 1 http://bit.ly/2oBBir1
Linalg 2 http://bit.ly/2q5SGEE
Probability (option 1) http://bit.ly/2prFQ7o
Probability (option 2) http://bit.ly/2p8kcC0
Probability (option 3) http://bit.ly/2oXa2pb
Probability (option 4) http://bit.ly/2oXbZSK
OTHER UDEMY COURSE COUPONS
As you know, I’m the “Lazy Programmer”, not just the “Lazy Data Scientist” – I love all kinds of programming!
And I’ve got sales for everything:
iOS courses:
https://lazyprogrammer.me/ios
Android courses:
https://lazyprogrammer.me/android
Ruby on Rails courses:
https://lazyprogrammer.me/rubyonrails
Python courses:
https://lazyprogrammer.me/python
Big Data (Spark + Hadoop) courses:
https://lazyprogrammer.me/bigdatahadoopsparksql
Javascript, ReactJS, AngularJS courses:
https://lazyprogrammer.me/js
EVEN MORE COOL STUFF
Into Yoga in your spare time? Photography? Painting? There are courses, and I’ve got coupons! If you find a course on Udemy that you’d like a coupon for, just let me know and I’ll hook you up!
Remember, these links will self-destruct on October 10 (5 days). Act NOW!
COURSE UPDATES
Recent updates to existing courses because my students are awesome and deserve free stuff:
Deep Learning pt 2 (Theano / Tensorflow):
* Brand new section on batch normalization (7 new lectures!)
Deep Reinforcement Learning (Advanced AI):
* Continuous Mountain Car in Theano and Tensorflow with Policy Gradient
Cluster Analysis / Unsupervised ML:
* Simulating Biological Evolution + Applying Clustering
* Applying Clustering to Donald Trump + Hillary Clinton Tweets from 2016 Election
Numpy Stack:
* Improved quality / resolution of all slides
* Pushed code up so video player controls won’t block it
Unsupervised Deep Learning
* Visualizing t-SNE