How I Messed Up When Training a Neural Network for the First Time

Machine learning, for me, was like some mythical black-box magic surrounded by a thick mist, with small gnomes protecting it. Only the most courageous ones, with enough work and some mathemagic dust, shall pass the protection and enter the land of knowledge.

I never shy away from challenges that support my growth, and since I love technology, I started practicing Machine Learning. When I came to CERN for an internship, I soon tried Artificial Neural Networks – the best-sounding algorithm in the world!

Let me share my first experience with Artificial Neural Networks.

Problem domain

CERN does a lot of computation. Maybe not at the scale of Google or Facebook, but it is still the biggest project I have ever worked on. Physicists submit their programs to perform the necessary calculations. The computational power is spread across more than 170 data centers in 42 countries. There are many jobs (six orders of magnitude) running at any given moment!

Those jobs leave behind some information: did they succeed or not, how much time they took, where they ran, how much memory they used, and so on. At the time of writing, there are almost 400 different features (pieces of information) and many millions of entries in an ElasticSearch database. That's a lot! And this is the data I work with. I am trying to find something in there that would increase the efficiency of the jobs. What that could be – I don't know yet. I am still working on it. And I use (or at least try to use) Machine Learning for that! 🙂

Strategy

I was in a meeting with a cool guy who works as a data scientist at CERN. The way he speaks – he knows his stuff. He probably eats differential equations for breakfast and is still hungry. 🙂 Anyway, he proposed the idea of using an unsupervised machine learning algorithm called the Self-Organizing Map (or SOM, as I will call it). After reading the Wikipedia article, I immediately noticed that it is an artificial neural network. I fell in love with the idea (the buzzword!), even though I still had to understand what it actually is (I have a basic idea).

So, SOM is a very interesting algorithm. Basically, it finds a small set of representatives of your whole dataset so that you can visualize it. Visualizing hundreds of thousands or even millions of entries seems like a daunting task to me. Add the fact that the data lives in a high-dimensional space (we live in three dimensions, by the way) and I would say it is impossible. But with SOM, you choose the size of a grid and, given some time, this grid becomes a pretty good representative of your whole dataset, and you can start looking for patterns and insights. Results are mostly shown as heat maps. Actually, I don't know any other way – do you?
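To make this concrete, here is a minimal sketch of what training a SOM and drawing a heat map could look like. I use the MiniSom library here and random numbers in place of the real job records, and the feature index is made up – my actual setup was different:

```python
import numpy as np
import matplotlib.pyplot as plt
from minisom import MiniSom  # pip install minisom

# Stand-in for the real job records: 10,000 entries, 5 numeric features.
data = np.random.rand(10000, 5)

# A 20x20 grid of nodes; each node holds a 5-dimensional weight vector.
som = MiniSom(20, 20, data.shape[1], sigma=1.0, learning_rate=0.5)
som.random_weights_init(data)
som.train_random(data, num_iteration=10000)

# Heat map of one feature: average its value over the entries
# that map to each node of the grid.
feature = 0  # e.g. CPU efficiency
heat = np.zeros((20, 20))
counts = np.zeros((20, 20))
for entry in data:
    x, y = som.winner(entry)  # best-matching node for this entry
    heat[x, y] += entry[feature]
    counts[x, y] += 1

plt.imshow(heat / np.maximum(counts, 1))
plt.colorbar()
plt.show()
```

The grid is tiny compared to the data: each of the 10,000 entries gets mapped to its best-matching node, so 400 nodes end up summarizing the whole dataset.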

So I was very excited and thought it would work like magic! I even get excited just remembering how exciting it was. 😀 I wanted to train it as soon as possible and see what it had to offer.

P.S. If you do not shy away from a little bit of mathematics, I found this video to be the most valuable resource on SOM.

Results

After waiting for three days and bragging to my parents and friends that I was doing something magical, I got these heat maps.

What the fuck? That was my first thought. The values are scattered all over the place, showing nothing. And CPU Efficiency – it is supposed to be a value between 0 and 100, but one cluster is showing 135,000!

Really, what the fuck? I'm glad I have mentors who could help me answer this question. 🙂

Problems

Problem #1: I didn’t check the data

The CPU Efficiency heat map was the most interesting one, since it showed that something was obviously wrong. I checked my data file and saw what I should have seen earlier.

My data was not valid! There were entries with a CPU Efficiency metric in the thousands! That made everything deviate far from the results I was looking for. I had been blindly trusting the data.

The lesson I learned: never trust the data until I validate it, or my team does.
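Even a few lines of sanity checking would have caught this before three days of training. A sketch with pandas – the file and column names here are just for illustration, not the real ones:

```python
import pandas as pd

# Hypothetical export of the job records.
df = pd.read_csv("jobs.csv")

# CPU efficiency is a percentage, so anything outside [0, 100] is suspicious.
bad = df[~df["CpuEff"].between(0, 100)]
print(f"{len(bad)} of {len(df)} entries have an invalid CPU efficiency")

# Drop the invalid entries (or better: find out where they come from).
df = df[df["CpuEff"].between(0, 100)]
```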

Problem #2: I treated discrete features as continuous

Since SOM works with numbers only, I had to convert strings to numbers. I made a map and assigned every possible value of a feature a unique number. For example, the Tier 0 data center at CERN was treated as 0, the Tier 1 data center in Freiburg as 1, and so on.

My mentor raised a question: what is data center 0.5? Is it halfway between CERN and Freiburg?

I tried to figure this out for some time. The mapping didn't make sense at all, since those are discrete things, not continuous. I hadn't thought about that at all before!
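What I should have done is one-hot encoding: every data center gets its own 0/1 column, so no artificial ordering or distance is invented between them. A sketch with pandas, with site names standing in for the real values:

```python
import pandas as pd

df = pd.DataFrame({"site": ["CERN", "Freiburg", "CERN", "Karlsruhe"]})

# What I did: map each site to an integer. This invents an ordering,
# and suddenly "0.5 of a data center" is something the algorithm can produce.
df["site_code"] = df["site"].map({"CERN": 0, "Freiburg": 1, "Karlsruhe": 2})

# What I should have done: one binary column per site.
one_hot = pd.get_dummies(df["site"], prefix="site")
print(one_hot)
```

With one-hot columns, every data center is equally far from every other one, which is exactly what you want for categories that have no natural order.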

The lesson from this one: know how the algorithm works, and know what kind of data you have.

Closing thoughts

Machine learning is not as magical as it looks. It is very logical and deductive. But I like it! Thinking this way removes that mysterious veil and provides the opportunity to uncover the secrets it is hiding.

I love the idea of having this powerful tool in my arsenal. I am definitely going to continue learning these awesome things!

How have you fucked up your trainings? How did you feel at the time? Or was it a breeze? Let me know!