Strata London 2012 was the first conference I had ever attended. It was a good learning experience in the world of big data and an inspiring look at what people are doing with technology these days. The material was high level and geared towards strategy; however, it would not have been that beneficial to implementers, because there were no deep dives or hardcore programming sessions. This worked for me, because I needed a good overview of the current state of the field and a sense of where it is heading.
Why did I go?
I have always been interested in data visualization and in how people use data effectively. I work for a technology company, and this is one area I thought we could get involved in. So, with my company's help, I got a pass to the conference. Tip #1: arrive early so that you can register before the first keynote. I was lucky that the conference was in London, so it was a short commute for me.
Who was there?
Keynotes, keynotes everywhere. You can check out the speaker slides on the Strata website. Some of my favourite talks were:
- Jake Porway – Good Data, Good Values
- George Dyson – The first 5kb are the hardest
- John Graham-Cumming – The Great Railway Caper: Big Data in 1955
- Alexandra Deschamps-Sonsino – The Quiet Comfort of the Internet of Things
I actually got to speak to Jake Porway and George Dyson, which was pretty exciting, because they were really nice and happy to chat about their work. I also met up with some former colleagues, who were presenting their big data work with Formula 1.
Conferences are not cheap to run, so there were a lot of sponsors there, with lots of free T-shirts and information to pick up. Some were simply selling their products, but some also ran speaking sessions. Tim Barker from DataSift gave a good talk on how to filter the noise of social media so that you can make sense of it.
What did I learn?
The biggest thing I learnt was that there is no such thing as big data. Big data has always been around. Massive buildings dedicated to computing power are nothing new: computers used to fill a room, and now that one computer is a bunch of servers. What is changing, however, is how we consume and generate data. Data used to be processed in batches overnight or over the weekend, but now, with mobile devices everywhere, we are generating data constantly.
Data becomes big when deleting it costs more than keeping it.
With people generating data constantly, we are now beginning to expect insight from that data immediately too. Companies like Narrative Science showed that, given good data, they can generate new stories with in-depth analysis.
Algorithms are becoming cool again. MapReduce has gained popularity through Hadoop, but algorithms can also be applied to data as it arrives in real time. Some of the areas where algorithms are gaining traction are listed below.
- Summarize: set cardinality, frequent items, quantiles, clustering, distinct values
- Classify: perceptron, linear SVM, SGD, SVI
- Use: bandit algorithms, reinforcement learning
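The list above only names the techniques. To give a feel for what a real-time "summarize" algorithm looks like, here is a minimal sketch of a frequent-items summary using the Misra-Gries algorithm. This is my own choice of algorithm for illustration, not one taken from the talks, and the function name `misra_gries` is hypothetical. It reads the stream once and never keeps more than `k - 1` counters, no matter how long the stream is.

```python
from typing import Dict, Hashable, Iterable


def misra_gries(stream: Iterable[Hashable], k: int) -> Dict[Hashable, int]:
    """One-pass frequent-items summary using at most k - 1 counters.

    Illustrative sketch (hypothetical helper): any item occurring more than
    n / k times in a stream of length n is guaranteed to survive, and each
    returned count underestimates the true count by at most n / k.
    """
    if k < 2:
        raise ValueError("k must be at least 2")
    counters: Dict[Hashable, int] = {}
    for item in stream:
        if item in counters:
            # Already tracked: just count it.
            counters[item] += 1
        elif len(counters) < k - 1:
            # A free slot is available: start tracking the new item.
            counters[item] = 1
        else:
            # No free slot: decrement every counter (the new item is
            # implicitly cancelled out too) and drop counters that hit zero.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters


if __name__ == "__main__":
    events = "a b a c a d a b a".split()
    print(misra_gries(events, k=3))  # {'a': 3}
```

In the usage example, "a" appears 5 times in 9 events, which is more than 9 / 3, so it is guaranteed to be kept; its reported count of 3 is an underestimate, as the bound allows. The trade-off is typical of streaming summaries: you accept approximate counts in exchange for fixed memory and a single pass over data that never stops arriving.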
The Internet of Things was a concept that really inspired me. With 3D printing and DIY electronics built on Arduino or Raspberry Pi, we can start creating useful Internet-connected applications whose interface may or may not involve a screen. This is a space where I see a lot of growth among hobbyists, because the cost of prototyping is considerably lower than it was just a few years ago.
Now that the conference is over, it is time for me to go over my notes, gather some insights and decide on next directions. At my company I will try to get people excited about the Internet of Things; this might start with organizing a group to learn how to program Arduino. In the world of “big data”, data storage strategy is one area where I think big companies will need help over the next few years. Different business divisions produce lots of data, much of it incompatible in the traditional database sense, so finding ways to bring it together will be key. All of this data is useless if people can’t draw insights from it, so I believe the web will be a powerful tool for disseminating data to a wider audience.
I’ll leave you with one final picture. When I first walked into the keynote room, I saw the podium and thought to myself that it would be amazing to be a keynote speaker at a conference one day. What would the topic be? I have no clue at the moment, but I will be taking some small steps towards it in the near future.