Individual Projects

The idea behind giving you individual projects is to challenge you with an open problem in big data. Moreover, because you can propose your own project, the hope is that your motivation will be higher than for a project that I would assign, since you will most likely choose something you are already interested in. In addition, your curiosity can lead you to places that projects I design would not.

The project proposals that you are writing right now will be evaluated according to the following criteria:

  • feasibility – we estimate the time, effort, and computational resources required; we don't want you to over- or under-scope, or to end up fighting problems that distract from the actual proposal
  • complexity – think about the three Vs of big data: Volume, Velocity, and Variety. Ideally your project should address all of them, but it is very unlikely that you will find a problem that does, so please identify which of the three Vs is your primary issue. Simply scraping a website and visualizing the data doesn't cut it. Please check this project, which is interesting but in my opinion too simple:
  • method – we haven't yet talked about the computational methods we will teach you, so this is really hard to eyeball, but in general your project should either detect interesting patterns in data (e.g., clusters, modules, trends), make predictions (trends, time series, consumer baskets), or perform classifications (who will win, who will buy XYZ, which category an object belongs to)
  • originality – if you come up with a project that is mind-blowing, or at least very interesting, but does not really conform to the above, we will probably still try to make it work, simply because.

I have been asked about the business aspect: ideally, your project should answer a business question or have commercial value. However, I think it is more important to perform an accurate analysis of a toy problem than to tackle a real-world problem that is either not interesting or does not teach you the necessary skills. Therefore, I am relaxing this constraint, but I would of course be happy if you chose a business problem.

Here are a couple of ideas of my own, some of which were inspired by conversations with you:

  • Moneyball – you download some form of sports statistics and try to derive a model that predicts game outcomes; watch the movie "Moneyball" or read the book if you are curious about this. In essence, you can make your baseball team much better if you optimize for players who get on base.
  • interest biases – I am not talking about money but about what people are interested in. I ran a crawl and tried to find gender biases and stereotypes per state: Blogger is easy to crawl, has geo tags, and shows the blogs that people read and follow
  • interest clusters – it is not clear how interests relate to each other; one could cluster interests and find categories of interests that belong together
  • interest profiling – if you know one or two of a person's interests, can you predict what else they might be interested in?
  • social networks – get data from which you can derive a social (or other type of) network and identify clusters (groups); it would be more interesting to see how and why the network changes
  • Diet Coke and Fries – I guess this is a stupid title, but there is the idea that once you order fries, it no longer matters whether you order Diet Coke or not, because calorie-wise you are already over your limit; still, people order Diet Coke with fries – or do they? Data can reveal such contradictions, or open up opportunities: people invest in either risky or conservative funds, yet hedging your bets suggests that you should do both.
  • recommendation system – regardless of the web service, there is always room to improve how data is found or accessed; better clustering or better classification, as well as a totally new approach, is conceivable
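The "Diet Coke and Fries" question is essentially a market-basket question. As a minimal sketch, assuming you have transaction data as sets of item names, the support, confidence, and lift measures from association-rule mining could quantify it: a lift above 1 means Diet Coke is bought with fries more often than chance would predict, below 1 less often. The baskets and function names below are hypothetical, made up purely for illustration.

```python
# Hypothetical toy baskets, invented for illustration only (not real sales data).
baskets = [
    {"fries", "diet coke"},
    {"fries", "diet coke", "burger"},
    {"fries", "coke"},
    {"salad", "diet coke"},
    {"burger", "coke"},
    {"fries", "diet coke"},
    {"salad", "water"},
    {"burger", "fries", "coke"},
]


def support(itemset, baskets):
    """Fraction of baskets that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= basket for basket in baskets) / len(baskets)


def confidence(antecedent, consequent, baskets):
    """Estimate of P(consequent | antecedent) from the baskets."""
    s_antecedent = support(antecedent, baskets)
    if s_antecedent == 0:
        raise ValueError("antecedent never occurs in the data")
    return support(set(antecedent) | set(consequent), baskets) / s_antecedent


def lift(antecedent, consequent, baskets):
    """Confidence divided by P(consequent); 1.0 means the items are independent."""
    s_consequent = support(consequent, baskets)
    if s_consequent == 0:
        raise ValueError("consequent never occurs in the data")
    return confidence(antecedent, consequent, baskets) / s_consequent


print(f"{confidence({'fries'}, {'diet coke'}, baskets):.2f}")  # 0.60
print(f"{lift({'fries'}, {'diet coke'}, baskets):.2f}")        # 1.20
```

On this made-up data, 60% of fries orders include Diet Coke versus 50% of all orders, giving a lift of 1.2. On a real data set you would also need enough transactions for such a difference to be meaningful, which is where the Volume aspect comes in.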

Please feel free to add your suggestions and ideas; the more we move ideas around, the better. Cheers, Arend