05 Jun, 2015
Kaggle contest review - Bike Sharing Demand
The Kaggle Bike Sharing Demand challenge asks participants to forecast use of a city bikeshare system.
Bike sharing systems are a means of renting bicycles where the process of obtaining membership, rental, and bike return is automated via a network of kiosk locations throughout a city. Using these systems, people are able to rent a bike from one location and return it to a different place on an as-needed basis. Currently, there are over 500 bike-sharing programs around the world.
The data generated by these systems makes them attractive for researchers because the duration of travel, departure location, arrival location, and time elapsed are explicitly recorded. Bike sharing systems therefore function as a sensor network, which can be used for studying mobility in a city. In this competition, participants are asked to combine historical usage patterns with weather data in order to forecast bike rental demand in the Capital Bikeshare program in Washington, D.C.
Submissions are evaluated on the Root Mean Squared Logarithmic Error (RMSLE). The RMSLE is calculated as
\[ \mathrm{RMSLE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( \log(p_i + 1) - \log(a_i + 1) \right)^2} \]
where
- n is the number of hours in the test set
- p_i is your predicted count
- a_i is the actual count
- log(x) is the natural logarithm
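To make the metric concrete, here is a minimal sketch of RMSLE in plain Python. It assumes the standard definition, which compares log(p + 1) with log(a + 1) for each hour and takes the root of the mean squared difference. The function name `rmsle` and its argument names are my own choices for illustration, not part of the competition's tooling.

```python
import math


def rmsle(predicted, actual):
    """Root Mean Squared Logarithmic Error for two equal-length sequences of counts."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if len(predicted) == 0:
        raise ValueError("at least one prediction is required")

    total = 0.0
    for p, a in zip(predicted, actual):
        # log1p(x) computes log(x + 1), so a count of zero is handled safely.
        total += (math.log1p(p) - math.log1p(a)) ** 2

    return math.sqrt(total / len(predicted))
```

Because the error is measured on a log scale, being off by 10 bikes in a quiet hour costs much more than being off by 10 bikes in a busy hour.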
I preprocessed the data with Python and trained a random forest with 50 trees.
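For anyone who wants to try a similar setup, the sketch below shows one possible way to preprocess the data and fit a random forest with 50 trees. It is not my exact code. It assumes timestamps in the competition's `YYYY-MM-DD HH:MM:SS` format and uses scikit-learn's `RandomForestRegressor`. The helper names `extract_time_features` and `train_forest`, the chosen features, and `random_state=0` are all assumptions made for illustration.

```python
from datetime import datetime


def extract_time_features(timestamp):
    """Turn a 'YYYY-MM-DD HH:MM:SS' string into [year, month, weekday, hour]."""
    dt = datetime.strptime(timestamp, "%Y-%m-%d %H:%M:%S")
    # weekday() returns 0 for Monday through 6 for Sunday.
    return [dt.year, dt.month, dt.weekday(), dt.hour]


def train_forest(feature_rows, counts, n_trees=50):
    """Fit a random forest regressor on numeric feature rows and rental counts."""
    # Imported here so the feature helper above works without scikit-learn installed.
    from sklearn.ensemble import RandomForestRegressor

    model = RandomForestRegressor(n_estimators=n_trees, random_state=0)
    model.fit(feature_rows, counts)
    return model
```

In practice the time features would be joined with the weather columns before calling `train_forest`, and the fitted model's `predict` method would produce the hourly counts for the submission file.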
This is the first Kaggle challenge I have participated in, prompted by the Coursera course Introduction to Data Science. I improved quite a lot while competing. It is my first Kaggle competition, and it will not be the last. I am currently working on a challenge with a prize pool, and I hope to get a good result with a state-of-the-art machine learning technique: deep learning.
It's quite interesting working on a Kaggle project, as I am competing with data scientists from around the world. By the way, this is my Kaggle profile