03 Oct

Data sets for data mining, data science, visualization & machine learning

Data sets are the first step in solving any data mining, data science, or machine learning problem. Many data scientists have already emphasized that a data set that is reliable, cleaned, and in the right format is essential for getting insightful results. We are just starting to explore open data sets (for example, USDA data for the Krishakanam project), and one thing has become clear: it takes a good amount of time to understand the data set itself before building anything insightful on top of it.

23 Sep

Decoding USDA Data Further

USDA data contains a lot of parameters and information that can become very useful when presented in the right visualization. In the previous post, we briefly explored the types of data points available from USDA; here we dig further to understand which parameters are available. Of the four types of data sets, two are geolocation based with different area granularity, while the other two are statistical data linked to state, county, etc.

16 Sep

First Hack - USDA Innovation Challenge

A few weeks back, I came across the Atlassian Codegeist contest and, while looking at some other contests, I found an interesting one from USDA: create innovative apps using USDA data (NASS & ERS sensor data). It seemed interesting, and my initial thought was to take the data (which is available in downloadable form), transform it, and store it in either elasticsearch (if it's temporal or spatial event data) or neo4j (if it's networked or linked data).
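To make that initial thought a bit more concrete, here is a minimal Python sketch of the routing decision only. It assumes each transformed record is a plain dictionary; the field names (`year`, `lat`, `source_id`, and so on) and the `choose_store` helper are hypothetical and not taken from the actual USDA downloads. Checking for linked data first is also just an assumption about how ties might be broken. No database client is called here; the function only names the target store.

```python
from typing import Any, Mapping

# Hypothetical field names; the real USDA files may use different keys.
TEMPORAL_KEYS = {"year", "date", "timestamp"}
SPATIAL_KEYS = {"lat", "lon", "state_fips", "county_fips"}
LINK_KEYS = {"source_id", "target_id"}


def choose_store(record: Mapping[str, Any]) -> str:
    """Return the name of the store a record would be sent to."""
    keys = set(record)
    # A record carrying both ends of a relationship is treated as linked data.
    if LINK_KEYS <= keys:
        return "neo4j"
    # Any temporal or spatial field marks the record as event-like data.
    if keys & (TEMPORAL_KEYS | SPATIAL_KEYS):
        return "elasticsearch"
    # Nothing recognizable: leave the decision to a human.
    return "unclassified"


if __name__ == "__main__":
    samples = [
        {"year": 2014, "commodity": "corn", "value": 1.0},
        {"source_id": "a", "target_id": "b", "relation": "supplies"},
        {"commodity": "wheat"},
    ]
    for sample in samples:
        print(choose_store(sample), sample)
```

In practice the rules would have to come from inspecting the real files, which is exactly the time-consuming part of understanding a data set.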

