Radius of gyration is a metric that quantifies how widely a set of locations is spread around their center. Its applications range from structural engineering to molecular physics. Since it is all about locations, it can be applied to geographic data as well. I recently came across it in some global mobility studies where the goal was to characterize the travel patterns of individuals. In those papers, the metric indicates whether a person is more likely to travel long distances or not. In my research, where I am interested in the geographic data contributions of volunteer mappers, I found it extremely useful for deciding whether a mapper's overall contribution shows local or global patterns. For many years, local knowledge was considered to be the main advantage of this so-called user-generated geographic information. Local guys know the place, let them draw maps, let them take photos and the product will be accurate. While this is most probably true, it also seems that some of these guys like to do the same thing in distant places, so there might be factors other than localness that make these data sources accurate and therefore extremely valuable. Everything is up to the people who contribute, so the ultimate goal is still to understand their behavior. Now, enough of the crazy talk. Click on “read more” to do some fancy math and coding.
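To make the idea concrete, here is a minimal sketch using one common definition: the root mean square distance of the points from their centroid. It assumes the locations are already in a projected, planar coordinate system (for example metres), so plain Euclidean distance is good enough. The function name `radius_of_gyration` is my own for illustration, not taken from any library.

```python
import math


def radius_of_gyration(points):
    """Root mean square distance of 2D points from their centroid.

    Assumes planar (projected) coordinates; raw lon/lat pairs would
    need a geodesic distance instead of the Euclidean one used here.
    """
    if not points:
        raise ValueError("at least one point is required")

    n = len(points)
    # Centroid of the point set (unweighted mean of the coordinates).
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n

    # Mean of the squared distances to the centroid, then its square root.
    mean_sq = sum((x - cx) ** 2 + (y - cy) ** 2 for x, y in points) / n
    return math.sqrt(mean_sq)
```

A small value suggests that the contributions cluster around one place (a local pattern), while a large value suggests that they are spread over distant places.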
I wanted to get rid of long line segments in my data. Since I don’t know of any software tool off the top of my head that would do the job, I decided to code it myself. I had already stored everything in a spatially enabled PostgreSQL table and, to be honest, lately I have been more interested in manipulating data at the database level to save some time. So, instead of writing a Python script and looping through a cursor, I created a function that I call in SQL queries. The picture below gives an overview of what the function does. Basically, it breaks input geometries at line segments that exceed a certain length threshold. Red means “no bueno”, green means “yaay!”.
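The breaking logic itself is easy to sketch outside the database. The snippet below is not the SQL function from the post, just a plain Python illustration of the same idea, assuming a line is given as a list of planar `(x, y)` tuples. The name `split_at_long_segments` and the parameter `max_len` are hypothetical.

```python
import math


def split_at_long_segments(coords, max_len):
    """Break a polyline wherever a segment is longer than max_len.

    coords  -- list of (x, y) tuples in planar units (assumption)
    max_len -- segments strictly longer than this are dropped
    Returns a list of polylines, each with at least two vertices.
    """
    parts = []
    current = [coords[0]] if coords else []

    for prev, nxt in zip(coords, coords[1:]):
        if math.dist(prev, nxt) > max_len:
            # "Red" segment: close the running part and restart after it.
            if len(current) >= 2:
                parts.append(current)
            current = [nxt]
        else:
            # "Green" segment: keep extending the running part.
            current.append(nxt)

    if len(current) >= 2:
        parts.append(current)
    return parts
```

For example, `split_at_long_segments([(0, 0), (1, 0), (2, 0), (10, 0), (11, 0)], 2)` drops the jump from `(2, 0)` to `(10, 0)` and keeps the two short pieces on either side. Vertices left isolated between two long segments are discarded, which is a design choice of this sketch.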
Originally I wanted to write about visualization in the 2nd post (and after that in the 3rd), but that post would have been too long to read. I always lose myself when it comes to writing but never mind, it’s finally here. So, we know how to access the DB and we can query for interesting subsets, even geographically. All we have to do is interpret our results. I’m presenting two ways, a gif animation and a wordcloud. It’s not about reinventing the wheel, but I still believe that these are useful approaches that complement each other.
In the previous posts I introduced the topic and did some simple coding to explore the data. That’s not bad at all, but usually the goal is to create something new or at least to understand what is going on. In this simple example we’re interested in the weather. We want to see what people tweeted about the weather during the data collection period. Unfortunately, a dataset of 200,000 tweets is not big enough to reconstruct the weather conditions for that time. Why? Simply because after getting rid of the unrelated tweets we have almost nothing left to work with. If you’re here because you’re interested in the past weather of the UK, I think you’d better visit this site :). For the others, I promise I’m going to tell you how I created some maps.
image by Havadurumu
In the previous post, I introduced the topic and the technology. Now it’s time to define the problem and the methods. The next entries will discuss how to access MongoDB and how to retrieve geocoded tweets. I will focus on tweets that are somehow related to the weather using the simplest approach possible: querying their content for the keyword ‘weather’. Later on I will create some nice visualizations, an animated gif and a wordcloud, that can help us understand what is behind the scenes. You’ll find some code snippets and screenshots, so feel free to scroll down to those if you’re not interested in long discussions. So, let’s grab the data from MongoDB and see what’s inside! There’s quite a lot to do.
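As a taste of how simple the keyword approach is, here is a rough sketch. It assumes each tweet is stored as a document with a `text` field and a `coordinates` field that is empty for non-geocoded tweets; those field names are my assumption about the collection, and both helper functions are hypothetical. The first builds a MongoDB filter document, the second applies the same rule to a plain Python dict so it can be tried without a database.

```python
import re


def weather_filter(keyword="weather"):
    """Build a MongoDB filter for geocoded tweets mentioning the keyword.

    Field names 'text' and 'coordinates' are assumptions about how the
    tweets are stored.
    """
    return {
        "text": {"$regex": re.escape(keyword), "$options": "i"},
        "coordinates": {"$ne": None},
    }


def matches_locally(tweet, keyword="weather"):
    """Apply the same rule to one tweet dict, without a database."""
    text = tweet.get("text", "")
    has_keyword = re.search(re.escape(keyword), text, re.IGNORECASE) is not None
    return has_keyword and tweet.get("coordinates") is not None
```

With pymongo, the filter could be passed to a collection's `find()` method; the local matcher is only there to check the rule on a few sample documents.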