Archive for March, 2014

Export XML files to CSV (or Android I18N files to iPhone I18N files)

Monday, March 24th, 2014

Android and iOS have two different peculiarities when it comes to I18N: they present different file formats. Google platform uses XML, whereas Cupertino’s uses CSV files. Lately we came to a problem to unify some of out I18N resources. In order to do that, we wanted to have a unified format to easily compare whether the strings where correctly modified or not.

This Ruby script was designed to export Android strings into a .CSV file (and therefore, really easy to script if you need to compare it with its Apple pair). Remember you need to install Nokogiri to make it work.

And call it with ruby scriptname.rb inputfile.xml outputfile.csv

Which city has the most intense Android scene in Europe?

Saturday, March 8th, 2014

StackExchange Data Explorer is an open source tool to run SQL queries against public data from StackOverflow. Since StackOverflow is the biggest development forum of the world, there is surely a lot of information that companies can actually retrieve from their system in order to take some business decision (this is actually a brilliant place to apply BigData)

Moving now to different issues: I was discussing with some event organizers the possibility of bringing an Android event from the USA to Europe. Since I do live in Munich (and besides being a trendy mobile city, I think is a really cool place to organize such events), I was trying to convince them that Munich was the choice. They were resilient about it, so I needed to prove with some data that Munich would be a really nice election.

At this time, I was thinking how could I use Data Explorer and BigData to support my thesis. I remember two times I used it before, to display the most active developers from Barcelona (the city I lived before), Munich, and the two cities combined. Something similar could be a valid approach. In the previous SQL queries, I was clustering the top developers from each city based on their contribution to questions tagged with the token “android“. So I could possibly group all the developers contribution from a certain city with questions tagged with the same token. I come over with this script:

I decided to search in Germany, Spain, Holland/Netherlands, France, Italy, United Kingdom, Poland and Sweden (not that Sweden is a big country in terms of population, but I do work with a bunch of Swedish colleagues :-) ). I did some little experiments to get rid of some nomenclature errors and statistical noise (for instance, I tried also with England, thinking that some developers might registered their selves as English residents instead of UK). After some refining, I came with the following result:

Screen Shot 2014-03-08 at 9.56.08 PM

There is a major preponderance of cities from Germany and the UK, with only 5 cities from different countries in the first 20 cities against 9 british cities and 6 Germans. Is not my purpose here to give a full sociologic analysis (i.e., many cities are from UK since StackOverflow is an English based community and there are other local communities in different countries – although 80% of the Internet documentation is English based), but to give a rough approach.

So, now we have a bunch of cities with some numbers. We still need to go through a normalization process (i.e., cities with more population will always have more UpVotes than cities with less population). Since StackOverflow does not provide statistics for cities population (is not either their task) I will correlate this values manually. Thus, I will assign to each city the factor that determines the relationship between UpVotes and population. To get the population value, I did use Wikipedia.

  1. London: 1.783.457 / 8.174.000 = 0.21
  2. Reading: 701.683 / 145.700 = 4.815
  3. Berlin: 570.379 / 3.502.000 = 0.16
  4. Paris: 381.560 / 2.234.105 =0.17
  5. Amsterdam: 335.999 / 779.808 = 0.43
  6. Cambridge: 328.603 / 123.900 = 2.65
  7. Munich: 252.516 / 1.378.000 = 0.18
  8. Frankfurt: 139.674 / 691.518 = 0.20
  9. Manchester: 126.113 / 510.700 = 0.24
  10. Lyon: 124.057 / 474.946 = 0.26
  11. Warsaw: 116.670 / 1.717.000 = 0.06
  12. Oxford: 100.504 / 150.200 = 0.66
  13. Hamburg: 979.30/ 1.799.000 = 0.05
  14. Madrid: 96.906 / 3.234.000 = 0.02
  15. Karlsruhe: 95.848 /297.488 = 0.32
  16. Ulm: 92.675/ 123.672 = 0.74
  17. Edinburgh: 84.748 / 495.370 = 0.17
  18. Brighton: 81.612 / 155.919 = 0.52
  19. Ulverston: 79.056 / 11.524 = 6.86
  20. Barcelona: 78.130 / 162.1000 = 0.04

And if we now sort it by the coefficient:

  1. Ulverston (UK): 6.86
  2. Reading (UK): 4.815
  3. Cambridge (UK):  2.65
  4. Ulm (Germany): 0.74
  5. Oxford (UK): 0.66
  6. Brighton (UK): 0.52
  7. Amsterdam (Netherlands): 0.43
  8. Karlsruhe (Germany): 0.32
  9. Lyon (France): 0.26
  10. Manchester (UK): 0.24
  11. London (UK): 0.21
  12. Frankfurt (Germany): 0.20
  13. Munich (Germany): 0.18
  14. Paris (France): 0.17
  15. Edinburgh (UK): 0.17
  16. Berlin (Germany): 0.16
  17. Warsaw (Poland): 0.06
  18. Hambug (Germany): 0.05
  19. Barcelona (Spain): 0.04
  20. Madrid (Spain): 0.02

There is some very interesting information in this graph:

  • The result of UK cities are brilliant. Reading, Cambridge and Oxford, famous for their universities, are all in the top 5
  • Ulverston, a tiny city of North West England, scores first in the ranking. It would be interesting to determine the reasons, since there is no known university or industry in the town. Probably a top user from StackOverflow explains it, but it can be discarded as statistical noise.
  • For most of the cities, a value between 0 and 1 seems the norm.
  • Big capitals are generally in the mid part of the table, except Amsterdam

After this sampling Munich does not score that bad (although better in absolute terms). There are, however, a bunch of other different reasons to choose a place for a certain event (proximity to other places, infrastructures, communication, hosting prices, global number of possible attendants, etc). But after this little experiment, I can only suggest to organizers to move to the city of Ulverston (even if I still think that Munich offers a great beer).

Follow me on Twitter @eenriquelopez !