Monthly Archives: July 2011

Ruby utility script – csv2json

A script that I wrote and am using more and more.

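It takes a single argument, the input file (“filename.ext”), and writes the JSON next to it under the same name with a .json extension.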

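require 'json'

source = ARGV[0]
if source.nil?
  puts "Missing filename argument"
  puts "USAGE: #{$0} filename.ext"
  exit 1
end

unless File.exist?(source)
  puts "#{source} not found!"
  exit 1
end

puts "Processing '#{source}'"

file = File.open(source, 'r')
ext = File.extname(source).to_s.downcase

# Swap the original extension (whatever its case) for .json
outname = source.chomp(File.extname(source)) + '.json'
out = File.open(outname, 'w')

# Separator patterns are inferred from the extension
sep = case ext
      when '.csv' then /,/   # comma-separated
      when '.psv' then /\|/  # pipe-separated
      when '.tab' then /\t/  # tab-separated
      else /,/               # assume comma since it is *CSV*2json
      end
puts "Extension: #{ext} using '#{sep.source}'"

# Naive split: quoted fields containing the separator are not handled
lines = file.read.split(/\r?\n/)
file.close

header = lines[0].split(sep)
data = lines.drop(1).map { |item| item.split(sep) }

outdata = []

# Build one hash per row, keyed by the header names
data.each do |item|
  h = {}
  header.each_index do |i|
    h[header[i]] = item[i]
  end
  outdata << h
end

out.write outdata.to_json
out.close
puts "File written to " + out.path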
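The script splits each line on the separator, so a quoted field that itself contains the separator (for example "Smith, J") would be broken into two columns. If that matters for your data, one option is to let Ruby's standard csv library do the parsing. The following is only a sketch under that assumption; `rows_to_json` is a hypothetical helper name and is not part of the script above.

```ruby
require 'csv'
require 'json'

# Hypothetical helper (not from the original script): parses delimited text
# with the csv library and returns a JSON array of objects keyed by the
# header row. col_sep is the column separator, e.g. ',', '|' or "\t".
def rows_to_json(text, col_sep = ',')
  table = CSV.parse(text, col_sep: col_sep, headers: true)
  table.map(&:to_h).to_json
end
```

Calling `rows_to_json("name,city\n\"Smith, J\",Sydney\n")` keeps "Smith, J" as a single value, and passing '|' or "\t" as the second argument covers the .psv and .tab cases.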

Ages of the Internet

As the web matures, the commercialisation pattern changes: new monetisation opportunities arise while the incumbent “cash cows” get commoditised and their margins are squeezed. Each of the “Ages” below has its heroes, who created the technology and the business model to succeed and then dominate. Each age was made possible by the standardisation and commoditisation of the prior ages.

The current age is Clouds. We are at the very beginning of this age, and it is likely to dominate technical thinking for the next 2 to 4 years. Clouds will drive down the cost of scaling and lower the barrier to entry for minor media players. The battle for media in the Age of Clouds is about differentiation and agility in maximising content reuse.

Age             | The “Product”                                                               | The “Currency” | Key Players
Age of Pipes    | Hardware, Protocols                                                         | Bandwidth      | Sun
Age of Portals  | Publishing Platforms, Online Classifieds, Personal Homepages                | Content        | AOL, Big Pond, Alta Vista, Yahoo, Geocities
Age of Search   | Search Engines, Recommendation, Crowdsourcing                               |                | Google, Yahoo, Bing, Wiki
Age of Clouds   | Software as Service, Distributed Systems, Always On/Ubiquitous Computing    | Scale          | Amazon, Google, Microsoft
Age of Semantic | Context, Entities, Sentiment, Relationship, Intuition/Personal User Agents  |                | Big Media?, Facebook, Google
Age of Trust    | Security, Privacy, Identity, Authority                                      | Reputation     | Big Media?, Standards Bodies, Cultural Institutions, Knowledge Markets

Why should Big Media bother?
Because each age moves from engineering towards the social, the technical challenges become less about “is it possible?” and more about “how can it be used?”. Big media companies are in a unique position because they have already made the investment in hardware, platforms, content and traffic. Big media has an established audience and, maybe more importantly, untapped audiences (niche interest, hyper-local, international, etc.).

The Age of Clouds will drive hosting costs down and put more computing power in the hands of small competitors, who can differentiate their products through “good enough” brute-force methods (e.g. content scraping, data mining, machine learning and business intelligence techniques). In the Age of Semantic, the incumbents will be those that understand the content and the audience best. This is why Google, Apple and Microsoft are falling over themselves to introduce social aspects: relationships, entities and sentiment are engagement glue.

Once the volume of content is too big to take in, and the user can access it on any device, any time, anywhere, the products that succeed will be the ones that turn mass data into information, knowledge and relevance, which is actually the key competency of journalism.

When a consumer has access to the world's media and domestic and international territories blur, breaking news will be less about speed to publish and more about speed to discover and then socialise.

The Age of Semantic changes the questions from: “What happened?” to:
• What does it mean?
• What is hidden?
• Why did it happen?
• Will it happen again?