A post by Jason Kottke on “Sampling Networks Accurately” has sparked off an interesting discussion. The challenge: “How do you construct a fairly accurate map of a network (the weblog universe in this case) with a sample size much smaller than the total number of nodes (weblogs)? Is it even possible? A random sampling would work, but how do you tell your spider to go find a random node when it can only find nodes through links from other nodes?”
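The constraint Kottke points out, that a spider can only discover blogs by following links, is exactly why a naive crawl over-samples heavily linked blogs. Neither his post nor our work prescribes a fix, but one standard trick is a Metropolis-Hastings random walk: the walker rejects moves into high-degree nodes often enough that the set of visited nodes approximates a uniform sample of the reachable graph. The Python sketch below is purely illustrative; the toy graph, blog names, and parameters are made up and this is not BlogStreet’s crawler.

```python
import random

def mh_random_walk_sample(graph, start, steps, burn_in=1000):
    """Metropolis-Hastings random walk over an undirected link graph.

    graph: dict mapping node -> list of neighbouring nodes.
    Accepting a move from u to v with probability min(1, deg(u)/deg(v))
    cancels the walk's bias toward high-degree nodes, so the visited
    nodes approximate a uniform sample of the connected component.
    """
    current = start
    samples = []
    for step in range(burn_in + steps):
        neighbours = graph[current]
        candidate = random.choice(neighbours)
        # Accept the move with probability deg(current) / deg(candidate).
        if random.random() < len(neighbours) / len(graph[candidate]):
            current = candidate
        if step >= burn_in:
            samples.append(current)
    return samples

# Toy usage: a tiny hand-built "blogosphere" where all links are mutual.
if __name__ == "__main__":
    toy_graph = {
        "blog_a": ["blog_b", "blog_c", "blog_d"],
        "blog_b": ["blog_a", "blog_c"],
        "blog_c": ["blog_a", "blog_b"],
        "blog_d": ["blog_a"],
    }
    visited = mh_random_walk_sample(toy_graph, "blog_a", steps=5000)
    for blog in sorted(toy_graph):
        print(blog, visited.count(blog))
```

Despite its degree being three times larger, blog_a ends up visited about as often as blog_d, which is the whole point of the correction.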
I’d like to think that BlogStreet has done a decent job of analysing some aspects of the blogosphere, with a sample of 100,000+ blogs and growing. We have focused more on (a) neighbourhood analysis with externally-sourced graphical visualisation, (b) blog ranking by popularity along with a weighted ranking, and (c) identifying RSS feeds. There are many more ideas we have been thinking about, and will implement in the coming months. I’ll write more on this soon.
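I haven’t described how the weighted ranking works here, so the sketch below is only the generic idea, not BlogStreet’s actual formula: a plain popularity rank counts inbound links, while a weighted rank lets a link from a highly ranked blog count for more, in the PageRank style. The data structure, damping factor, and iteration count are illustrative assumptions.

```python
def weighted_blog_rank(inbound, damping=0.85, iterations=50):
    """Illustrative weighted ranking (PageRank-style), not BlogStreet's formula.

    inbound: dict mapping blog -> list of blogs that link to it.
    A blog's weight is split evenly across everything it links to, so a
    link from a popular but prolific linker counts for less than a link
    from an equally popular blog that links sparingly.
    """
    blogs = set(inbound) | {src for srcs in inbound.values() for src in srcs}
    # Out-degree of each blog (how many blogs it links to).
    out_degree = {b: 0 for b in blogs}
    for sources in inbound.values():
        for src in sources:
            out_degree[src] += 1
    rank = {b: 1.0 / len(blogs) for b in blogs}
    for _ in range(iterations):
        new_rank = {}
        for b in blogs:
            incoming = sum(rank[src] / out_degree[src]
                           for src in inbound.get(b, []))
            new_rank[b] = (1 - damping) / len(blogs) + damping * incoming
        rank = new_rank
    return rank
```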