XML’s Growth

Line56.com wonders about XML: “With standards adoption spreading, some wonder if a messaging glut will lead to infrastructure overload.”

The whole idea of business activity monitoring (BAM) and the real-time enterprise is predicated on being able to pick off and interpret XML data in midstream. That’s why the technology, with all its shortcomings, can’t be written off over the long haul. The consensus around XML makes it a coming priority to address as opportunities arise.

As more users are educated on the opportunities that come with real-time information, it could have a tidal wave effect on IT. Planning, says AMR’s Austvold, means thinking through your business segment’s most relevant real-time information needs, counting the current volume of transactions and then multiplying by at least ten. “Procter & Gamble went through and documented 8,000 touch points between internal applications and estimated that was probably half of what they really had,” Austvold relates. “They see a storm brewing in the next four years, but they don’t want to XML tag every piece of information; they’ll pick off the top 10 percent and prioritize that.”
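The planning arithmetic in that quote can be sketched in a few lines of Python. This is only a back-of-the-envelope illustration: the 50,000 daily transactions figure is invented, and applying the "top 10 percent" to the estimated number of touch points is my own reading of the quote, not something Austvold spells out.

```python
def planning_volume(current_volume: int, factor: int = 10) -> int:
    """Scale today's transaction volume by the 'at least ten' rule of thumb."""
    return current_volume * factor


def estimated_touch_points(documented: int) -> int:
    """The documented count was 'probably half' of the real one, so double it."""
    return documented * 2


def priority_touch_points(total: int, top_percent: int = 10) -> int:
    """How many touch points fall in the prioritized top slice (integer math)."""
    return total * top_percent // 100


# Hypothetical current load (made-up number, for illustration only).
print(planning_volume(50_000))

# Figures taken from the quote: 8,000 documented touch points.
total = estimated_touch_points(8_000)
print(total, priority_touch_points(total))
```

On these assumptions, the 8,000 documented touch points imply about 16,000 in reality, of which roughly 1,600 would be tagged first.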

RSS Feed Likes and Dislikes

Sven-S. Porst discusses RSS feeds in more detail. There are some good tips on how to make RSS feeds more useful. Recently, I’ve added full-post and HTML support to my RSS feed, so people don’t necessarily need to come to the blog to read what I’ve written – they can do so right in their RSS aggregator. An interesting thought by Porst:

Feeds grouping several small entries together. This can make them a bit harder to reference, but mostly you tend to reference only the more elaborated pieces anyway. This grouping of mini-posts keeps the aggregator uncluttered and gives me more content for one entry in the aggregator. I think this can be used in many cases where the real-time aspect doesn’t play a big role, which in my opinion it rarely does.
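One way to picture the grouping Porst describes is a feed generator that rolls each day's mini-posts into a single digest entry. The sketch below is my own illustration, not Porst's method: the `MiniPost` type, the `group_into_digests` function and the "Links for ..." title format are all hypothetical. Each mini-post keeps its own link inside the digest, which softens the "harder to reference" drawback.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date
from html import escape
from typing import Dict, List


@dataclass
class MiniPost:
    title: str
    link: str
    published: date


def group_into_digests(posts: List[MiniPost]) -> List[Dict[str, str]]:
    """Return one digest entry per day, newest day first."""
    by_day = defaultdict(list)
    for post in posts:
        by_day[post.published].append(post)

    digests = []
    for day in sorted(by_day, reverse=True):
        # Every mini-post stays individually linkable inside the digest.
        items = "".join(
            f'<li><a href="{escape(p.link)}">{escape(p.title)}</a></li>'
            for p in by_day[day]
        )
        digests.append({
            "title": f"Links for {day.isoformat()}",
            "description": f"<ul>{items}</ul>",
        })
    return digests
```

The aggregator then shows one entry per day instead of one per mini-post, at the cost of the real-time aspect that Porst argues rarely matters.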

RSS Aggregators as Archivers?

Paolo Valdemarin has a discussion on RSS Aggregators and suggests using them as archivers, and not just as instant readers.

If an aggregator is meant as a way to take a snapshot of what’s going on on hundreds of sources and quickly present it to us, I believe that presenting news in reverse chronological order is the way to go.

But I also think that aggregators could be an interesting way to archive content, to let somebody quickly retrieve something written sometime in the past.

Archiving by author, again, does not make sense: most weblogging applications already do that, if I’m looking for something and I know who wrote it, I can simply look on the author’s site.

There are search engines, which are of course a good way to find information, but not always very efficient. There are cases when a directory might be more useful.

We believe that archiving by topic in a directory could be a solution, and this is what we are trying to do. It’s not for daily instant reading, it’s to archive content.

I feel that RSS readers should stick to providing the viewing capability. What is needed for archiving articles of relevance is a personal blogging tool, to which items can be posted with a drag-and-drop capability. I’ll discuss this in greater detail when I talk about how to construct the Memex.

In a different but related context, InfoWorld describes how RSS could be used to counter spam:

When aggregators become widespread, many b-to-c newsletters will switch to RSS and drop now highly unreliable e-mail. I wrote three months ago that ISPs such as Hotmail and Yahoo, trying to stop spam, shunt to a junk folder or simply delete 25 percent of newsletters requested by subscribers.

The spam tsunami is forcing many e-mail recipients to build “whitelists,” accepting messages without question only from approved senders. Interestingly, RSS subscriptions work exactly like whitelists. By design, spammers have no way to push their material into anyone’s RSS reader.
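The whitelist analogy comes down to pull versus push: a reader only ever fetches the feeds its user has subscribed to, so there is no inbox for an unknown sender to write into. A minimal sketch of that design follows. The `PullOnlyReader` class is hypothetical, and the `fetch` callable stands in for real HTTP and feed parsing, which are left out here.

```python
from typing import Callable, Dict, List, Set


class PullOnlyReader:
    """A toy reader: content arrives only by polling subscribed feeds."""

    def __init__(self, fetch: Callable[[str], List[str]]) -> None:
        # `fetch` is an assumed helper that returns item titles for a feed URL.
        self._fetch = fetch
        self._subscriptions: Set[str] = set()

    def subscribe(self, feed_url: str) -> None:
        self._subscriptions.add(feed_url)

    def unsubscribe(self, feed_url: str) -> None:
        self._subscriptions.discard(feed_url)

    def refresh(self) -> Dict[str, List[str]]:
        # Only subscribed URLs are ever contacted; there is deliberately no
        # method that lets an outside party deliver items to the reader.
        return {url: self._fetch(url) for url in sorted(self._subscriptions)}
```

Unsubscribing is equally one-sided: once a URL is removed from the set, the publisher has no channel left to reach the reader.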


Really Personalised Advertising

Jason Kottke writes about a cool idea by Greg Elin: “He wants a way to dump calendar items, tasks, and the like out of his calendaring system (iCal, Outlook, etc.) and have those items display as ads on the web sites that he visits. So, when he goes to Slashdot, a banner ad tells him to stop for orange juice on the way home. When he goes to news.com, there’s an ad telling him that his mother’s birthday is coming up.”

It’s one of those “wonder why I didn’t think of it” ideas! It takes relevant advertising to its logical end.
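To make the idea concrete, here is a small sketch of the selection step: given reminders exported from a calendar, pick the most urgent one and turn it into banner text. Everything here is hypothetical (the `Reminder` type, the `pick_banner` function, the wording of the banner). How the items would be exported from iCal or Outlook, and how the banner would be injected into a page, are left open.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class Reminder:
    text: str
    due: date


def pick_banner(reminders: List[Reminder], today: date) -> Optional[str]:
    """Return banner text for the soonest upcoming reminder, if any."""
    upcoming = [r for r in reminders if r.due >= today]
    if not upcoming:
        return None
    soonest = min(upcoming, key=lambda r: r.due)
    days = (soonest.due - today).days
    when = "today" if days == 0 else f"in {days} day(s)"
    return f"Reminder: {soonest.text} ({when})"
```

A page-rendering layer could call `pick_banner` on each visit and fall back to an ordinary ad when it returns `None`.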

TECH TALK: Constructing the Memex: DMOZ and Microsoft

Another development in the search and directory industry has been one which is diametrically opposite to Overture in terms of process and business model.

DMOZ (also called the Open Directory Project or ODP) is, according to the website, the largest, most comprehensive human-edited directory of the Web. It is constructed and maintained by a vast, global community of volunteer editors. It provides the means for the Internet to organize itself. As the Internet grows, so does the number of net-citizens. These citizens can each organize a small portion of the web and present it back to the rest of the population, culling out the bad and useless and keeping only the best content.

ODP has over 3.8 million sites, 56,429 editors and over 460,000 categories. It is the most widely distributed database of Web content classified by humans. Its editorial standards body of net-citizens provides the collective brain behind resource discovery on the Web. The Open Directory powers the core directory services for the Web’s largest and most popular search engines and portals, including Netscape Search, AOL Search, Google, Lycos, HotBot, DirectHit, and hundreds of others.

At present, according to Business Week, Yahoo boasts the biggest audience, Overture the most advertising, and Google has the leading search technology. The stage is set for a battle royal, with Microsoft as the dark horse. Microsoft Research has been looking at ways to improve search, according to News.com:

While search tools exist today, a major focus of Microsoft’s research will be to allow for a freer flow of associations between data and to expand how searches can take place. Currently, data on computers is largely stored in a hierarchical fashion: A picture or document gets a file name and is stuffed into a folder. To find a document, people largely hunt and peck, a technique that also gets used on search engines.

People, however, don’t think that way, Rashid said. To find a vacation shot from Australia using newer tools, for example, a person could ask a computer to pull up pictures that feature an ocean background or family members. A search engine inside an application would then comb through the visual images to get matches.

“The problem with hierarchies is this conceit that all knowledge has a place, but no single thing fits in one space,” he said. “They become very cumbersome.”

Microsoft’s “Sapphire,” another lab experiment, exemplifies the difference. The application lists associations with a word in a document. Scroll over a person’s e-mail address, and Sapphire will pop up a balloon listing the person’s instant message address, work title, recent publications, and lists of e-mail exchanges and meetings you’ve had with this person.

Tomorrow: A Personal View
