TECH TALK: News Refinery Characteristics

What does the News Refinery do?

  1. It continuously collects the headlines, and perhaps brief summaries, from news sites around the world.
  2. It lets me set preferences so that I can assemble my own daily newspaper from these headlines; it may have multiple pages, each reflecting one of my interests.
  3. I can look up these pages, or a site's headlines, by date. This means each day's edition is closed off before the next one begins, so if I am on vacation and cannot access the web, I can later go back and see the headlines for any day I missed. (This also requires an archival facility for the headlines.)
  4. I can create my own “taxonomy” to classify stories into folders. Ideally, I would want to extend an existing taxonomy, which means stories may need to be pre-classified.
  5. I want the stories downloaded onto my computer so that I can also read them offline.
  6. I should be able to search stories: not just the headlines but also their full text. This means the stories behind the headlines will need to be botted to retrieve their full text, and it also requires a sophisticated search engine. [A bot is an automated program that fetches a page, given its URL.]
  7. I want to get alerts based on certain keywords. Alerts should be deliverable on multiple devices.
  8. I want to access stories by source (publication), date, author, and the keywords or people mentioned in the story.
  9. I should be able to set up online folders for the stories I classify so that I can share them with others. The current alternative is to email stories to friends and colleagues. In fact, by monitoring my activities, it should be able to recommend which stories to send to whom. It should also keep track of which stories I have forwarded to whom.
  10. I should be able to comment on stories (like attaching a Post-it note). These comments can grow into a discussion thread or let me maintain a Weblog, and they can also serve as the basis for creating communities.
  11. For magazines, I should be able to create a Table of Contents (ToC) listing all the articles in an issue. It should also be possible to place multiple ToCs on one page, so that I can easily scan an interesting new source I may have discovered.
  12. In many cases, an article not linked from one of a site's top pages is lost forever (unless someone finds it through search). So a single-page listing of all recent stories (say, from the past week) would be very useful.
  13. I want my personal newspaper emailed to me daily.
  14. It should work in English and other languages, and be able to translate stories written in other languages.
  15. It should strip the unnecessary “clutter” (ads, superfluous images) from each story and keep only its content, thus reducing the size of the page.
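The list above does not say how these features would be built. As one illustration, items 3, 6, 7 and 8 (daily editions, full-text search, keyword alerts, and access by source) can share a single data structure: an inverted index mapping each word to the stories that contain it. The Python below is a minimal in-memory sketch under stated assumptions, not the Refinery's actual design. It assumes stories have already been botted into plain text, and every name in it (`Story`, `NewsIndex`, `watch`, `add`, `search`, `edition`, `from_source`) is hypothetical.

```python
import re
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Story:
    # Hypothetical record for one botted story; field names are illustrative.
    story_id: int
    source: str
    author: str
    published: date
    headline: str
    full_text: str


def tokenize(text: str) -> set[str]:
    # Lower-case word tokens; a real engine would also stem and drop stop words.
    return set(re.findall(r"[a-z0-9]+", text.lower()))


class NewsIndex:
    """Toy in-memory store: full-text search, keyword alerts, date/source lookup."""

    def __init__(self) -> None:
        self.stories: dict[int, Story] = {}
        self.postings: dict[str, set[int]] = defaultdict(set)   # term -> story ids
        self.by_date: dict[date, list[int]] = defaultdict(list)  # one "edition" per day
        self.by_source: dict[str, list[int]] = defaultdict(list)
        self.alerts: dict[str, set[str]] = {}  # user -> single-word keywords to watch

    def watch(self, user: str, keywords: list[str]) -> None:
        # Assumption: alert keywords are single words, matched against tokens.
        self.alerts[user] = {k.lower() for k in keywords}

    def add(self, story: Story) -> list[str]:
        """Index a story and return the users whose alert keywords it matches."""
        self.stories[story.story_id] = story
        terms = tokenize(story.headline + " " + story.full_text)
        for term in terms:
            self.postings[term].add(story.story_id)
        self.by_date[story.published].append(story.story_id)
        self.by_source[story.source].append(story.story_id)
        return sorted(user for user, kws in self.alerts.items() if kws & terms)

    def search(self, query: str) -> list[Story]:
        """Stories containing every query term (boolean AND), newest first."""
        terms = tokenize(query)
        if not terms:
            return []
        ids = set.intersection(*(self.postings.get(t, set()) for t in terms))
        return sorted((self.stories[i] for i in ids),
                      key=lambda s: (s.published, s.story_id), reverse=True)

    def edition(self, day: date) -> list[str]:
        """Headlines archived for a given day, in the order they arrived."""
        return [self.stories[i].headline for i in self.by_date.get(day, [])]

    def from_source(self, source: str) -> list[str]:
        """Headlines from one publication, in the order they arrived."""
        return [self.stories[i].headline for i in self.by_source.get(source, [])]


if __name__ == "__main__":
    idx = NewsIndex()
    idx.watch("me", ["broadband", "monsoon"])
    hits = idx.add(Story(1, "Example Times", "A. Writer", date(2002, 7, 1),
                         "Broadband rollout speeds up",
                         "Operators expand DSL coverage."))
    print(hits)                                             # ['me']
    print([s.headline for s in idx.search("dsl coverage")])  # ['Broadband rollout speeds up']
    print(idx.edition(date(2002, 7, 1)))                     # ['Broadband rollout speeds up']
```

The same postings table answers both a reader's query and a standing alert, which is why search and alerts sit naturally together. A production system would persist the index to disk, support phrase matching and ranking, and push alerts out to multiple devices rather than simply returning user names.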

