Broadband in South Korea

South Korea’s leadership in broadband is once again emphasised by this article in the NYTimes:

With a hefty push from the government, South Korea’s telecommunications providers have built the world’s most comprehensive Internet network, supplying affordable and reliable access that far surpasses what is available in the United States, even in those homes that have their own broadband setup.

And now that most of the nation is online at high speeds, South Koreans are shifting more of their analog lives to their computers, where they watch soap operas, attend virtual test preparation schools, sing karaoke and, most of all, play games.

“The killer application of the Internet is speed,” said Lee Yong Kyung, the chief executive of the KT Corporation, formerly known as Korea Telecom, which controls nearly half of the country’s broadband market. “The money is in the pipes.”

SCO v IBM

SCO is fighting IBM on Linux, alleging that some copyrighted software has made its way into the Linux source. Eric Raymond has a position paper, representing the Open Source Initiative:

SCO’s complaint is factually defective in that it implies claims about SCO’s business and technical capabilities that are untrue. It is, indeed, very cleverly crafted to deceive a reader without intimate knowledge of the technology and history of Unix; it gives false impressions by both the suppression of relevant facts, the ambiguous suggestion of falsehoods, and in a few instances by outright lying.

Bill Gates Speech Transcript

It is always fascinating to read what Bill Gates has to say, especially when he is talking about the future. Here is his speech given at the Newspaper Association of America Annual Convention (WSJ report):

Well, this decade we call the digital decade. Why do we use that term? Well, despite the popularity of the PC in the ’90s, most of the activities that people engaged in were not changed. The main activities that were changed were creating documents, where the word processor was preeminent, and the starting of electronic mail as a way of communicating. By the end of this decade, 2009, the number of activities that will have been changed by digital approaches will be extremely broad. It will be common sense, certainly for your younger readers, if not all of them, to pay bills electronically. The music that they buy will be digital. A lot of the material that they read will be read off the screen. The way that kids stay in touch with each other will be instant messaging brought to a whole new level, with voice and video as part of that interaction. The way that people buy and sell, that you bid out to buy something, will be fundamentally changed by electronic commerce. Electronic commerce was over-hyped, because the foundation had not been put in place. But, now over the last few years companies like ourselves and IBM, under the industry term Web services, are actually building that foundation to make that common sense.

Storewidth

Graeme Thickins has a review of The George Gilder/Forbes conference. Much of it is in the form of a collection of quotes by various people. A sample:

“We’re in ‘The Cheap Revolution’ – where Google gets 150 million page views per month from 12,000 cheap, off-the-rack PCs….where Wi-Fi, which costs hardly anything, proliferates, while France Telecom loses $25 billion in a year….and China graduates many more engineers per year than the U.S., who’ll work for $12,000 per year.” – Rich Karlgaard, Publisher, Forbes Magazine

“Digital content – rich media – will grow 700% by 2006, when it will be a third of all data.” – Sujal Patel, CTO, Isilon Systems

“Managing storage as a service is coming – the analogy is the water utility…The ‘water’ is always available, and is more valuable than the pipes…The ‘bill’ requires what we call ‘chargeback’. No one does that yet, but it won’t be long.” – Jonathan Martin, Senior Director of Product Planning, Veritas

“The SMB market will be the battleground for iSCSI, NAS, serial attached SCSI, and serial ATA drives.” – Mike Smith, EVP-Worldwide Marketing, Emulex

Newsblaster and Agonist for News

Besides Google News, there are a couple of other sites worth a look for news aggregation. Maciej Ceglowski has this to say about Newsblaster:

Columbia has a news digest site they call Newsblaster which is the best I’ve ever seen. Each news category has a summary auto-synthesized out of a slew of articles that their crawler finds. Unlike Google news, which just pastes together the lead sentences of several stories, Columbia does some sophisticated processing on the text to actually pick out and rank the most relevant sentences. Then it figures out how to stitch them together.

The second site is Agonist, which is a blog by Sean-Paul Kelley. Very good compilations.

Maciej has an interesting thought: “You can imagine how useful good auto-generated summaries would be in a blog aggregator/categorization tool.” We should think about doing this. I too have often thought that a summarizer would be a good thing to have.
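To make the summarizer idea concrete, here is a minimal sketch of extractive summarization: score each sentence by how frequent its words are across the whole text, then keep the top-scoring sentences in their original order. This is only an illustration of the "pick out and rank the most relevant sentences" idea, not a description of how Newsblaster works; the `summarize` function, the `STOPWORDS` list and the scoring rule are all assumptions made for the example.

```python
import re
from collections import Counter

# A tiny, hypothetical stopword list; a real tool would use a much larger one.
STOPWORDS = {"the", "a", "an", "of", "to", "in", "and", "is", "it", "that", "on", "for"}


def content_words(text):
    """Lowercase the text and return its words, minus stopwords."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]


def summarize(text, max_sentences=2):
    """Return the highest-scoring sentences, kept in their original order."""
    # Naive sentence splitter: break after '.', '!' or '?' followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    freq = Counter(content_words(text))

    def score(sentence):
        # Average corpus frequency of the sentence's content words.
        words = content_words(sentence)
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:max_sentences])  # restore reading order
    return " ".join(sentences[i] for i in keep)


if __name__ == "__main__":
    sample = (
        "Broadband is fast in Korea. "
        "Korea built broadband networks with broadband providers. "
        "I like tea."
    )
    print(summarize(sample, max_sentences=1))
```

Averaging the word frequencies, rather than summing them, keeps long sentences from winning purely on length.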

Search Technologies

Azeem Azhar provides a perspective on Google’s purchase of Applied Semantics, and also gives a nice backgrounder on the technology of searching.

Information retrieval is the core of all search businesses. It is about creating software that solves a hard question: getting computers to understand human language with all its vagaries. These vagaries include:

– polysemy (words with multiple meanings like DRIVE or SET)
– synonymy (different words with similar meanings like AIRPLANE and AIRCRAFT)
– multi-word expressions which need to be treated as such (BILL CLINTON)
– errors, typos and poor grammar

For example, a keyword search engine would find it hard to distinguish between A RED FISH and A FISH IN THE RED SEA.
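The RED FISH problem is easy to reproduce. The sketch below assumes the simplest possible keyword engine, one that only checks whether every query word appears somewhere in the document, and contrasts it with a check that also requires the words to be adjacent. Both function names are hypothetical and exist only for this illustration.

```python
def tokens(text):
    """Split on whitespace and lowercase; no stemming or punctuation handling."""
    return text.lower().split()


def keyword_match(query, document):
    """True if every query word occurs anywhere in the document."""
    doc_words = set(tokens(document))
    return all(word in doc_words for word in tokens(query))


def phrase_match(query, document):
    """True only if the query words occur consecutively, in order."""
    q, d = tokens(query), tokens(document)
    return any(d[i:i + len(q)] == q for i in range(len(d) - len(q) + 1))


if __name__ == "__main__":
    for doc in ("A RED FISH", "A FISH IN THE RED SEA"):
        print(doc, keyword_match("red fish", doc), phrase_match("red fish", doc))
```

Pure keyword matching accepts both documents for the query "red fish"; only the order-aware check tells them apart.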

Broadly speaking there have been two major schools of thought. The first is one I call the statistical school and the second is the semantic.

The statistical school held that context could be determined by looking at statistical patterns within documents and across documents in a collection. Essentially, they use a variety of techniques to recognise word co-occurrence. So when words like DRIVE, CAR and HIGHWAY are used together frequently, we can make assumptions about the context of those words. This means that searches on terms like SADDAM HUSSEIN may turn up documents without those words in, but with related terms like TARIQ AZIZ or IRAQ.
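A rough sketch of the co-occurrence idea: count how often each pair of words appears in the same document, then treat the words that most often accompany a term as its context. Real statistical engines use far more refined techniques; the functions and the three toy documents here are assumptions made purely for illustration.

```python
from collections import Counter
from itertools import combinations


def cooccurrence(documents):
    """Count, for each unordered word pair, the documents containing both words."""
    counts = Counter()
    for doc in documents:
        vocab = sorted(set(doc.lower().split()))
        for a, b in combinations(vocab, 2):
            counts[(a, b)] += 1
    return counts


def related_terms(term, counts, top_n=3):
    """Words that most often share a document with `term`."""
    scores = Counter()
    for (a, b), n in counts.items():
        if a == term:
            scores[b] += n
        elif b == term:
            scores[a] += n
    return [word for word, _ in scores.most_common(top_n)]


if __name__ == "__main__":
    docs = ["drive car highway", "car highway traffic", "drive disk storage"]
    counts = cooccurrence(docs)
    print(related_terms("car", counts, top_n=1))
```

In the toy collection, CAR shares two documents with HIGHWAY, so HIGHWAY comes out as its strongest companion. DRIVE is split between the motoring document and the disk document, which is exactly the polysemy problem.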

The other approach is the semantic approach. Here knowledge engineers build up a complex network of relationships, an ontology, that relates words together. So a CAR is defined as a type of VEHICLE and identical to the word AUTOMOBILE. A search on the word CAR will also turn up documents with the word AUTOMOBILE in them, even if they don’t mention it. Such semantic networks require a good deal of work and a lot of maintenance to keep them up to date.
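The semantic approach can be sketched as query expansion over a small hand-built ontology. The two tables below (`SYNONYMS` and `IS_A`) are hypothetical stand-ins for the network a knowledge engineer would maintain; the CAR, AUTOMOBILE and VEHICLE entries follow the example in the text, and TRUCK is added only to show a second member of the same class.

```python
# Hand-maintained relationships: the part that needs ongoing upkeep.
SYNONYMS = {"car": {"automobile"}, "automobile": {"car"}}
IS_A = {"car": "vehicle", "automobile": "vehicle", "truck": "vehicle"}


def expand_query(term):
    """Return the term plus its synonyms and any narrower terms."""
    term = term.lower()
    expanded = {term} | SYNONYMS.get(term, set())
    # A search on a class (VEHICLE) also covers its members (CAR, TRUCK, ...).
    expanded |= {word for word, parent in IS_A.items() if parent == term}
    return expanded


def search(term, documents):
    """Return documents containing any word from the expanded query."""
    wanted = expand_query(term)
    return [doc for doc in documents if wanted & set(doc.lower().split())]


if __name__ == "__main__":
    docs = ["The automobile was parked", "A bicycle lane", "My car is red"]
    print(search("CAR", docs))
```

The search on CAR returns the AUTOMOBILE document even though it never uses the word CAR. The cost is that every relationship has to be entered and kept current by hand.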


TECH TALK: Constructing the Memex: Vannevar Bush and the Memex

Write Randall Packer and Ken Jordan in their introduction to Vannevar Bush’s paper in their book Multimedia: From Wagner to Virtual Reality:

Bush [proposed] a solution to what he considered the paramount challenge of the day: how information would be gathered, stored, and accessed in an increasingly information-saturated world… Although he addresses the subject from the vantage point of the 1940s (technology relying on film processing, microfilm storage, and mechanical retrieval), Bush introduces many of the concepts central to hypermedia. The machine that he proposes, the memex, is a new approach to the storing and sharing of information: a memory extender (hence memex) that could organize diverse materials according to an individual’s own personal associations. Conceived as a vast encyclopedia of text, images and sounds that is able to mimic the mind’s capability to link between ideas freely, the memex would effectively remember the leaps of thought someone had while researching a particular topic, and then make that trail of associations available to others. Bush never used the word hyperlink, but in his essay he invented that notion.

Adds Adam Brate in his book Technomanifestos: Visions from the Information Revolutionaries:

Bush imagined the memex to be an enlarged, intimate supplement of the human memory. The user would store in the computer’s memory magazines, newspapers, photographs, manuscripts, books, and letters. He or she would establish links, or trails, between implicitly related documents. The memex philosophy:

– We should no longer organize information in classes, subclasses and sub-subclasses.
– Information should be organized by association. When an item is selected, the device should jump to the next item, and then to a third, and so on. These trails are like synapses in the brain.
– Like those of memory, these trails should bifurcate, cross other trails, and become complex.
– If items are used, such trails should be emphasized. If not used, they should fade out.
– The machine should be fast: faster and more intuitive than any existing means of information retrieval.
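The principles above map naturally onto a weighted graph: items are nodes, trails are links, using a link strengthens it, and unused links decay until they disappear. The sketch below is one possible reading of that list, not anything Bush or Brate specified; the `TrailStore` class, its `boost`, `decay` and `floor` parameters and the item names are all hypothetical.

```python
from collections import defaultdict


class TrailStore:
    """Associative links between items, strengthened by use and faded by neglect."""

    def __init__(self, boost=1.0, decay=0.5, floor=0.1):
        self.links = defaultdict(dict)  # item -> {linked item: strength}
        self.boost = boost              # strength added each time a link is used
        self.decay = decay              # multiplier applied on every fade step
        self.floor = floor              # links weaker than this are forgotten

    def link(self, src, dst):
        """Create a trail from src to dst if it does not exist yet."""
        self.links[src].setdefault(dst, self.boost)

    def follow(self, src, dst):
        """Using a trail emphasizes it."""
        self.links[src][dst] = self.links[src].get(dst, 0.0) + self.boost

    def fade(self):
        """Weaken every trail; drop those that fall below the floor."""
        for targets in self.links.values():
            for dst, strength in list(targets.items()):
                faded = strength * self.decay
                if faded < self.floor:
                    del targets[dst]
                else:
                    targets[dst] = faded

    def next_items(self, src):
        """Items reachable from src, strongest trail first."""
        targets = self.links.get(src, {})
        return sorted(targets, key=targets.get, reverse=True)


if __name__ == "__main__":
    trails = TrailStore()
    trails.link("bow", "arrow")
    trails.link("bow", "elasticity")
    trails.follow("bow", "arrow")   # the used trail becomes stronger
    print(trails.next_items("bow"))
```

Because several links can leave one item and the same item can sit on many trails, bifurcating and crossing trails come for free from the graph structure.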

Bush imagined that the memex would revolutionise not only the organization of information, but its use and form. New encyclopedias and newspapers would contain built-in associative trails. Lawyers would be able to tie one case to the rest in legal history. Scientists and technologists could develop projects by building on the pieces of past projects and finding associations between different disciplines. The problem with specialization would diminish as users found links that transcended time, place and discipline… Users could hop, skip, and jump along trails, finding easy, intuitive ways to draw parallels and patterns. All information could be expressed as pattern and path.

In the best of worlds, the memex would empower the individual as well as the community in which the individual works. Colleagues could share trails… Yet each station would also be unique, incorporating the user’s own trails and personal documents.

Tomorrow: As We May Think
