Metadata for Multimedia

Ramesh Jain writes:

Text is effectively one-dimensional, though it is laid out on a two-dimensional surface for practical reasons. Currently, most metadata is also inserted using textual approaches. To denote the semantics of a data item, an opening tag is placed before it to mark where the semantics start, and a closing tag is placed after it to mark where they end. These tags can also have structuring mechanisms to build compound semantics, and dictionaries of tags may be compiled to standardize and translate the use of tags by different people.
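To make the textual tagging scheme described above concrete, here is a minimal sketch in Python. The tag names (note, author, place), the sample sentence, and the tag dictionary are all hypothetical and chosen only for illustration; none of them come from the original discussion. It shows opening and closing tags wrapped around data items, nesting to build compound semantics, and a small dictionary that translates one person's tag vocabulary into another's.

```python
import xml.etree.ElementTree as ET

# Hypothetical tagged text: each data item sits between an opening and a
# closing tag, and the outer <note> tag groups them into a compound unit.
text = "<note><author>Ana</author> recorded in <place>Lisbon</place></note>"

# A hypothetical "dictionary of tags" mapping one vocabulary onto another.
TAG_DICTIONARY = {"author": "person", "place": "location"}


def translate_tags(element):
    """Rename tags in place using TAG_DICTIONARY; unknown tags are kept."""
    for node in element.iter():
        node.tag = TAG_DICTIONARY.get(node.tag, node.tag)
    return element


root = translate_tags(ET.fromstring(text))
translated = ET.tostring(root, encoding="unicode")
print(translated)
```

Everything here works because both the content and the tags are text; that is exactly the assumption that breaks down for other media.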

When we try to assign tags to other media, things become problematic because of the nature of the media and the fact that current methods for assigning tags are textual. Suppose you have an audio stream, whether speech or some other kind of audio: how do we assign tags to it? Luckily, audio is still one-dimensional, so one can insert some kind of tag much as we do in text. But this tag would not be textual; it should be audio. We have not yet considered mechanisms for inserting audio tags.

I believe that we can utilize metadata for multimedia data. But the beauty of multimedia data is that it brings in a strong experiential component that is not captured by abstract tags. So techniques need to be developed to create metadata that does justice to multimedia data.

Diego adds:

The problem is not just one of metadata creation, but of metadata access.

Metadata is inevitably thought of as “extra tags” because, first and foremost, our main interface for dealing with information is still textual. We don’t have VR navigation systems, and voice-controlled systems rely on Voice-to-Text translation, rather than using voice itself as a mechanism for navigation.

Creating multimedia metadata will be key, but I suspect that it will have limited applicability until multimedia itself can be navigated in “native” (read: non-textual) form. Until both of these elements exist, I think that text as metadata (even if it’s generated through conversions or from context, as Google Image Search does) and text-based interfaces will remain the rule rather than the exception.

Published by

Rajesh Jain

An Entrepreneur based in Mumbai, India.