Tag Generator: tag clouds for textual analysis

Tag Generator

In ‘Ontology Is Overrated’, the talk he gave at the O’Reilly Emerging Technology conference in March 2005, Clay Shirky claims that current categorization methods (taxonomies and schemes) are giving way to more organic ways of organizing information, based on two units: the link and the tag. Applied to digital content, categorization indeed shows many limits, first of all in coping with the textual weave generated by linking. The web is not an unambiguous domain, a limited group of objects with specified formal categories. Surfers therefore use tagging systems to link a group of words they consider meaningful to a specific object, and in doing so they create tag clouds. According to Shirky, even if tagging may produce chaotic categorizations, it is possible to extract a huge amount of information from the chaos of textual data.
This last idea might have inspired Chirag Mehta, author of Tag Generator, a PHP codebase that generates tag clouds from textual data sources. The generator is based on the Porter Stemming Algorithm: it lists all the unique words used in the chosen texts and counts how many times each one occurs. Once language-specific words such as articles and adverbs are removed, the generator builds a “tag cloud” in which the most frequently used words appear in a larger font and the less frequent ones in a smaller one. The application then adds a chronological analysis, brightening recently used words and fading those that have not been used in a while. The visual impact of the computed cloud is immediate when it is applied to the limited corpus of US presidential speeches made since independence, and the differences are evident too. While for the founding fathers ‘assembly’ was the keyword, in Bush’s latest speeches the most recurring term is (not surprisingly) ‘terrorism’. If classifications generally tend to historicize, Mehta’s work also offers a dynamic description of the evolution of language.
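The steps described above (count the words, drop the common ones, size by frequency, brighten by recency) can be illustrated with a rough sketch. It is written in Python rather than PHP and is not Tag Generator's actual code: the function names, the tiny stop-word list, the linear scaling and the two sample sentences are all assumptions made for illustration, and the Porter stemming step is left out for brevity.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; a real one is far longer.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "we", "our"}


def tokenize(text):
    """Lowercase the text and return its alphabetic words, minus stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]


def build_cloud(dated_texts, min_size=10, max_size=40):
    """Return {word: (font_size, brightness)} for a list of (year, text) pairs.

    font_size grows linearly with the word's total count; brightness is 1.0
    for words last used in the newest text and 0.0 for words last used in
    the oldest one.
    """
    counts = Counter()
    last_used = {}
    for year, text in sorted(dated_texts):  # oldest text first
        for word in tokenize(text):
            counts[word] += 1
            last_used[word] = year  # later texts overwrite earlier ones
    if not counts:
        return {}

    years = [year for year, _ in dated_texts]
    oldest, newest = min(years), max(years)
    low, high = min(counts.values()), max(counts.values())
    spread = (high - low) or 1  # avoid dividing by zero when all counts match

    cloud = {}
    for word, n in counts.items():
        size = min_size + (max_size - min_size) * (n - low) / spread
        if newest == oldest:
            brightness = 1.0
        else:
            brightness = (last_used[word] - oldest) / (newest - oldest)
        cloud[word] = (round(size), round(brightness, 2))
    return cloud


# Made-up sample sentences, not quotations from real speeches.
speeches = [
    (1790, "The assembly and the union of the assembly"),
    (2005, "Terrorism threatens freedom; we fight terrorism"),
]
cloud = build_cloud(speeches)
for word, (size, brightness) in sorted(cloud.items()):
    print(f"{word}: size={size}, brightness={brightness}")
```

In this toy run, the most repeated words receive the largest size, while only the words from the later text are rendered at full brightness, which is the combined effect the article describes.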
However, the problems related to the generation of meaning persist, even if in this case the creation of the tag cloud is a derivative process. As with first-generation tag clouds, it is necessary to consider both the subjective association and the context in which that association is generated. For Tag Generator, identifying the group of texts used is essential, whether they are speeches, emails or blog posts. Context remains the key to interpretation, but the quantity, quality and relevance of semantic information are increasing in both virtual and real environments, and tag clouds offer valid support in dealing with it.

Valentina Culatti