Grokking Drupal: the taxonomy system

Submitted by Frederic Marand on

[Image: UML class diagram thumbnail for the Drupal taxonomy system in 4.6.4]

Taxonomy. That's what sets Drupal apart and makes it so much more useful than many of its alternatives. But it can be unduly intimidating at first, so let's peek under the hood to see how to take advantage of it.

The taxonomy mechanism is at the heart of what makes Drupal so different from most other content management systems. But experience in the Drupal support channel shows that it is not always well understood.

Taxonomy data model

First things first: although the service provided by the taxonomy mechanism ("categories" at various places in the Drupal UI) is simple, the implementation requires no fewer than seven tables (see diagram) for basic features:

  • The main table is term_data. This is where the terms used for classification are defined. Every term is given a unique term identifier, or tid.
  • The second most important table is vocabulary. Each of the terms in term_data belongs in exactly one vocabulary, to which it is linked by the vid column.
  • For vocabularies allowing it, term hierarchy is defined, obviously enough, in term_hierarchy, in which each tid has one row for each of its parent tids, or one row with (virtual) parent tid 0.
  • Terms are mostly used to classify the basic unit of content in Drupal, the node. This is the purpose of the term_node table, which implements the term/node relationship. Note that terms can be used for other purposes, like user classification (more on that below).
  • Synonyms are handled through the use of the term_synonym table, in which each row links to an existing tid and defines a new name for it.
  • For vocabularies in which this option has been enabled, the term_relation table allows for the inter-linking of terms: each row defines a pair of tid values as describing related terms.
  • Drupal allows vocabularies to be limited to some node types. This is implemented by the vocabulary_node_types lookup table.
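To make the relationships in this list concrete, here is a minimal sketch of the tables in SQLite, driven from Python. It is not Drupal's actual schema: the column lists are trimmed, and every column other than tid, vid, parent and type is an assumption made for illustration, as are the sample vocabulary, terms and node.

```python
import sqlite3

# Simplified sketch of the tables described above. Column lists are trimmed
# and partly assumed; this is NOT Drupal's real DDL.
SCHEMA = """
CREATE TABLE vocabulary (vid INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE term_data (tid INTEGER PRIMARY KEY, vid INTEGER, name TEXT);
CREATE TABLE term_hierarchy (tid INTEGER, parent INTEGER);
CREATE TABLE term_node (nid INTEGER, tid INTEGER);
CREATE TABLE term_synonym (tid INTEGER, name TEXT);
CREATE TABLE term_relation (tid1 INTEGER, tid2 INTEGER);
CREATE TABLE vocabulary_node_types (vid INTEGER, type TEXT);
CREATE TABLE node (nid INTEGER PRIMARY KEY, type TEXT, title TEXT);
"""


def build_demo():
    """Create an in-memory database holding one vocabulary, two terms, one node."""
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    db.execute("INSERT INTO vocabulary VALUES (1, 'Topics')")
    db.executemany(
        "INSERT INTO term_data VALUES (?, ?, ?)",
        [(1, 1, "Drupal"), (2, 1, "Taxonomy")],
    )
    db.execute("INSERT INTO node VALUES (10, 'story', 'Grokking Drupal')")
    # Classify node 10 under both terms through the term/node relationship.
    db.executemany("INSERT INTO term_node VALUES (?, ?)", [(10, 1), (10, 2)])
    return db


def terms_for_node(db, nid):
    """Return (term name, vocabulary name) pairs attached to a node."""
    rows = db.execute(
        "SELECT td.name, v.name FROM term_node tn "
        "JOIN term_data td ON td.tid = tn.tid "
        "JOIN vocabulary v ON v.vid = td.vid "
        "WHERE tn.nid = ? ORDER BY td.name",
        (nid,),
    )
    return rows.fetchall()
```

Calling terms_for_node(build_demo(), 10) walks from the node through term_node to term_data and on to vocabulary, which is the path any "categories for this node" listing has to follow.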

There is an implied integrity relationship: node/type must match vocabulary_node_types/nodeType for every instance of term_node/tid. The database does not enforce this; it is currently enforced in code by Drupal modules.
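Because this rule lives in code rather than in the database, it can be useful to audit it. The following is only a sketch under assumptions: it uses trimmed SQLite stand-ins for the tables, with the physical column named type, and find_type_violations is a hypothetical helper, not a Drupal function.

```python
import sqlite3


def find_type_violations(db):
    """Return (nid, tid) pairs where the node's type is not allowed
    for the vocabulary that the term belongs to."""
    return db.execute(
        """
        SELECT tn.nid, tn.tid
        FROM term_node tn
        JOIN node n ON n.nid = tn.nid
        JOIN term_data td ON td.tid = tn.tid
        WHERE NOT EXISTS (
            SELECT 1 FROM vocabulary_node_types vnt
            WHERE vnt.vid = td.vid AND vnt.type = n.type
        )
        ORDER BY tn.nid, tn.tid
        """
    ).fetchall()


# Trimmed stand-in tables with made-up sample data (assumptions, not Drupal's DDL).
db = sqlite3.connect(":memory:")
db.executescript(
    """
    CREATE TABLE node (nid INTEGER PRIMARY KEY, type TEXT);
    CREATE TABLE term_data (tid INTEGER PRIMARY KEY, vid INTEGER, name TEXT);
    CREATE TABLE term_node (nid INTEGER, tid INTEGER);
    CREATE TABLE vocabulary_node_types (vid INTEGER, type TEXT);
    INSERT INTO node VALUES (1, 'story'), (2, 'page');
    INSERT INTO term_data VALUES (5, 1, 'News');
    -- Vocabulary 1 is limited to 'story' nodes only.
    INSERT INTO vocabulary_node_types VALUES (1, 'story');
    -- Node 2 is a 'page', so tagging it with term 5 breaks the rule.
    INSERT INTO term_node VALUES (1, 5), (2, 5);
    """
)
violations = find_type_violations(db)
```

Here violations contains only the (2, 5) pair: the 'page' node carries a term from a vocabulary restricted to 'story' nodes.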

In the current implementation (4.6.4), even if hierarchy is not used, each tid still has at least one row in the term_hierarchy table, with parent = 0, to show that it is a "root" term. A term may have more than one parent, which prevents replacing the term_hierarchy table with a simple parent column in the term_data table. IMHO, since this is essentially an implementation artefact that costs significant data space, it does not seem poised to remain in place for very long.
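A small sketch shows why a single parent column would not be enough. It assumes a two-column term_hierarchy table (tid, parent) in SQLite and made-up tid values; the parents and roots helpers are hypothetical names used for illustration only.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript(
    """
    CREATE TABLE term_hierarchy (tid INTEGER, parent INTEGER);
    -- tids 1, 2 and 4 are roots (parent = 0); tid 3 has two parents.
    INSERT INTO term_hierarchy VALUES (1, 0), (2, 0), (3, 1), (3, 2), (4, 0);
    """
)


def parents(db, tid):
    """Return every parent tid recorded for a term (0 means 'root')."""
    rows = db.execute(
        "SELECT parent FROM term_hierarchy WHERE tid = ? ORDER BY parent",
        (tid,),
    )
    return [row[0] for row in rows]


def roots(db):
    """Return the tids whose row points at the virtual parent 0."""
    rows = db.execute(
        "SELECT tid FROM term_hierarchy WHERE parent = 0 ORDER BY tid"
    )
    return [row[0] for row in rows]
```

With this data, parents(db, 3) yields two values, which a single column on the term row could not hold, while every term without a real parent still costs one row pointing at parent 0.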

Taxonomy use

As questions on the support channel suggest, the use of Drupal categories, as implemented by the taxonomy module, may not be guided enough. I had a case yesterday where the user wanted additional code to prevent terms in one category (i.e. vocabulary) from being used on a node along with terms from another category. In most cases, this points to an information architecture problem at a higher level: if terms are mutually dependent, like these terms that had to be exclusive, then they belong on the same classification axis, meaning in the same vocabulary.

This is where the hierarchical nature of Drupal classifications comes in handy: instead of defining a set of specialized vocabularies that depend on other vocabularies, all it takes is to define one hierarchical vocabulary, within which the specialized subtrees are implemented as children of higher-level terms, thus ensuring mutual exclusion.
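As an illustration of the subtree idea, the sketch below collects all descendants of a top-level term with a recursive query. It assumes a two-column term_hierarchy table in SQLite and an invented vocabulary; subtree is a hypothetical helper, not part of Drupal.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript(
    """
    CREATE TABLE term_hierarchy (tid INTEGER, parent INTEGER);
    -- Hypothetical vocabulary: tids 1 and 2 are top-level terms,
    -- tids 3 and 4 sit under term 1, tid 5 sits under term 2.
    INSERT INTO term_hierarchy VALUES (1, 0), (2, 0), (3, 1), (4, 1), (5, 2);
    """
)


def subtree(db, root_tid):
    """Return all descendant tids of root_tid, at any depth."""
    rows = db.execute(
        """
        WITH RECURSIVE sub(tid) AS (
            SELECT tid FROM term_hierarchy WHERE parent = ?
            UNION
            SELECT h.tid FROM term_hierarchy h JOIN sub s ON h.parent = s.tid
        )
        SELECT tid FROM sub ORDER BY tid
        """,
        (root_tid,),
    )
    return [row[0] for row in rows]
```

The two subtrees do not overlap, so each top-level term acts as its own specialized list while everything stays in a single vocabulary. UNION rather than UNION ALL is used so that a term reachable through several parents is only listed once.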

In short, if there is only one word to remember when designing an information taxonomy, or in layman's terms when configuring categories on Drupal, that word is orthogonality. Proper orthogonal category design will often save a lot of time otherwise spent implementing case-specific rules in code.

There is an introduction to the concept in the "Derived meanings/Computer Science" section of the orthogonality article on Wikipedia.

Taxonomy beyond nodes

Although the taxonomy system in Drupal 4.6.x is geared towards use with nodes, it can be put to other uses. As a proof of concept, Karoly Negyesi (aka Chx) has created the "userstag" module, which enables the use of taxonomies on users. To find its current URL, search for "drupal userstag chx" in your favorite search engine. This module uses a term_user table, similar in purpose to the term_node table in regular taxonomy use. Note that this is NOT supported code, or even contributed code, and as such it should not be used on a production system unless you are ready to maintain it or have it maintained.

Warning: the previous version of this post contained an error (confusion between synonyms and related terms), noticed and fixed by Killes. Thanks to him.

[Image: UML class diagram thumbnail for the Drupal taxonomy system in Drupal 5.x]

2007-02-09 - The diagram on the right has been updated for Drupal 5 (the initial diagram at the top of this post was for 4.6.x).

2005-11-28 - Update: this page, as well as others in the Grokking Drupal series on my blog, is now available on drupal.org. The version on drupal.org will probably be updated over time, whereas this one probably won't be.

tatere (not verified)

Mon, 2006-06-12 23:31

"This module uses a term_data, similar in purpose to the term_node table..."

Um, wouldn't that conflict with the main term_data table that's the core of the vocabulary schema? Is that the right table name?

@IP : 216.240.48.15

There was a typo: it's term_user, not term_data. I've changed it in the body of the post.

Abhijeet (not verified)

Wed, 2008-05-14 10:07

"node/type must match vocabulary_node_types/nodeType"

Shouldn't it be:

node/type must match vocabulary_node_types/type

Frederic Marand

Thu, 2008-05-15 10:48

In reply to by Abhijeet (not verified)

Maybe this is not too clear, but the data model I represent on diagrams like this one is a logical one. It does not map character for character to the physical implementation: for instance, you can see that the type of some variables is StringList, which is not even a PHP type.

In the case of vocabulary_node_types, this abstract field name does indeed map to the type column in the MySQL physical model.