Taxonomy. That's what sets Drupal apart, and makes it so much more useful than many of its alternatives. But it's unduly intimidating at first: let's peek under the hood to see how to take advantage of it.
The taxonomy mechanism is the heart of what makes Drupal so different from most other content management systems. But experience in the Drupal support channel shows that it is not always well understood.
Taxonomy data model
First things first: although the service provided by the taxonomy mechanism ("categories" at various places in the Drupal UI) is simple, the implementation requires no fewer than six tables (see diagram) for basic features:
- The main table is term_data. This is where the terms used for classification are defined. Every term is given a unique term identifier, or tid.
- The second most important table is vocabulary. Each of the terms in term_data belongs in exactly one vocabulary, to which it is linked by the vid column.
- For vocabularies allowing it, term hierarchy is defined, obviously enough, in term_hierarchy, in which each tid has one row for each of its parent tids, or one row with (virtual) parent tid 0.
- Terms are mostly used to classify the basic unit of content in Drupal, the node. This is the purpose of the term_node table, which implements the term/node relationship. Note that they can be used for other purposes like user classification (more on that below).
- Synonyms are handled through the use of the term_synonym table, in which each row links to an existing tid and defines a new name for it.
- For vocabularies in which this option has been enabled, the term_relation table allows for the inter-linking of terms: each row defines a pair of tid values as describing related terms.
- Drupal allows vocabularies to be limited to some node types. This is implemented by the vocabulary_node_types lookup table.
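To make the relationships between these tables concrete, here is a minimal sketch using Python's built-in sqlite3 module. The column lists are heavily trimmed and partly assumed for illustration (Drupal itself runs on MySQL or PostgreSQL, and the real tables carry more columns); the sample rows and the helper terms_for_node are hypothetical.

```python
import sqlite3

# Simplified sketch of the taxonomy tables. Column lists are trimmed and
# partly assumed; they are not copied from the actual Drupal schema.
SCHEMA = """
CREATE TABLE vocabulary (vid INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE term_data (tid INTEGER PRIMARY KEY, vid INTEGER, name TEXT);
CREATE TABLE term_hierarchy (tid INTEGER, parent INTEGER);
CREATE TABLE node (nid INTEGER PRIMARY KEY, type TEXT, title TEXT);
CREATE TABLE term_node (nid INTEGER, tid INTEGER);
CREATE TABLE term_synonym (tid INTEGER, name TEXT);
CREATE TABLE term_relation (tid1 INTEGER, tid2 INTEGER);
CREATE TABLE vocabulary_node_types (vid INTEGER, type TEXT);
"""


def build_demo_db():
    """Create an in-memory database holding a few hypothetical rows."""
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    db.execute("INSERT INTO vocabulary VALUES (1, 'Topics')")
    db.executemany("INSERT INTO term_data VALUES (?, ?, ?)",
                   [(1, 1, "Drupal"), (2, 1, "Taxonomy")])
    # Term 1 is a root (parent 0); term 2 is a child of term 1.
    db.executemany("INSERT INTO term_hierarchy VALUES (?, ?)",
                   [(1, 0), (2, 1)])
    db.execute("INSERT INTO node VALUES (10, 'story', 'Grokking taxonomy')")
    db.executemany("INSERT INTO term_node VALUES (?, ?)", [(10, 1), (10, 2)])
    db.execute("INSERT INTO term_synonym VALUES (2, 'Categories')")
    db.execute("INSERT INTO vocabulary_node_types VALUES (1, 'story')")
    return db


def terms_for_node(db, nid):
    """Return (vocabulary name, term name) pairs classifying one node."""
    rows = db.execute(
        """SELECT v.name, t.name
           FROM term_node tn
           JOIN term_data t ON t.tid = tn.tid
           JOIN vocabulary v ON v.vid = t.vid
           WHERE tn.nid = ?
           ORDER BY t.tid""",
        (nid,),
    )
    return rows.fetchall()


print(terms_for_node(build_demo_db(), 10))
```

The query walks from term_node to term_data to vocabulary, which is the path any "which categories does this node have" lookup has to follow in this model.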
There is an implied integrity relationship: node/type must match vocabulary_node_types/nodeType for every instance of term_node/tid. This is currently implemented in code by Drupal modules.
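Since the database does not enforce this rule, one way to picture it is as a query that lists the term_node rows breaking it. The sketch below is an illustration only, not how Drupal performs the check: it uses sqlite3 with trimmed, assumed column lists, and the function name integrity_violations and the sample rows are hypothetical.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE node (nid INTEGER PRIMARY KEY, type TEXT);
CREATE TABLE term_data (tid INTEGER PRIMARY KEY, vid INTEGER, name TEXT);
CREATE TABLE term_node (nid INTEGER, tid INTEGER);
CREATE TABLE vocabulary_node_types (vid INTEGER, type TEXT);

-- Hypothetical data: vocabulary 1 is limited to 'story' nodes,
-- yet term 1 is attached to both a story (10) and a page (11).
INSERT INTO node VALUES (10, 'story'), (11, 'page');
INSERT INTO term_data VALUES (1, 1, 'Drupal');
INSERT INTO vocabulary_node_types VALUES (1, 'story');
INSERT INTO term_node VALUES (10, 1), (11, 1);
""")


def integrity_violations(db):
    """List (nid, tid) pairs whose node type is not allowed for the
    vocabulary the term belongs to."""
    return db.execute("""
        SELECT tn.nid, tn.tid
        FROM term_node tn
        JOIN node n ON n.nid = tn.nid
        JOIN term_data t ON t.tid = tn.tid
        WHERE NOT EXISTS (
            SELECT 1 FROM vocabulary_node_types vnt
            WHERE vnt.vid = t.vid AND vnt.type = n.type)
        ORDER BY tn.nid, tn.tid
    """).fetchall()


print(integrity_violations(db))
```

With the sample rows above, only the page node is reported, because its type has no matching row in vocabulary_node_types for the term's vocabulary.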
In the current implementation (4.6.4), even if hierarchy is not used, each tid will also have at least one row in the term_hierarchy table, with parent = 0, to show it is a "root" node.
A term may have more than one parent, which prevents replacing the term_hierarchy table by just a parent column in the term_data table.
IMHO, since this is essentially an implementation artefact that costs significant data space, it does not seem poised to remain in place for very long.
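The following sketch shows why a separate table is needed once a term can have several parents, and how the parent = 0 rows mark roots. It is an illustration under assumptions: the tid values are made up, the helper names are hypothetical, and the recursive query relies on SQLite's WITH RECURSIVE support rather than anything Drupal itself does.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE term_hierarchy (tid INTEGER, parent INTEGER);
-- tids 1 and 2 are roots (parent 0); tid 3 has two parents; tid 4 is under 3.
INSERT INTO term_hierarchy VALUES (1, 0), (2, 0), (3, 1), (3, 2), (4, 3);
""")


def roots(db):
    """Terms flagged as roots by their virtual parent tid 0."""
    rows = db.execute(
        "SELECT tid FROM term_hierarchy WHERE parent = 0 ORDER BY tid")
    return [r[0] for r in rows]


def ancestors(db, tid):
    """All ancestor tids of a term, following every parent row upwards."""
    rows = db.execute("""
        WITH RECURSIVE up(tid) AS (
            SELECT parent FROM term_hierarchy
            WHERE tid = ? AND parent <> 0
            UNION
            SELECT h.parent FROM term_hierarchy h
            JOIN up ON h.tid = up.tid
            WHERE h.parent <> 0
        )
        SELECT tid FROM up ORDER BY tid
    """, (tid,))
    return [r[0] for r in rows]


print(roots(db))
print(ancestors(db, 4))
```

Term 3 appears in two rows, one per parent, which a single parent column in term_data could not express.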
Taxonomy use
As questions on the support channel suggest, users may not get enough guidance on the use of Drupal categories, as implemented by the taxonomy module. I had a case yesterday where the user wanted additional code to prevent terms in one category (i.e. vocabulary) from being used on a node along with terms from another category. In most cases, this points to an information architecture problem at a higher level: if terms are mutually dependent, like these terms that had to be exclusive, then they belong on the same classification axis, meaning the same vocabulary.
This is where the hierarchical nature of Drupal classifications comes in handy: instead of defining a set of specialized vocabularies with dependence on other vocabularies, all it takes is for one to define a hierarchical vocabulary, within which specialized subtrees will be implemented as children of higher level terms, thus ensuring mutual exclusion.
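As a rough illustration of that idea, the sketch below models one hypothetical hierarchical vocabulary as a child-to-parent mapping. The term names are invented, and it assumes the vocabulary is configured so that a node receives a single term from it; under that assumption, the chosen term alone determines which subtree the node falls in, so no cross-vocabulary rule is needed.

```python
# Hypothetical single vocabulary: each term maps to its parent term,
# or to None when it is a top-level term.
PARENT = {
    "Hardware": None,
    "Software": None,
    "Laptop": "Hardware",
    "Desktop": "Hardware",
    "Editor": "Software",
    "Game": "Software",
}


def branch(term):
    """Walk up to the top-level term that owns this term's subtree."""
    while PARENT[term] is not None:
        term = PARENT[term]
    return term


# A node tagged "Laptop" is implicitly a "Hardware" node and cannot
# also sit in the "Software" subtree if only one term may be selected.
print(branch("Laptop"))
```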
In short, if there is only one word to remember when designing an information taxonomy, or in layman's terms when configuring categories on Drupal, this word is orthogonality. Proper orthogonal category design will often save a lot of time implementing case-specific rules in code.
There is an introduction to the concept in the "Derived meanings/Computer Science" section of the orthogonality article on Wikipedia.
Taxonomy beyond nodes
Although the taxonomy system in Drupal 4.6.x is geared towards use in nodes, it can be put to other uses. As a proof of concept, Karoly Negyesi (aka Chx) has created the "userstag" module, enabling the use of taxonomies on users. Use your favorite search engine to query for drupal userstag chx for the current URL. This module uses a term_user table, similar in purpose to the term_node table in regular taxonomy use.
Note that this is NOT supported code, or even contributed code,
and as such should not be used on a production system unless you are
ready to maintain it or have it maintained.
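To show what a term/user link table of this kind could look like, here is a small sqlite3 sketch. It is not taken from the userstag module: the table is named term_user following the author's correction in the comments, but its columns (uid, tid), the sample rows and the helper users_tagged are assumptions made for illustration.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE term_data (tid INTEGER PRIMARY KEY, vid INTEGER, name TEXT);
-- Assumed layout, mirroring term_node with a user id instead of a node id.
CREATE TABLE term_user (uid INTEGER, tid INTEGER);

INSERT INTO term_data VALUES (1, 2, 'translator'), (2, 2, 'themer');
INSERT INTO term_user VALUES (7, 1), (7, 2), (8, 2);
""")


def users_tagged(db, term_name):
    """Return the uids classified under the given term name."""
    rows = db.execute("""
        SELECT tu.uid
        FROM term_user tu
        JOIN term_data t ON t.tid = tu.tid
        WHERE t.name = ?
        ORDER BY tu.uid
    """, (term_name,))
    return [r[0] for r in rows]


print(users_tagged(db, "themer"))
```

The terms themselves still live in term_data; only the link table changes, which is what makes the taxonomy reusable beyond nodes.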
Tagged for drupal, taxonomy, information architecture, information model.
Warning: the previous version of this post contained an error, noticed and fixed by Killes (confusion between synonyms and related terms). Thanks to him.
2007-02-09 - The diagram on the right has been updated for Drupal 5 (the initial diagram on the top of this post was for 4.6.x)
2005-11-28 - Update: this page, as well as others in the Grokking Drupal series of my blog, is now available on drupal.org. The version on drupal.org will probably be updated with time, whereas this one probably won't be.
term_data ?
"This module uses a term_data, similar in purpose to the term_node table..."
Um, wouldn't that conflict with the main term_data table that's the core of the vocabulary schema? Is that the right table name?
Typo: it's term_user
There was a typo: it's term_user, not term_data. I've changed it in the body of the post.
Typo
"node/type must match vocabulary_node_types/nodeType"
Shouldn't it be:
node/type must match vocabulary_node_types/type
Clarification on the data model
Maybe this is not too clear, but the data model I represent on diagrams like this one is a logical one. It does not map character for character to the physical implementation: for instance, you can see the type of some variables as being a StringList, which is not even a PHP type.
In the case of vocabulary_node_types, this abstract field name does indeed map to the type column in the MySQL physical model.