Most organisations seem to use a classification system (or taxonomy) of some sort, for instance for safety classification, and much time is spent developing and using such taxonomies. Importantly, decisions may be made on the basis of the taxonomy and associated database outputs. Alternatively, much time may be spent on development and use while little happens as a result. There is therefore a risk of time and money being spent unnecessarily, with associated opportunity costs. Still, taxonomies are a requirement in all sorts of areas, and several things should be kept in mind when designing and evaluating a taxonomy. This post introduces twelve properties of effective classification systems.
Effective classification schemes are difficult to develop. The following properties need to be considered to develop a valid classification scheme that is accepted and produces the desired results.
1. Reliability
A classification scheme must be used reliably by different users (inter-coder reliability or consensus) and by the same users over time (intra-coder reliability or consistency). Reliability will depend on many factors, including the degree of true category differentiation, the adequacy of definitions, the level of hierarchical taxonomic description being evaluated, the adequacy of the material being classified, the usability of the method, the adequacy of understanding of the scheme and method, and the suitability of reliability measurement. Adequate reliability can be very difficult to achieve (see Olsen and Shorrock, 2010), and the heterogeneity of methodologies employed by researchers measuring the reliability of incident coding techniques makes it more difficult to critically compare and evaluate different schemes (see Olsen, 2013). However, if a classification scheme cannot be used reliably, then it is usually fair to say that it is not fit for purpose, especially for analysing large data sets (though it may be that reliability is achieved for certain users in certain contexts).
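As a rough illustration of how inter-coder reliability might be quantified, the sketch below computes Cohen's kappa for two coders who classified the same units. Kappa is only one commonly used statistic and is an assumption here, not a measure prescribed by this post; the category names and the ten coded reports are hypothetical.

```python
from collections import Counter


def cohens_kappa(codes_a, codes_b):
    """Cohen's kappa for two coders who classified the same units."""
    if len(codes_a) != len(codes_b) or not codes_a:
        raise ValueError("Both coders must classify the same non-empty set of units")
    n = len(codes_a)
    # Observed agreement: proportion of units given the same code by both coders.
    p_observed = sum(a == b for a, b in zip(codes_a, codes_b)) / n
    # Chance agreement: derived from each coder's category frequencies.
    freq_a, freq_b = Counter(codes_a), Counter(codes_b)
    p_chance = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_chance == 1.0:
        return 1.0  # both coders used one identical category throughout
    return (p_observed - p_chance) / (1 - p_chance)


# Hypothetical codes assigned by two coders to the same ten incident reports.
coder_1 = ["perception", "memory", "decision", "action", "perception",
           "decision", "action", "memory", "perception", "decision"]
coder_2 = ["perception", "decision", "decision", "action", "perception",
           "decision", "action", "memory", "memory", "decision"]

kappa = cohens_kappa(coder_1, coder_2)
print(f"Cohen's kappa: {kappa:.2f}")
```

The same function can be applied to one coder's classifications at two points in time to get a rough view of intra-coder consistency. Which statistic and threshold are appropriate depends on the scheme and the data, which is part of the measurement problem noted above.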
2. Mutual exclusivity
Categories should be mutually exclusive on the same horizontal level, so that it is only possible to place subject matter into one category. This relates to reliability. There are varying degrees of mutual exclusivity, since categories often have things in common, or overlap to some degree, depending on the criteria. Mutual exclusivity tends to be lower for abstract or unobservable concepts. This is especially true for psychological labels, and even more so those that are all-consuming (such as ‘situation awareness’, ‘mental model’, or ‘information processing’). For properly differentiated categories with clear definitions, appropriate guidance can reduce sources of confusion (see Olsen and Williamson, 2017).
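One informal way to spot categories that are not well differentiated in practice is to tally which pairs of categories two coders confuse with each other. The sketch below is a minimal example under that assumption; the function name and the sample codes are hypothetical.

```python
from collections import Counter


def confusion_pairs(codes_a, codes_b):
    """Count unordered category pairs on which two coders disagreed."""
    if len(codes_a) != len(codes_b):
        raise ValueError("Both coders must classify the same units")
    pairs = Counter()
    for a, b in zip(codes_a, codes_b):
        if a != b:
            # Sort so that (x, y) and (y, x) count as the same confusion.
            pairs[tuple(sorted((a, b)))] += 1
    # Most frequently confused pairs first.
    return pairs.most_common()


# Hypothetical codes assigned by two coders to the same five reports.
coder_1 = ["situation awareness", "mental model", "decision",
           "situation awareness", "action"]
coder_2 = ["mental model", "situation awareness", "decision",
           "mental model", "action"]

for (cat_x, cat_y), count in confusion_pairs(coder_1, coder_2):
    print(f"{cat_x} <-> {cat_y}: {count} disagreements")
```

Pairs that are confused repeatedly are candidates for sharper definitions, extra guidance, or merging.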
3. Comprehensiveness (or ‘content validity’)
It should be possible to place every sample or unit of subject matter somewhere. However, choices must be made about the granularity of categories. Highly detailed classification schemes and classification schemes that offer little granularity suffer from different problems concerning mutual exclusivity, usability, face validity, usefulness, etc.
4. Stability
The codes within a classification system should be stable. If the codes change, prior classification may be unusable, making comparison difficult. On the other hand, it should be possible to update a classification scheme as developments occur that truly affect the scope and content (e.g., new technology). Ideally, changes should have minimal impact.
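When codes do change, one way to limit the impact is to keep an explicit mapping from old codes to new ones, and to flag records that cannot be mapped automatically. The sketch below assumes a hypothetical scheme with two versions; the codes, the mapping and the record identifiers are all invented for illustration.

```python
# Hypothetical mapping from version 1 codes to version 2 codes.
# None marks a code with no single equivalent (for example, a category that was split).
V1_TO_V2 = {
    "EQUIP": "EQUIPMENT",   # renamed only
    "PROC": "PROCEDURE",    # renamed only
    "COMM": None,           # split in version 2; needs manual recoding
}


def migrate(records, mapping):
    """Split (record_id, old_code) pairs into migrated records and records needing review."""
    migrated, needs_review = [], []
    for record_id, old_code in records:
        new_code = mapping.get(old_code)  # unknown codes also yield None
        if new_code is None:
            needs_review.append((record_id, old_code))
        else:
            migrated.append((record_id, new_code))
    return migrated, needs_review


# Hypothetical records classified under version 1.
old_records = [(101, "EQUIP"), (102, "COMM"), (103, "PROC"), (104, "WEATHER")]

migrated, needs_review = migrate(old_records, V1_TO_V2)
print("Migrated:", migrated)
print("Needs manual review:", needs_review)
```

The size of the review list gives a rough indication of how much prior classification a proposed change would disrupt.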
5. Face validity
A classification system should ‘look valid’ to people who will use it or the results emanating from it. An industry classification scheme should incorporate contextual and domain-specific information (‘contextual validity’), but should also sit comfortably with pertinent theory and empirical data (‘theoretical validity’). The best approach here is to stick with what is well-understood and accepted.
6. Diagnosticity (or ‘construct validity’)
A classification scheme should help to identify the interrelations between categories and reveal previously unforeseen trends. This may relate more to the database and method than the taxonomy itself.
7. Flexibility
A classification scheme should enable different levels of analysis according to the needs of a particular query and known information. This is often achieved by a modular and hierarchical approach. Shallow but wide taxonomies tend to suffer from low flexibility.
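A hierarchical approach can be sketched with dotted codes that are truncated to whatever depth a query needs. This is a minimal illustration only; the dotted-code convention and the category names are assumptions, not part of any particular scheme.

```python
from collections import Counter


def roll_up(code, level):
    """Truncate a dotted hierarchical code to the given depth (1 = top level)."""
    return ".".join(code.split(".")[:level])


def count_at_level(codes, level):
    """Count classifications after rolling every code up to the given depth."""
    return Counter(roll_up(code, level) for code in codes)


# Hypothetical classifications; some units are only coded at a shallow level
# because less information was known about them.
codes = [
    "human.perception.visual",
    "human.perception.auditory",
    "human.decision",
    "technical.software",
    "technical.software.interface",
]

print(count_at_level(codes, 1))
print(count_at_level(codes, 2))
```

Codes that are shallower than the requested depth are left as they are, so units with little known information can still be counted at the levels they do support.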
8. Usefulness
A classification scheme should provide useful insights into the nature of the system under consideration, and provide information for the consideration of practical measures (e.g., for improvement).
9. Resource efficiency
The time taken to become proficient in the use of a classification scheme, collect supporting information, etc., should be reasonable. Continued difficulties in using a classification scheme, after initial training and supervised practice, usually indicate a design problem and signal the need for (re-)testing.
10. Usability
A classification scheme should be easy to use in the applied setting. This means that the developers should be able to demonstrate a human-centred design process akin to ISO 9241-210. The most relevant aspects of usability should be determined. For instance, some users may have no formal training in the use of the classification scheme, little time to make inputs, limited understanding of terms and acronyms, etc.
11. Trainability
It should be possible to train others how to use the classification scheme and achieve stated training objectives, including any required levels of reliability. In some cases, there may be valid reasons to go only to the original developers for training (e.g., the taxonomy is sensitive or commercialised). In such cases, there is a need to consider why this is the case, and the possible related implications (e.g., lack of peer-reviewed, public domain accounts of development; lack of independent testing).
12. Evaluation
Classification schemes should normally be amenable to independent evaluation. This means that they must be available and testable on the requirements above using an appropriate evaluation methodology. This will of course be more difficult for taxonomies that are restricted for various reasons (commercial, security, misuse prevention, etc.).
In practice, it will not be possible to achieve anywhere near perfection on these criteria. Even where evaluation results are very positive (assuming there is any evaluation), experience in use will usually be different (and usually worse from the users’ points of view) and undocumented. Trade-offs must be made and some of the properties above will be more important than others, depending on the application. For instance, in some cases, the priority may be to help investigators to ensure that relevant issues have been considered, perhaps also to model the interactions between them (see Four Kinds of Human Factors: 4. Socio-Technical System Interaction). In other cases, the priority may be to help analysts understand prevalence and trends in very large data sets. In still other cases, the priority may be to help users with little time or knowledge (‘casual users’) make basic inputs. These user groups have different needs and expectations.
It may also be necessary to use a taxonomy that is not adequate on some of the criteria above. In all cases, there is a need to understand the possible risks (e.g., time spent using the taxonomy; decisions made on the basis of the data) and to manage these risks (e.g., ignore data for categories that are known to be unreliable; merge categories; analyse data based on a hierarchically higher category/level up). However, three basic activities should be undertaken to help achieve adequate validity:
- Involve appropriate stakeholders in taxonomic development and evaluation, with a focus on understanding their needs, the associated taxonomic requirements, and the trade-offs between requirements. This should include people who understand human-centred design, taxonomy and all relevant aspects of the scope of the classification scheme.
- Review relevant literature, analyse the work and system, and review other classification schemes (including ones previously used by any stakeholders).
- Test the classification scheme throughout its development and implementation.
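One of the risk-management options mentioned above, analysing data at a hierarchically higher level, can be tested by comparing coder agreement at each level of the hierarchy. The sketch below uses raw percent agreement, which does not correct for chance, purely to keep the example short; the dotted codes and the coded reports are hypothetical.

```python
def parent(code, level):
    """Truncate a dotted hierarchical code to the given depth (1 = top level)."""
    return ".".join(code.split(".")[:level])


def percent_agreement(codes_a, codes_b, level):
    """Proportion of units on which two coders agree at the given depth."""
    if len(codes_a) != len(codes_b) or not codes_a:
        raise ValueError("Both coders must classify the same non-empty set of units")
    matches = sum(parent(a, level) == parent(b, level)
                  for a, b in zip(codes_a, codes_b))
    return matches / len(codes_a)


# Hypothetical codes assigned by two coders to the same four reports.
coder_1 = ["human.perception.visual", "human.decision.planning",
           "technical.software.interface", "human.perception.auditory"]
coder_2 = ["human.perception.auditory", "human.decision.planning",
           "technical.software.logic", "human.perception.auditory"]

for level in (3, 2, 1):
    agreement = percent_agreement(coder_1, coder_2, level)
    print(f"Level {level}: {agreement:.0%} agreement")
```

If agreement is poor at the most detailed level but acceptable one level up, reporting at the higher level may be the safer choice until definitions or guidance are improved.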
This post is based on a short briefing note that I produced for an Australian government agency meeting in 2004, not long after being awarded a PhD related to taxonomy (461 pages; reading not recommended, but available on request). Since I sometimes find it hard to find this note, I thought it might be useful to put it online, also in the hope that it might help someone else. The post focusses on the properties of effective taxonomies that relate to development, and not so much on the use, misuse and abuse of taxonomies. Another post, maybe.