Feature analysis can be employed for bias detection when evaluating the procedural fairness of algorithms. (This is an alternative to the “Google approach”, which emphasizes the evaluation of outcome fairness.)
In brief, feature analysis reveals how strongly each feature (i.e. variable) influenced the model’s decision. For example, consider the following quote from Huang et al. (2014, p. 240):
“All features do not contribute equally to the classification model. In many cases, the majority of the features contribute little to the classifier and only a small set of discriminative features end up being used. (…) The relative depth of a feature used as a decision node in a tree can be used to assess the importance of the feature. Here, we use the expected fraction of samples each feature contributes to as an estimate of the importance of the feature. By averaging all expected fraction rates over all trees in our trained model, we could estimate the importance for each feature. It is important to note that feature spaces among our selected features are very diverse. The impact of the individual features from a small feature space might not beat the impact of all the aggregate features from a large feature space. So apart from simply summing up all feature spaces within a feature (i.e. sum of all 7,057 importance scores in hashtag feature), which is referred to as un-normalized in Figure 4, we also plot the normalized relative importance of each features, where each feature’s importance score is normalized by the size of the feature space.”
They go on to visualize the impact of each feature (see Figure 1).
Figure 1. Feature analysis example (Huang et al., 2014)
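To make the quoted procedure concrete, the sketch below approximates it with scikit-learn’s impurity-based importances, which are likewise averaged over all trees in the ensemble. The synthetic data, the “profile”/“hashtag” feature-space grouping, and the model settings are illustrative assumptions, not Huang et al.’s actual setup.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Hypothetical layout: columns 0-2 form a small "profile" feature space,
# columns 3-9 a larger, sparser "hashtag-like" feature space.
feature_spaces = {"profile": list(range(0, 3)), "hashtag": list(range(3, 10))}

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Importance of each individual feature, averaged over all trees.
importances = model.feature_importances_

for name, cols in feature_spaces.items():
    unnormalized = importances[cols].sum()   # sum over the whole feature space
    normalized = unnormalized / len(cols)    # divide by the feature-space size
    print(f"{name}: un-normalized={unnormalized:.3f}, normalized={normalized:.3f}")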
As the figure shows, this approach is well suited to probing the impact of each feature on the model’s decision making: the influence of sensitive features, such as ethnicity, can be detected directly from the importance scores. However, while this approach may be useful for supervised machine learning, where the data is clearly labelled, its applicability to unsupervised learning is less straightforward.
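As a rough illustration of how such importance scores could feed into a procedural-fairness check, one might flag any model in which a sensitive feature carries more than some agreed-upon share of the total importance. The feature names, scores, and 10% threshold below are purely hypothetical.

# Hypothetical importance scores; in practice these would come from a
# trained model (e.g., the feature_importances_ vector in the sketch above).
importances = {"age": 0.08, "ethnicity": 0.31, "n_posts": 0.22, "n_followers": 0.39}
SENSITIVE = {"ethnicity"}
THRESHOLD = 0.10  # arbitrary cut-off chosen for illustration

for name, score in sorted(importances.items(), key=lambda kv: kv[1], reverse=True):
    if name in SENSITIVE and score >= THRESHOLD:
        print(f"WARNING: sensitive feature '{name}' carries importance {score:.2f}")
    else:
        print(f"{name}: {score:.2f}")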