

Github Repository

Fisher's linear discriminant is a method used in statistics and other fields to find a linear combination of features that characterizes or separates two or more classes of objects or events. The resulting combination may be used as a linear classifier or, more commonly, for dimensionality reduction before later classification.
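As a quick refresher on the idea behind the method (this is the standard textbook two-class form, not something specific to this article's dataset): for two classes with means m1 and m2, Fisher's criterion seeks the projection direction w that maximizes the between-class scatter relative to the within-class scatter,

J(w) = \frac{w^\top S_B w}{w^\top S_W w}, \qquad S_B = (m_1 - m_2)(m_1 - m_2)^\top, \qquad S_W = \sum_{c=1}^{2} \sum_{x_i \in \mathcal{C}_c} (x_i - m_c)(x_i - m_c)^\top

which is maximized by any w proportional to S_W^{-1}(m_1 - m_2).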

Fisher and Kernel Fisher Discriminant Analysis: Tutorial (Benyamin Ghojogh, Fakhri Karray, Mark Crowley): This detailed tutorial paper explains Fisher discriminant analysis (FDA) and kernel FDA. It starts with projection and reconstruction, then covers one- and multi-dimensional FDA subspaces and explains the scatters for two- and multi-class settings. It discusses the rank of the scatters and the dimensionality of the subspace, provides a real-life example for interpreting FDA, and discusses the possible singularity of the scatter to introduce robust FDA. PCA and FDA directions are compared, and FDA is proven to be equivalent to linear discriminant analysis. The Fisher forest is introduced as an ensemble of Fisher subspaces useful for handling data with different features and dimensionality. Kernel FDA is then explained for one- and multi-dimensional subspaces in both two- and multi-class settings. Finally, simulations on the AT&T face dataset illustrate FDA and compare it with PCA.

Dimensionality Reduction

Manifold learning is an approach to non-linear dimensionality reduction. Algorithms for this task are based on the idea that the dimensionality of many data sets is only artificially high.

High-dimensional datasets can be very difficult to visualize. While data in two or three dimensions can be plotted to show the inherent structure of the data, equivalent high-dimensional plots are much less intuitive. To aid visualization of the structure of a dataset, the dimension must be reduced in some way.

The simplest way to accomplish this dimensionality reduction is by taking a random projection of the data. Though this allows some degree of visualization of the data structure, the randomness of the choice leaves much to be desired. In a random projection, it is likely that the more interesting structure within the data will be lost.
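For illustration, such a random projection takes only a few lines. This is a minimal sketch using scikit-learn's GaussianRandomProjection on stand-in data (not part of the original notebook):

import numpy as np
from sklearn.random_projection import GaussianRandomProjection

rng = np.random.RandomState(42)
X = rng.rand(200, 5)          # stand-in for a dataset with 5 features

# project through a random Gaussian matrix down to 2 dimensions
proj = GaussianRandomProjection(n_components=2, random_state=42)
X_2d = proj.fit_transform(X)  # shape (200, 2), ready for a scatter plot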

To address this concern, a number of supervised and unsupervised linear dimensionality reduction frameworks have been designed, such as Principal Component Analysis (PCA), Independent Component Analysis, and Linear Discriminant Analysis (LDA), the last of which is the focus of this article.

Dataset

A multivariate study of variation in two species of rock crab of genus Leptograpsus: A multivariate approach has been used to study morphological variation in the blue and orange-form species of rock crab of the genus Leptograpsus. Objective criteria for the identification of the two species are established, based on the following characters:

  • SP: Species (Blue or Orange);
  • Sex: Male or Female;
  • FL: Width of the frontal region of the carapace;
  • RW: Width of the posterior region of the carapace (rear width);
  • CL: Length of the carapace along the midline;
  • CW: Maximum width of the carapace;
  • BD: Depth of the body.

The dataset can be downloaded from Github.

(see the introduction in the Principal Component Analysis (PCA) article)

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.preprocessing import MinMaxScaler
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

raw_data = pd.read_csv('data/A_multivariate_study_of_variation_in_two_species_of_rock_crab_of_genus_Leptograpsus.csv')

# give the abbreviated columns descriptive names
data = raw_data.rename(columns={
    'sp': 'Species',
    'sex': 'Sex',
    'index': 'Index',
    'FL': 'Frontal Lobe',
    'RW': 'Rear Width',
    'CL': 'Carapace Midline',
    'CW': 'Maximum Width',
    'BD': 'Body Depth'})

data['Species'] = data['Species'].map({'B':'Blue', 'O':'Orange'})
data['Sex'] = data['Sex'].map({'M':'Male', 'F':'Female'})

data_columns = ['Frontal Lobe', 'Rear Width', 'Carapace Midline',
                'Maximum Width', 'Body Depth']
# generate a class variable for all 4 classes
data['Class'] = data.Species + data.Sex

print(data['Class'].value_counts())
data.head(5)
  • BlueMale: 50
  • BlueFemale: 50
  • OrangeMale: 50
  • OrangeFemale: 50
|   | Species | Sex  | Index | Frontal Lobe | Rear Width | Carapace Midline | Maximum Width | Body Depth | Class    |
|---|---------|------|-------|--------------|------------|------------------|---------------|------------|----------|
| 0 | Blue    | Male | 1     | 8.1          | 6.7        | 16.1             | 19.0          | 7.0        | BlueMale |
| 1 | Blue    | Male | 2     | 8.8          | 7.7        | 18.1             | 20.8          | 7.4        | BlueMale |
| 2 | Blue    | Male | 3     | 9.2          | 7.8        | 19.0             | 22.4          | 7.7        | BlueMale |
| 3 | Blue    | Male | 4     | 9.6          | 7.9        | 20.1             | 23.1          | 8.2        | BlueMale |
| 4 | Blue    | Male | 5     | 9.8          | 8.0        | 20.3             | 23.0          | 8.2        | BlueMale |
# normalize data columns
data_norm = data.copy()
data_norm[data_columns] = MinMaxScaler().fit_transform(data[data_columns])

data_norm.describe()
|       | Index     | Frontal Lobe | Rear Width | Carapace Midline | Maximum Width | Body Depth |
|-------|-----------|--------------|------------|------------------|---------------|------------|
| count | 200.000000 | 200.000000  | 200.000000 | 200.000000       | 200.000000    | 200.000000 |
| mean  | 25.500000 | 0.527233     | 0.455365   | 0.529043         | 0.515053      | 0.511645   |
| std   | 14.467083 | 0.219832     | 0.187835   | 0.216382         | 0.209919      | 0.220953   |
| min   | 1.000000  | 0.000000     | 0.000000   | 0.000000         | 0.000000      | 0.000000   |
| 25%   | 13.000000 | 0.358491     | 0.328467   | 0.382219         | 0.384000      | 0.341935   |
| 50%   | 25.500000 | 0.525157     | 0.459854   | 0.528875         | 0.525333      | 0.503226   |
| 75%   | 38.000000 | 0.682390     | 0.569343   | 0.684650         | 0.664000      | 0.677419   |
| max   | 50.000000 | 1.000000     | 1.000000   | 1.000000         | 1.000000      | 1.000000   |
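As the describe() output confirms, every feature column now spans exactly [0, 1]; MinMaxScaler applies the standard per-column rescaling

x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}}

so the differently scaled carapace measurements become directly comparable.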

2-Dimensional Plot

no_components = 2

# project the five normalized features onto two discriminant axes
lda = LinearDiscriminantAnalysis(n_components=no_components)
data_lda = lda.fit_transform(data_norm[data_columns].values, y=data_norm['Class'])

data_norm[['LDA1', 'LDA2']] = data_lda

data_norm.head(1)
|   | Species | Sex  | Index | Frontal Lobe | Rear Width | Carapace Midline | Maximum Width | Body Depth | Class    | LDA1     | LDA2      |
|---|---------|------|-------|--------------|------------|------------------|---------------|------------|----------|----------|-----------|
| 0 | Blue    | Male | 1     | 0.056604     | 0.014599   | 0.042553         | 0.050667      | 0.058065   | BlueMale | 1.538869 | -0.808137 |
fig = plt.figure(figsize=(10, 8))
sns.scatterplot(x='LDA1', y='LDA2', hue='Class', data=data_norm)

Fisher Linear Discriminant Analysis (LDA)
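The two axes are not equally informative. As a quick check (not in the original notebook, but scikit-learn's fitted estimator exposes explained_variance_ratio_), one can print the share of between-class variance each discriminant axis accounts for:

# proportion of between-class variance captured by each discriminant axis
print(lda.explained_variance_ratio_)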

3-Dimensional Plot

no_components = 3

# repeat the projection, keeping all three available discriminant axes
lda = LinearDiscriminantAnalysis(n_components=no_components)
data_lda = lda.fit_transform(data_norm[data_columns].values, y=data_norm['Class'])

data_norm[['LDA1', 'LDA2', 'LDA3']] = data_lda

data_norm.head(1)
|   | Species | Sex  | Index | Frontal Lobe | Rear Width | Carapace Midline | Maximum Width | Body Depth | Class    | LDA1     | LDA2      | LDA3    |
|---|---------|------|-------|--------------|------------|------------------|---------------|------------|----------|----------|-----------|---------|
| 0 | Blue    | Male | 1     | 0.056604     | 0.014599   | 0.042553         | 0.050667      | 0.058065   | BlueMale | 1.538869 | -0.808137 | 1.18642 |
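Note that LDA can extract at most one fewer axes than there are classes, so with the four crab classes three components is the ceiling regardless of the five input features. A minimal sketch of that bound, reusing the data_norm frame from above:

# LDA is capped at (number of classes - 1) discriminant axes
n_classes = data_norm['Class'].nunique()                # 4 classes
max_components = min(len(data_columns), n_classes - 1)  # -> 3
print(max_components)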
class_colours = {
    'BlueMale': '#0027c4',     # blue
    'BlueFemale': '#f18b0a',   # orange
    'OrangeMale': '#0af10a',   # green
    'OrangeFemale': '#ff1500', # red
}

colours = data_norm['Class'].apply(lambda x: class_colours[x])

x = data_norm.LDA1
y = data_norm.LDA2
z = data_norm.LDA3

fig = plt.figure(figsize=(10, 10))
plt.title('Linear Discriminant Analysis')
ax = fig.add_subplot(projection='3d')

ax.scatter(xs=x, ys=y, zs=z, s=50, c=colours)

Fisher Linear Discriminant Analysis (LDA)
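To interpret the axes, it can help to look at the projection weights the model learned. This is a sketch assuming scikit-learn's scalings_ attribute (rows correspond to the input features, columns to discriminant axes):

# how strongly each carapace measurement loads on each discriminant axis
loadings = pd.DataFrame(lda.scalings_[:, :no_components],
                        index=data_columns,
                        columns=['LDA1', 'LDA2', 'LDA3'])
print(loadings)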