Multi-class text classification cross-bench@2021

HuffPost dataset.


Documents
Classes
Vocabulary
Commons
Json

Processing

MLP#0 from reuters_mlp.py in the Keras examples repository
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation

print('Building model #0...')
model = Sequential()
model.add(Dense(512, input_shape=(max_words,)))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(num_classes))
model.add(Activation('softmax'))

model.compile(loss='categorical_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])

history = model.fit(x_train, y_train,
                    batch_size=batch_size,
                    epochs=epochs,
                    verbose=1,
                    validation_split=0.1)
score = model.evaluate(x_test, y_test,
                       batch_size=batch_size,
                       verbose=1)
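The model consumes binary bag-of-words rows of width max_words; reuters_mlp.py builds them with Tokenizer.sequences_to_matrix(mode='binary'). A minimal pure-Python sketch of that vectorization step (a simplified stand-in, not Keras' implementation):

```python
def sequences_to_binary_matrix(sequences, num_words):
    """Turn lists of word indices into 0/1 bag-of-words rows,
    mirroring Tokenizer.sequences_to_matrix(mode='binary')."""
    matrix = []
    for seq in sequences:
        row = [0.0] * num_words
        for idx in seq:
            if 0 <= idx < num_words:
                row[idx] = 1.0  # presence, not count
        matrix.append(row)
    return matrix

# Two toy "documents" encoded as word indices over a 5-word vocabulary.
x = sequences_to_binary_matrix([[1, 3, 3], [0, 4]], num_words=5)
# -> [[0.0, 1.0, 0.0, 1.0, 0.0], [1.0, 0.0, 0.0, 0.0, 1.0]]
```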
2020-05-07 16:31:37,456 : INFO : HuffPost dataset : 
[200853] size, [25000] size2,
[(18750, 27192)] x_train, [(18750, 28)] y_train,
[(6250, 27192)] x_test, [(6250, 28)] y_test,
[28] classes, [27192] vocabulary,
Building model #0...
Dataset HuffPost , Model 0 , 
Test score: 1.3521850590515136 , 
Test accuracy: 0.6449599862098694

5:HEALTHY LIVING   -> 4:CRIME, 25:GOOD NEWS
16:POLITICS        -> 12:COMEDY, 4:CRIME
19:WORLD NEWS      -> 15:WOMEN, 12:COMEDY
23:ENTERTAINMENT   -> 18:TRAVEL, 8:SPORTS
5:HEALTHY LIVING   -> 12:COMEDY, 15:WOMEN
39:BUSINESS        -> 12:COMEDY, 27:TECH
40:WELLNESS        -> 4:CRIME, 13:ARTS & CULTURE
8:SPORTS           -> 11:GREEN, 4:CRIME
28:PARENTING       -> 0:MONEY, 23:ENTERTAINMENT
15:WOMEN           -> 12:COMEDY, 11:GREEN
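Each line above pairs a document's true label with the model's two highest-probability classes. Assuming one softmax output row per document, the top-2 extraction can be sketched as:

```python
def top_k(probs, k=2):
    """Indices of the k largest entries, highest probability first."""
    return sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]

# Toy softmax output over 5 classes.
top_k([0.05, 0.60, 0.10, 0.20, 0.05])  # -> [1, 3]
```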
MLP#1 from the Keras guide section "Multilayer Perceptron (MLP) for multi-class softmax classification"
from keras.models import Sequential
from keras.layers import Dense, Dropout
from keras.optimizers import SGD

print('Building model #1...')
model = Sequential()
model.add(Dense(64, activation='relu', input_shape=(max_words,)))
model.add(Dropout(0.5))
model.add(Dense(64, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(num_classes, activation='softmax'))

sgd = SGD(lr=0.01, decay=1e-6, momentum=0.9, nesterov=True)
model.compile(loss='categorical_crossentropy',
              optimizer=sgd,
              metrics=['accuracy'])

model.fit(x_train, y_train,
          epochs=20,
          batch_size=128)
score = model.evaluate(x_test, y_test, batch_size=128)
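Model #1 trains with SGD(lr=0.01, decay=1e-6, momentum=0.9, nesterov=True). A single-parameter sketch of the Nesterov-momentum update in the common velocity formulation (the arithmetic Keras-era optimizers follow; internal bookkeeping in the real implementation may differ):

```python
def nesterov_sgd_step(w, grad, velocity, lr=0.01, momentum=0.9):
    """One Nesterov-momentum SGD update (velocity formulation)."""
    velocity = momentum * velocity - lr * grad
    w = w + momentum * velocity - lr * grad  # Nesterov look-ahead
    return w, velocity

w, v = nesterov_sgd_step(1.0, grad=2.0, velocity=0.0)
# v = 0.9*0 - 0.01*2 = -0.02; w = 1 + 0.9*(-0.02) - 0.01*2 = 0.962
```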
2020-05-07 17:26:01,786 : INFO : HuffPost dataset : 
[200853] size, [25000] size2,
[(18750, 27192)] x_train, [(18750, 28)] y_train,
[(6250, 27192)] x_test, [(6250, 28)] y_test,
[28] classes, [27192] vocabulary,
Building model #1...
Dataset HuffPost , Model 1 , 
Test score: 2.022144215698242 , 
Test accuracy: 0.47231999039649963

6:THE WORLDPOST    -> 17:LATINO VOICES, 19:STYLE
8:POLITICS         -> 5:BLACK VOICES, 19:STYLE
32:WELLNESS        -> 17:LATINO VOICES, 19:STYLE
26:WORLD NEWS      -> 17:LATINO VOICES, 19:STYLE
32:WELLNESS        -> 17:LATINO VOICES, 20:IMPACT
32:WELLNESS        -> 5:BLACK VOICES, 19:STYLE
28:QUEER VOICES    -> 17:LATINO VOICES, 20:IMPACT
34:TRAVEL          -> 17:LATINO VOICES, 19:STYLE
12:HEALTHY LIVING  -> 5:BLACK VOICES, 19:STYLE
7:COMEDY           -> 17:LATINO VOICES, 20:IMPACT
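Both architectures interleave Dropout(0.5) between Dense layers. A sketch of inverted dropout, the convention Keras follows (survivors are scaled up at training time so inference needs no rescaling); the fixed seed here exists only to make the sketch reproducible:

```python
import random

def dropout(row, rate=0.5, training=True, rng=None):
    """Inverted dropout: zero each unit with probability `rate`,
    scale the survivors by 1/(1-rate)."""
    if not training:
        return row[:]  # dropout is a no-op at inference time
    rng = rng or random.Random(0)  # seeded only for reproducibility
    keep = 1.0 - rate
    return [x / keep if rng.random() >= rate else 0.0 for x in row]
```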

20news dataset.


Documents
Classes
Vocabulary
Commons
Json

Processing

MLP#0 from reuters_mlp.py in the Keras examples repository
print('Building model #0...')
model = Sequential()
model.add(Dense(512, input_shape=(max_words,)))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(num_classes))
model.add(Activation('softmax'))

model.compile(loss='categorical_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])

history = model.fit(x_train, y_train,
                    batch_size=batch_size,
                    epochs=epochs,
                    verbose=1,
                    validation_split=0.1)
score = model.evaluate(x_test, y_test,
                       batch_size=batch_size,
                       verbose=1)
2020-05-07 16:59:08,191 : INFO : 20news dataset : 
[1764] size, [1774] size2, 
[(1323, 24730)] x_train, [(1323, 3)] y_train, 
[(441, 24730)] x_test, [(441, 3)] y_test, 
[3] classes, [24730] vocabulary,
Building model #0...
Dataset 20news , Model 0 , 
Test score: 0.0912760134845499 , 
Test accuracy: 0.9727891087532043
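The "Test score" reported throughout is the mean categorical cross-entropy loss on the test set; for a single one-hot target it reduces to -log of the probability the model assigned to the true class. A minimal sketch:

```python
import math

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """-sum(t * log(p)) for a one-hot target and a softmax output row."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(y_true, y_pred))

loss = categorical_crossentropy([0, 1, 0], [0.1, 0.8, 0.1])
# equals -log(0.8), about 0.223
```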

1:rec.sport.baseball -> 1:rec.sport.baseball, 
2:sci.space          -> 2:sci.space,  
0:comp.graphics      -> 0:comp.graphics, 
1:rec.sport.baseball -> 1:rec.sport.baseball, 
0:comp.graphics      -> 0:comp.graphics, 
2:sci.space          -> 2:sci.space, 
1:rec.sport.baseball -> 1:rec.sport.baseball, 
1:rec.sport.baseball -> 1:rec.sport.baseball,
0:comp.graphics      -> 0:comp.graphics, 
0:comp.graphics      -> 0:comp.graphics, 
MLP#1 from the Keras guide section "Multilayer Perceptron (MLP) for multi-class softmax classification"
print('Building model #1...')
model = Sequential()
model.add(Dense(64, activation='relu', input_shape=(max_words,)))
model.add(Dropout(0.5))
model.add(Dense(64, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(num_classes, activation='softmax'))

sgd = SGD(lr=0.01, decay=1e-6, momentum=0.9, nesterov=True)
model.compile(loss='categorical_crossentropy',
              optimizer=sgd,
              metrics=['accuracy'])

model.fit(x_train, y_train,
          epochs=20,
          batch_size=128)
score = model.evaluate(x_test, y_test, batch_size=128)
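The final Dense(num_classes, activation='softmax') layer turns raw logits into a probability distribution over the classes. A numerically stable sketch (shifting by the max before exponentiating):

```python
import math

def softmax(logits):
    """Exp-normalize, shifted by the max logit for numerical stability."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])  # sums to 1.0, largest logit wins
```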
2020-05-07 17:20:45,606 : INFO : 20news dataset : 
[1764] size, [1774] size2, 
[(1323, 25336)] x_train, [(1323, 3)] y_train, 
[(441, 25336)] x_test, [(441, 3)] y_test, 
[3] classes, [25336] vocabulary,
Building model #1...
Dataset 20news , Model 1 , 
Test score: 1.012091209558673 , 
Test accuracy: 0.920634925365448

1:rec.sport.baseball -> 1:rec.sport.baseball,
0:comp.graphics      -> 0:comp.graphics, 
2:sci.space          -> 2:sci.space, 
2:sci.space          -> 0:comp.graphics, 
1:rec.sport.baseball -> 1:rec.sport.baseball, 
2:sci.space          -> 2:sci.space, 
0:comp.graphics      -> 0:comp.graphics, 
0:comp.graphics      -> 0:comp.graphics, 
0:comp.graphics      -> 0:comp.graphics,
2:sci.space          -> 2:sci.space, 

Reuters dataset.


Documents
Classes
Vocabulary
Commons
Json

Processing

MLP#0 from reuters_mlp.py in the Keras examples repository
print('Building model #0...')
model = Sequential()
model.add(Dense(512, input_shape=(max_words,)))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(num_classes))
model.add(Activation('softmax'))

model.compile(loss='categorical_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])

history = model.fit(x_train, y_train,
                    batch_size=batch_size,
                    epochs=epochs,
                    verbose=1,
                    validation_split=0.1)
score = model.evaluate(x_test, y_test,
                       batch_size=batch_size,
                       verbose=1)
2020-05-07 17:01:57,268 : INFO : reuters dataset : 
[11218] size, [11228] size2, 
[(8413, 25900)] x_train, [(8413, 46)] y_train, 
[(2805, 25900)] x_test, [(2805, 46)] y_test, 
[46] classes, [25900] vocabulary,
Building model #0...
Dataset reuters , Model 0 , 
Test score: 0.7079413878513955 , 
Test accuracy: 0.8363636136054993
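The y_train shape (8413, 46) reflects one-hot label encoding, as produced by Keras' to_categorical. A minimal pure-Python sketch:

```python
def to_one_hot(labels, num_classes):
    """Map integer class ids to one-hot rows,
    like keras.utils.to_categorical."""
    return [[1.0 if c == lab else 0.0 for c in range(num_classes)]
            for lab in labels]

y = to_one_hot([3, 0], num_classes=4)
# -> [[0.0, 0.0, 0.0, 1.0], [1.0, 0.0, 0.0, 0.0]]
```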

3:earn             -> 3:earn, 20:interest
3:earn             -> 3:earn, 1:grain
3:earn             -> 3:earn, 19:money-fx
3:earn             -> 3:earn, 4:acq
3:earn             -> 3:earn, 20:interest
1:grain            -> 1:grain, 28:livestock
4:acq              -> 4:acq, 3:earn
39:pet-chem        -> 4:acq, 3:earn
16:crude           -> 3:earn, 19:money-fx
3:earn             -> 3:earn, 20:interest
MLP#1 from the Keras guide section "Multilayer Perceptron (MLP) for multi-class softmax classification"
print('Building model #1...')
model = Sequential()
model.add(Dense(64, activation='relu', input_shape=(max_words,)))
model.add(Dropout(0.5))
model.add(Dense(64, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(num_classes, activation='softmax'))

sgd = SGD(lr=0.01, decay=1e-6, momentum=0.9, nesterov=True)
model.compile(loss='categorical_crossentropy',
              optimizer=sgd,
              metrics=['accuracy'])

model.fit(x_train, y_train,
          epochs=20,
          batch_size=128)
score = model.evaluate(x_test, y_test, batch_size=128)
2020-05-07 17:10:58,913 : INFO : reuters dataset : 
[11218] size, [11228] size2, 
[(8413, 25874)] x_train, [(8413, 46)] y_train, 
[(2805, 25874)] x_test, [(2805, 46)] y_test, 
[46] classes, [25874] vocabulary,
Building model #1...
Dataset reuters , Model 1 , 
Test score: 1.5681505824580337 , 
Test accuracy: 0.6171122789382935

3:earn             -> 3:earn, 4:acq
16:crude           -> 3:earn, 16:crude
9:coffee           -> 19:money-fx, 11:trade
3:earn             -> 3:earn, 4:acq
3:earn             -> 3:earn, 4:acq
2:veg-oil          -> 1:grain, 16:crude
3:earn             -> 3:earn, 4:acq
4:acq              -> 4:acq, 3:earn
3:earn             -> 3:earn, 4:acq
3:earn             -> 3:earn, 4:acq

IMDB dataset.


Documents
Classes
Vocabulary
Commons
Json
TODO
