Multiclass naive Bayes classification as a probabilistic model

I have a model based on a naive Bayes classifier (multinomial naive Bayes) that I have fitted on a dataset with just one feature (a categorical observation) and a label:


observation ; label
funny smell ; Acanthamoeba Infection
fever ; ear infection
fever ; cancer
nose bleed ; Acanthamoeba Infection

Now I have received new data samples whose observation field is a combination of n single observations:

observation ; label
fever + nose bleed + leg pain ; head trauma

Is there a way to incorporate the new samples into the model? I am using the scikit-learn implementation of naive Bayes. I was thinking of using the conditional probabilities that relate each single observation to the labels, and how each observation contributes to my current knowledge, but I am not sure how to do that with the scikit-learn implementation.
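A minimal sketch of one way to do this with scikit-learn, assuming the toy data above and that combined observations are strings joined with " + " (the variable names are illustrative, not the asker's): split each observation into individual tokens with CountVectorizer and refit MultinomialNB on the old and new samples together.

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

old_obs = ["funny smell", "fever", "fever", "nose bleed"]
old_labels = ["Acanthamoeba Infection", "ear infection", "cancer",
              "Acanthamoeba Infection"]

# New sample whose observation combines several single observations.
new_obs = ["fever + nose bleed + leg pain"]
new_labels = ["head trauma"]

# Split on " + " so every individual observation becomes one count feature;
# the multinomial model then handles any number of observations per sample.
vectorizer = CountVectorizer(tokenizer=lambda s: s.split(" + "), lowercase=False)
X = vectorizer.fit_transform(old_obs + new_obs)
y = old_labels + new_labels

clf = MultinomialNB()
clf.fit(X, y)

# Class probabilities for a combined observation.
print(clf.classes_)
print(clf.predict_proba(vectorizer.transform(["fever + nose bleed"])))

If refitting from scratch is too expensive, MultinomialNB.partial_fit can update an existing model incrementally, but it needs the full list of classes on the first call and the feature vocabulary must stay fixed.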


Splitting strategy for a multiclass problem

I have a 16-class dataset, Indian Pines, of dimensions 145x145x200.

From the data size, I am unable to gauge whether to do stratified splitting, random splitting, or a plain train-test split before starting to make predictions with the model. (The model used is an SVM.)
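For reference, a minimal sketch of a stratified split, under the assumption that the 145x145x200 cube is flattened to one sample per pixel with an integer class label per pixel (the arrays below are random placeholders, not the real Indian Pines data):

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.random((145 * 145, 200))          # placeholder for the flattened cube
y = rng.integers(0, 16, size=145 * 145)   # placeholder for the 16 class labels

# stratify=y keeps the per-class proportions the same in train and test,
# which matters when some of the 16 classes contain very few pixels.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42)

clf = SVC(kernel="rbf").fit(X_train, y_train)
print(clf.score(X_test, y_test))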


weighted cross entropy for imbalanced dataset – multiclass classification

I am trying to classify images into more than 100 classes, whose sizes range from 300 to 4000 (mean size 1500, std 600). I am using a fairly standard CNN whose last layer outputs a vector of length equal to the number of classes, and PyTorch's CrossEntropyLoss as the loss function.

I tried using $\text{weights} = \frac{\max(\text{sizes})}{\text{sizes}}$ for the cross-entropy loss, which improved on the unweighted version, but not by much.
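For reference, a minimal sketch of that weighting (the class counts and batch below are made up): PyTorch's CrossEntropyLoss accepts a per-class weight tensor directly.

import torch
import torch.nn as nn

class_counts = torch.tensor([300., 1500., 4000.])   # illustrative class sizes
weights = class_counts.max() / class_counts          # max(sizes) / sizes

criterion = nn.CrossEntropyLoss(weight=weights)

logits = torch.randn(8, 3)                 # stand-in for the CNN's output layer
targets = torch.randint(0, 3, (8,))
print(criterion(logits, targets).item())

An alternative with a similar effect is to oversample the rare classes at the data-loading stage with torch.utils.data.WeightedRandomSampler instead of reweighting the loss.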

I also thought about duplicating images so that all classes end up the same size as the largest one.

Is there any standard way of handling this sort of imbalance?


Multiclass classification of timeseries data using NN

I have the following dataset, for which I was thinking of using an RNN/LSTM to classify the protocol. The data contains features from a packet capture with only two provided fields: the OUI of the device's MAC address and the inter-arrival time (IAT) of the packets for a specific protocol.

For example, the labeled data for http and ntp from the point of view of a specific device type is provided below:

# MAC, ConnectivityIAT  -> protocol     
ff:f1:f2, 10, -> http
ff:f1:f2, 20, -> http
ff:f1:f2, 30, -> http
ff:f1:f2, 0, -> http

ff:f1:f2, 3, -> ntp
ff:f1:f2, 6, -> ntp
ff:f1:f2, 9, -> ntp

The task is to predict the correct class when something like:

ff:f1:f2, 10 -> ?

is seen.

Questions:

  • How do I convert the categorical OUI value into a numerical representation?
  • How should I approach the solution using a NN? (A sketch follows this list.)
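A minimal sketch of both points, with made-up variable names and a deliberately tiny feed-forward network rather than an RNN (for an LSTM you would group consecutive rows per device into sequences of these vectors): one-hot encode the OUI, concatenate it with the IAT, and train a small PyTorch classifier.

import numpy as np
import torch
import torch.nn as nn
from sklearn.preprocessing import LabelEncoder, OneHotEncoder

ouis = np.array([["ff:f1:f2"]] * 7)                        # categorical OUI column
iat = np.array([[10.], [20.], [30.], [0.], [3.], [6.], [9.]])
labels = ["http", "http", "http", "http", "ntp", "ntp", "ntp"]

oui_enc = OneHotEncoder().fit(ouis)          # categorical OUI -> one-hot columns
lab_enc = LabelEncoder().fit(labels)         # protocol names -> integer classes

X = np.hstack([oui_enc.transform(ouis).toarray(), iat]).astype(np.float32)
y = lab_enc.transform(labels)

model = nn.Sequential(nn.Linear(X.shape[1], 16), nn.ReLU(),
                      nn.Linear(16, len(lab_enc.classes_)))
opt = torch.optim.Adam(model.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()

for _ in range(200):                         # tiny training loop
    opt.zero_grad()
    loss = loss_fn(model(torch.from_numpy(X)), torch.from_numpy(y))
    loss.backward()
    opt.step()

# Predict the protocol for a new (OUI, IAT) observation.
x_new = np.hstack([oui_enc.transform([["ff:f1:f2"]]).toarray(), [[10.]]])
pred = model(torch.from_numpy(x_new.astype(np.float32))).argmax(1).item()
print(lab_enc.classes_[pred])

With many distinct OUIs, an nn.Embedding layer indexed by an integer-encoded OUI is usually more compact than one-hot vectors.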


Does a Monk/Druid multiclass character’s Unarmored Movement add to their speed while in Wild Shape?

Does a Monk/Druid multiclass character’s Unarmored Movement feature add to their speed while in Wild Shape?

Starting at 2nd level Monk, they have +10 speed, but does this bonus apply when they are in Wild Shape?


Does a Monk/Druid multiclass character's Unarmored Movement add to their Wild Shape speed?

Does a Monk/Druid multiclass character's Unarmored Movement add to their Wild Shape speed?

Starting at 2nd level Monk, they have +10 speed, but can they add this extra speed to their Wild Shape speed?


Multiclass classification with class imbalance using GradientBoostingClassifier

I am using the Abalone data from UCI (https://archive.ics.uci.edu/ml/machine-learning-databases/abalone/abalone.data) for classification. I have scaled the data and used t-SNE for visualization.

import pandas as pd
import matplotlib.pyplot as plt
from sklearn.preprocessing import StandardScaler
from sklearn.manifold import TSNE

# The UCI file has no header row, so supply the column names explicitly.
cols = ['Sex', 'Length', 'Diameter', 'Height', 'WholeWeight',
        'ShuckedWeight', 'VisceraWeight', 'ShellWeight', 'Rings']
data = pd.read_csv('http://archive.ics.uci.edu/ml/machine-learning-databases/abalone/abalone.data',
                   header=None, names=cols)
x = data.drop('Rings', axis=1)
y = data['Rings']

# Encode the categorical Sex column and standardize all features.
x['Sex'] = x['Sex'].replace({'M': 0, 'I': 1, 'F': 2})
sc = StandardScaler()
x_scaled = sc.fit_transform(x)

# 2-D t-SNE embedding of the scaled features, colored by the target.
sne = TSNE(n_components=2)
x_red_sne = sne.fit_transform(x_scaled)
plt.scatter(x_red_sne[:, 0], x_red_sne[:, 1], c=y, cmap='Spectral')

Visualization of data in 2D

from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.metrics import classification_report

x_train, x_test, y_train, y_test = train_test_split(x_scaled, y, train_size=0.7)
gb = GradientBoostingClassifier(n_estimators=200, learning_rate=0.1)
gb.fit(x_train, y_train)
print(cross_val_score(estimator=gb, X=x_test, y=y_test, scoring='f1_weighted', cv=5))
print(classification_report(y_true=y_test, y_pred=gb.predict(x_test)))

This model performs poorly: the classification report shows recall, F1, and precision of roughly 0.23, 0.22, and 0.24.

I understand this is multiclass classification with high class imbalance. What can I do to improve the model?
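One standard option is cost-sensitive training. A minimal sketch, reusing x_scaled and y from the snippet above: compute balanced per-sample weights and pass them to fit, so rare ring counts contribute more to the loss.

from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.utils.class_weight import compute_sample_weight

x_train, x_test, y_train, y_test = train_test_split(
    x_scaled, y, train_size=0.7, random_state=0)

# 'balanced' weights each sample inversely to its class frequency.
weights = compute_sample_weight(class_weight='balanced', y=y_train)

gb = GradientBoostingClassifier(n_estimators=200, learning_rate=0.1)
gb.fit(x_train, y_train, sample_weight=weights)
print(classification_report(y_test, gb.predict(x_test)))

Since the ring counts span many sparsely populated values, another option is to bin them into a few ranges (or treat the task as regression), which reduces the imbalance directly.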


How to implement a one-vs-rest classifier in a multiclass classification problem?

I have a dataset with 750 data points and 8 classes in the target variable. I tried simple machine learning models and also did hyperparameter tuning, but the results were not very impressive. The best log loss I could get was 1.52, with a misclassification rate of 53%. What other methods could I apply to improve the model's performance? I also want to implement a one-vs-rest classifier in the hope that it would improve the results, but its implementation is not clear to me, although I have searched the internet for clear example code.
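A minimal sketch of scikit-learn's one-vs-rest wrapper on synthetic stand-in data (the real features and base estimator would differ): OneVsRestClassifier fits one binary classifier per class and combines their scores.

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, log_loss
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsRestClassifier

# Synthetic stand-in for the 750-sample, 8-class dataset.
X, y = make_classification(n_samples=750, n_features=20, n_informative=10,
                           n_classes=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

ovr = OneVsRestClassifier(LogisticRegression(max_iter=1000))
ovr.fit(X_train, y_train)

proba = ovr.predict_proba(X_test)
print("log loss:", log_loss(y_test, proba, labels=ovr.classes_))
print("error rate:", 1 - accuracy_score(y_test, ovr.predict(X_test)))

Note that many scikit-learn estimators already handle multiclass targets natively, so the wrapper mainly helps when you specifically want one binary model per class (for example, for per-class calibration or inspection).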


Implementing a threshold for multiclass classification with softmax activation

We have a CNN where images are classified into between 7 and 30 classes, depending on the training set. The final output is via SoftMax activation, and thus all the probabilities add to one.

I notice that often an unseen image triggers a high probability for one class, even when it looks nothing like a typical image from that class. I infer that some feature for that class has responded just a little to the image while the features for the other classes have not, and softmax has amplified the probabilities so they add to one.

While we try to cover all variations of images in the training set, some classes have such huge variation that this is impossible. So I would like to put a threshold on the probabilities so that we can filter out some of the low-confidence false detections. How does one do this? Should the threshold be on the input to the softmax layer instead of the output? Or some other kind of measure?
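One common approach is a reject option on the softmax output: keep the argmax class only if its probability exceeds a threshold tuned on a validation set, otherwise return "unknown". A minimal sketch with made-up logits and threshold:

import numpy as np

def predict_with_reject(logits, prob_threshold=0.6):
    """Return the argmax class index, or -1 ('unknown') if confidence is low."""
    logits = np.asarray(logits, dtype=float)
    exp = np.exp(logits - logits.max())      # numerically stable softmax
    probs = exp / exp.sum()
    best = int(probs.argmax())
    return best if probs[best] >= prob_threshold else -1

print(predict_with_reject([2.0, 0.1, -1.0]))   # confident -> class 0
print(predict_with_reject([0.3, 0.2, 0.1]))    # low confidence -> -1

Thresholding the maximum pre-softmax logit is a common alternative, since softmax probabilities depend on how all the other classes respond; either way the threshold is a hyperparameter to tune on a held-out set, ideally one that includes some atypical or out-of-distribution images.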
