Putting constraints on output of deep neural network

I am training a deep neural network with a constraint on one of its output values (e.g. the output has to be between 0 and 180). One possible solution is a sigmoid or tanh activation on the final layer, scaled to the target range. I wonder if there are better ways to put constraints on the output value of a neural network.
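For concreteness, here is the scaled-sigmoid option I have in mind, as a minimal Keras sketch (the layer sizes and input shape are made up):

    import numpy as np
    import tensorflow as tf

    lo, hi = 0.0, 180.0  # the required output range

    # toy model: only the final Lambda layer matters here
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(32, activation="relu", input_shape=(4,)),  # made-up sizes
        tf.keras.layers.Dense(1),
        tf.keras.layers.Lambda(lambda z: lo + (hi - lo) * tf.sigmoid(z)),
    ])

    print(model(np.random.rand(2, 4).astype("float32")))  # always inside (0, 180)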

Looking for benchmark regression datasets used to test neural network models

I’m looking for benchmark regression datasets to test a neural network model. The dataset should have one or more real-valued outputs. For example, we are working with the SynthesEyes dataset, in which the outputs are 3 real-valued angles of the subject’s eye movement. I’m looking for other common real-valued outputs on which to compare certain features of our model architecture.

The state of the art on the dataset should be achieved with CNNs or RNNs (a CNN problem would be slightly preferred). A typical regression dataset with uncorrelated input features would not be appropriate (for example AutoMPG). A fictitious example of what I’m looking for would be predicting the number of liters of water in an image of a glass of water: an image problem, solved with CNNs, whose output is a real-valued number (liters of water).

How to maximize the Sharpe ratio with an LSTM recurrent neural network?

I’ve read some articles about trading using recurrent reinforcement learning, such as this one. What I do not fully understand is how to construct the cost/loss function.

In the article, the Sharpe ratio is one of the objectives that we can let the RNN maximize. The definition of the Sharpe ratio is $\frac{\mathrm{Average}(R_t)}{\mathrm{StandardDeviation}(R_t)}$, where $R_t$ is the return on investment. So I assume this return $R_t$ is the reward of the reinforcement learning problem.

The goal of the algorithm is to maximize the Sharpe ratio, so my question is: how should I structure the neural network / reinforcement learning framework in order to implement gradient ascent on the Sharpe ratio?

In particular, if the input data is a price series, what should the output be? What should the cost/loss function be?
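To make this concrete, here is the kind of loss I imagine, as a minimal TensorFlow sketch. It assumes the network already turns the price series into a 1-D tensor of per-period returns $R_t$; minimizing the negative Sharpe ratio is then the same as maximizing the Sharpe ratio:

    import tensorflow as tf

    def negative_sharpe(returns, eps=1e-8):
        # returns: 1-D float tensor of per-period returns R_t
        mean, var = tf.nn.moments(returns, axes=[0])
        return -mean / (tf.sqrt(var) + eps)  # minimizing this maximizes the Sharpe ratio

    # usage with any optimizer, e.g.:
    #   loss = negative_sharpe(returns)
    #   train_op = tf.train.AdamOptimizer(1e-3).minimize(loss)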

MNIST – Vanilla Neural Network – Why Is the Cost Function Increasing?

I’ve been combing through this code for a week now, trying to figure out why my cost function is increasing, as in the image below. Reducing the learning rate does help, but only a little. Can anyone spot why the cost function isn’t behaving as expected?

I realise a CNN would be preferable, but I still want to understand why this simple network is failing.
Please help :)

[Image: “Runaway Cost Function” – plot of the training cost rising over epochs]

import numpy as np
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data

import matplotlib.pyplot as plt

mnist = input_data.read_data_sets("MNIST_DATA/",one_hot=True)

def createPlaceholders():
    # note the layout: columns are examples, so X is (features, batch) and Y is (classes, batch)
    xph = tf.placeholder(tf.float32, (784, None))
    yph = tf.placeholder(tf.float32, (10, None))
    return xph, yph

def init_param(layers_dim):
    weights = {}
    L = len(layers_dim)

    for l in range(1,L):
        weights['W' + str(l)] = tf.get_variable('W' + str(l), shape=(layers_dim[l],layers_dim[l-1]), initializer= tf.contrib.layers.xavier_initializer())
        weights['b' + str(l)] = tf.get_variable('b' + str(l), shape=(layers_dim[l],1), initializer= tf.zeros_initializer())

    return weights

def forward_prop(X,L,weights):
    parameters = {}
    parameters['A0'] = tf.cast(X,tf.float32)

    for l in range(1,L-1):
        parameters['Z' + str(l)] = tf.add(tf.matmul(weights['W' + str(l)], parameters['A' + str(l-1)]), weights['b' + str(l)])
        parameters['A' + str(l)] = tf.nn.relu(parameters['Z' + str(l)])

    parameters['Z' + str(L-1)] = tf.add(tf.matmul(weights['W' + str(L-1)], parameters['A' + str(L-2)]), weights['b' + str(L-1)])
    return parameters['Z' + str(L-1)]

def compute_cost(ZL,Y):
    # ZL and Y have shape (10, batch) under the layout above
    cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels = tf.cast(Y,tf.float32), logits = ZL))
    return cost

def randomMiniBatches(X,Y,minibatch_size):
    m = X.shape[1]
    shuffle = np.random.permutation(m)
    temp_X = X[:,shuffle]
    temp_Y = Y[:,shuffle]

    num_complete_minibatches = int(np.floor(m/minibatch_size))

    mini_batches = []

    for batch in range(num_complete_minibatches):
        mini_batches.append((temp_X[:,batch*minibatch_size: (batch+1)*minibatch_size], temp_Y[:,batch*minibatch_size: (batch+1)*minibatch_size]))

    if m % minibatch_size != 0:  # avoid appending an empty batch when m divides evenly
        mini_batches.append((temp_X[:,num_complete_minibatches*minibatch_size:], temp_Y[:,num_complete_minibatches*minibatch_size:]))

    return mini_batches

def model(X, Y, layers_dim, learning_rate = 0.001, num_epochs = 20, minibatch_size = 64):
    tf.reset_default_graph()
    costs = []

    xph, yph = createPlaceholders()
    weights = init_param(layers_dim)
    ZL = forward_prop(xph, len(layers_dim), weights)
    cost = compute_cost(ZL,yph)
    optimiser = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost)

    init = tf.global_variables_initializer()

    with tf.Session() as sess:
        sess.run(init)

        for epoch in range(num_epochs):
            minibatches = randomMiniBatches(X,Y,minibatch_size)
            epoch_cost = 0

            for b, mini in enumerate(minibatches,1):
                mini_x, mini_y = mini
                _,c = sess.run([optimiser,cost],feed_dict={xph:mini_x,yph:mini_y})
                epoch_cost += c
            print('epoch: ',epoch+1,'/ ',num_epochs)

            epoch_cost /= len(minibatches)
            costs.append(epoch_cost)

    plt.plot(costs)
    plt.show()
    print(costs)



X_train = mnist.train.images.T
n_x = X_train.shape[0]
Y_train = mnist.train.labels.T
n_y = Y_train.shape[0]
layers_dim = [n_x,10,n_y]

model(X_train, Y_train, layers_dim)

How do I calculate output of a Neural Network?

I just started learning about ANNs about a week ago, with no classical training, just by watching videos and reading blogs/white papers. I’ve gotten this far.

I have a question about the final output of the ANN.

Say, for instance, I was building an XOR network with two input nodes, 3 nodes in one hidden layer, and one node in the output layer: a 2 x 3 x 1 network.

First I would like to make sure I have the first part right.

So each input node has a weight associated with it for each node in the hidden layer; if you had 5 nodes in the hidden layer, each input node would take its input and multiply it by the weight associated with each of those hidden nodes.

So to calculate the sigmoid for the first hidden node, you would take each input, multiply it by its weight (with no + for a bias yet), and sum the products. Then we would squash that sum with a sigmoid and get 0.5866175789173301.

Essentially, the sum would be (1 x .25) + (1 x .10) = .35.

Now I just do this three times, once for each hidden node, and get 3 squashed numbers:

  // (input1 * HiddenNode(x)Weight) + (input2 * HiddenNode(x)Weight)
  activationFunction((1 * .25) + (1 * .10)) // 0.5866175789173301
  activationFunction((0 * .40) + (1 * .60)) // 0.6456563062257954
  activationFunction((1 * .20) + (0 * .80)) // 0.549833997312478

Now from what I understand, I again sum & squash those answers:

  activationFunction(hidden1 + hidden2 + hidden3) // 0.8559569515861635
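For reference, here is the whole computation above as a small NumPy sketch (same made-up weights; each hidden node uses the input pair from its line above, and the output node sums the hidden activations without weights, exactly as written):

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    # (input1, input2) and (weight1, weight2) per hidden node, from the lines above
    inputs  = [(1, 1), (0, 1), (1, 0)]
    weights = [(.25, .10), (.40, .60), (.20, .80)]

    hidden = [sigmoid(a * wa + b * wb) for (a, b), (wa, wb) in zip(inputs, weights)]
    print(hidden)                # [0.5866..., 0.6456..., 0.5498...]
    print(sigmoid(sum(hidden)))  # 0.8559...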

Do I have it correct so far?

My question is: if you’re feeding in two scaled numbers to predict grades, say 89 & 6.5 (grade / hours of sleep), how would you get from an output like .8559 to a number like 93, and how would you calculate the error on that value? Am I missing anything besides a bias?

If I entered the percent change of the last 3 stock prices, and I wanted the network to guess the fourth, how would I convert an answer like this:

 activationFunction(hidden1 + hidden2 + hidden3) // 0.8559569515861635

to an answer like .10 (percent change in stock price), or any other real-world value?
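Is the trick simply to leave the output node linear (no squashing) and scale the targets? Here is my guess at what that would look like in NumPy; the output weights and bias are made up purely for illustration:

    import numpy as np

    hidden = np.array([0.58662, 0.64566, 0.54983])  # the squashed hidden values
    w_out  = np.array([0.3, -0.2, 0.5])             # made-up output weights
    b_out  = 0.1                                    # made-up output bias

    y_hat  = hidden @ w_out + b_out   # linear (identity) output: any real value
    target = 93 / 100.0               # grade scaled into the output's range
    error  = (target - y_hat) ** 2    # squared error on the real value
    print(y_hat, error)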

Thanks in advance!

Genetic neural network to satisfy variable number of inputs and outputs

I have what I propose as a solution to my problem; however, I haven’t ever seen it mentioned in this way, so I worry that there is a valid reason not to do things this way.

I have a dataset of > 100,000 events, where each event has a winner.
I have plenty of data points: some on the event itself, and some on each entrant.

The number of entrants in each event is variable, and I want to build a neural network around picking a likely winner of the events.

As the number of entrants is variable, the common advice appears to be to have enough inputs for the maximum-case scenario and to zero them out for events with empty slots.

This feels somewhat inelegant, and I had a slightly different idea.

I was going to have a NN where the inputs are information about the event plus information about a single entrant, with one output (a float between 0 and 1). I would run each entrant in an event through the network, leaving me with one float per entrant. I would then select the highest value and take the entrant it refers to as the predicted winner.
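In rough Python, the scheme would look like this (a sketch only; `score_entrant` stands in for the trained network and is hypothetical):

    import numpy as np

    def pick_winner(event_features, entrants, score_entrant):
        # score_entrant: (event features, one entrant's features) -> float in [0, 1]
        scores = [score_entrant(event_features, e) for e in entrants]
        return int(np.argmax(scores))  # index of the predicted winner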

Is there a reason I shouldn’t be doing it this way? Is there a better solution I haven’t yet come across?

Neural Network on EV3 Mindstorm without 3rd Party Software

I am working on a prototype for an EV3 neural network. Because we are not allowed to use Bluetooth or WiFi connections in competitions, the neural network must be built with the EV3 block-based programming system (LabVIEW for LEGO Mindstorms). I am currently working on a feed-forward neural network that uses a genetic algorithm to learn. I will now explain the specifics.

The neural network has a simple job: learn the difference between blue and red.

  • The network has three layers (input, hidden, output).
  • There are two inputs. Both are reflected light intensity, measured one after another with the same sensor.
  • The hidden layer has three neurons, each of which sums its inputs and passes the value through a sigmoid function.
  • The output layer has one neuron. Values at or above 0.5 are output as blue; the rest are output as red.

Since there are no calculus blocks in LabVIEW for Mindstorms, the summation is performed as a series of multiplication and addition operations, and e is approximated as 2.71828182846 in the sigmoid function. Every neuron applies the sigmoid except the input neurons.
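In ordinary code, the per-neuron computation the blocks implement reduces to something like this (a Python sketch of the same arithmetic):

    E = 2.71828182846  # the approximation of e used on the EV3

    def neuron(inputs, weights, bias):
        z = sum(x * w for x, w in zip(inputs, weights)) + bias
        return 1.0 / (1.0 + E ** (-z))  # sigmoid built from *, + and power blocks

    print(neuron([0.4, 0.7], [1.2, -0.8], 0.1))  # ~0.505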

The reason I chose differentiating red and blue is that it is a good place to start, and LabVIEW for Mindstorms already has a block that knows the difference between blue and red (the Color Sensor block in Color mode). I can use this to tell the program whether its guesses were right or wrong.

Being a feed-forward neural network, it has both weights and biases. The weights and biases are randomly selected between -5 and 5 (these bounds were arbitrarily chosen; I am not sure what to choose).

Using this network, the program generates 10 (an arbitrarily chosen number) different “species” (I am not sure what to call these), each with 12 different weights and biases. I make each “species” take a test of 10 (also arbitrarily chosen) questions, on which it is given a grade (# right / total #) based on its guesses and the real answers.

The program then generates a list of all the grades. Using a bubble sort (which I have tried to create, but haven’t gotten working), 90 comparisons are made to sort the grades from greatest to least. The top two grades are chosen, and their associated “species” have 10 offspring generated by randomly selecting from the weights and biases of the two parents.
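For reference, the sorting step I’m trying to build amounts to this (a Python sketch of the block logic; a descending bubble sort):

    def bubble_sort_desc(grades):
        # repeatedly swap adjacent grades until sorted greatest-to-least
        n = len(grades)
        for i in range(n - 1):
            for j in range(n - 1 - i):
                if grades[j] < grades[j + 1]:
                    grades[j], grades[j + 1] = grades[j + 1], grades[j]
        return grades

    print(bubble_sort_desc([0.6, 0.9, 0.3, 0.8]))  # [0.9, 0.8, 0.6, 0.3]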

The whole process is then repeated, and theoretically the best set of weights and biases should emerge.

Not having any schooling in deep learning or programming, I am wondering if I am doing anything wrong. So far, I have completed the randomization of weights/biases, the structure of the neural network, and the bubble sort of the test scores (which still has not worked). I suspect that having both inputs be reflected light intensity, and constraining the weights/biases to -5 to 5, may prevent my network from performing optimally. Please tell me what I should fix, or whether more information is necessary. Thank you for your time.

Classification probability issue in a neural network – is it possible to get accurate probabilities?

I’m using a neural network, mainly for binary classification, with a mean squared error cost function (cross-entropy seems to give the same results).

The problem I’m having is that, once optimized, the network guesses either 1 or 0 (class or no class) instead of outputting the correct probability for the presence of the class.

Here’s the data I’m using:

input -> output

1 -> 0 (60% of the time)

1 -> 1 (40% of the time)

2 -> 0 (40% of the time)

2 -> 1 (60% of the time)

It makes sense that it just guesses 1 or 0 when I sketch it out. Here is an example for input 1: 0.40 would be the correct probability to output, while a hard guess of 0 is correct 60% of the time:

Score   Error when label = 0 (60%)   Error when label = 1 (40%)   Avg error
0.40    0.40                         0.60                         0.48
0.00    0.00                         1.00                         0.40

Notice that the average error is much lower for a hard guess of 1 or 0. Is there a standard tactic for avoiding this and getting the correct probability as an output?
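For reference, the table comes from this calculation (a quick Python check of the expected absolute error for each score, using the input-1 distribution above):

    for score in (0.40, 0.0):
        err0 = abs(0 - score)           # error when the label is 0 (60% of the time)
        err1 = abs(1 - score)           # error when the label is 1 (40% of the time)
        avg  = 0.6 * err0 + 0.4 * err1
        print(score, err0, err1, avg)   # reproduces the table rows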

The same thing seems to occur with (a) two output neurons and (b) a softmax output.

How do I get an actual, reliable percentage output once the network is fully trained on this data?

How much extra information can we conclude from a neural network's output values?

Say I have a 3-layer neural network:

  • Input Layer containing 784 neurons.
  • Hidden layer containing 100 neurons.
  • Output layer containing 10 neurons.

My objective is to build an OCR system, and I used the MNIST data to train my network.

Suppose I give the network an input taken from an image, and the values of the output neurons are the following:

  • $0: 0.0001$
  • $1: 0.0001$
  • $2: 0.0001$
  • $3: 0.1015$
  • $4: 0.0001$
  • $5: 0.0002$
  • $6: 0.0001$
  • $7: 0.0009$
  • $8: 0.001$
  • $9: 0.051$

When the network returns this output, my program will tell me that it identified the image as the number 3.

Now, looking at the values: even though the network recognized the image as a 3, the output value for number 3 was actually very low, $0.1015$. I say very low because usually the highest value at the classified index is close to 1.0, i.e. we get something like 0.99xxx.
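To put a number on that uncertainty, one rough (hypothetical) measure is the winning activation's share of the total activation; note this is not a true probability, since these outputs don't sum to 1:

    import numpy as np

    outputs = np.array([0.0001, 0.0001, 0.0001, 0.1015, 0.0001,
                        0.0002, 0.0001, 0.0009, 0.0010, 0.0510])

    pred = int(np.argmax(outputs))
    print(pred)                           # 3, the predicted digit
    print(outputs[pred] / outputs.sum())  # ~0.65, share of total activation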

May I assume that the network failed to classify the image, or should I say that the network classified the image as 3 but, due to the low value, is not certain?

Am I right to think like this, or did I misunderstand how the output actually works?