Durbin-Watson for multiple observations per day and missing values

I am trying to perform a multiple linear regression on a data set that has multiple measurements per day over a period of several months.
More specifically, I have ~3 to 10 observations per day on prices customers paid for a product, and I am looking at a time span of 3 months.

Now I want to check for autocorrelation and I face two problems:
– There are multiple observations per day, and the Durbin-Watson test only works if there is exactly one observation per day. Is it appropriate to average the prices for each day (see the sketch below)? The problem then is that the within-day variance may be large and the mean may not be sensible.
– There are missing values: on some days no one bought the product. What should I do with these missing values?
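Here is roughly what I have in mind for the daily averaging, just to illustrate the setup (a sketch with made-up data and a made-up predictor called promo; my real data have more rows and more predictors):

import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.stattools import durbin_watson

# Toy data: one row per purchase (in reality ~3 to 10 rows per day over 3 months)
raw = pd.DataFrame({
    "date":  ["2024-01-01", "2024-01-01", "2024-01-02", "2024-01-04", "2024-01-04"],
    "price": [10.5, 11.0, 9.8, 12.1, 11.7],
    "promo": [0, 0, 1, 0, 1],
})
raw["date"] = pd.to_datetime(raw["date"])

# Collapse to one averaged observation per day; days with no sales (e.g. Jan 3) become NaN
daily = raw.groupby("date").mean(numeric_only=True).asfreq("D")

# Regression on the daily means, then Durbin-Watson on the residuals
model = smf.ols("price ~ promo", data=daily.dropna()).fit()
print(durbin_watson(model.resid))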

Kind regards and thanks in advance,
Martin

Can autoregressive coefficient values be greater than 1?

I am using multivariate autoregressive (MAR) models to fit my long-term dataset of species abundances and environmental variables. When I use only the data from a specific period of the year (e.g., summer) instead of all of it, some coefficients in the resulting model are greater than 1. What does this mean? Should I exclude these values from the final model? Why does this happen? Could it be caused by the small sample size? When I use all the data, I don’t get coefficients >1 or <-1.

Is the assumption of independence only for the sampled values informing the regression, or should it also apply…

I have 200 discrete, well-spaced plots with reasonably independent sampled values, from which I’ve derived a regression equation. If I use it to predict values on a similarly sized fishnet grid, how much does it matter that neighboring grid cells are correlated in the predictor and response variables, and is there anything to be done about it?

MySQL temp table insert from JSON can’t handle null values

The MySQL 8 JSON_TABLE function allows you to handle JSON data as if it were a table.

I’m trying to use it to populate a temporary table from the generated table, but I believe I’m hitting a bug.

When a null value is inserted into an INT NULL column, the following error is thrown: Invalid JSON value for CAST to INTEGER from column quantity at row 1

Here is the offending code:

DROP TABLE IF EXISTS json_temp_table;
CREATE TEMPORARY TABLE json_temp_table (
  item_id int NOT NULL PRIMARY KEY,
  model_number varchar(100),
  quantity int NULL
)
ENGINE = INNODB
SELECT
  json_tb.item_id,
  json_tb.model_number,
  json_tb.quantity
FROM JSON_TABLE(
  '[{"item_id":1,"model_number":"MFJA53","quantity":4},{"item_id":2,"model_number":"HSRHJN5","quantity":null},{"item_id":3,"model_number":"FAFAF1","quantity":345}]',
  "$[*]"
  COLUMNS (
    item_id int PATH "$.item_id",
    model_number varchar(100) PATH "$.model_number",
    quantity int PATH "$.quantity"
  )
) json_tb;

SELECT
  *
FROM json_temp_table;

Is there a way to make JSON-generated tables insert nulls into temp tables?

The weird part is that the SELECT statement works fine without the insert. Also, temp tables by themselves can definitely handle null INT values. I don’t understand why the two don’t mix.

Principled SSS values for skin material

I had some difficulty finding information about how to use the Subsurface Scattering parameters of the Principled BSDF node for a skin material, so here are some questions that remain unanswered despite my efforts:

  • Is the Subsurface parameter value always at 1.000 for a skin material? If not, does it need to be mapped with a texture?
  • Would the Subsurface Radius parameter be constant across the face of a real person? If not, how can I modify this value for different parts of the skin, given it accepts a Vector input?
  • Is the Subsurface Color intended to have a subdermal map as its input?
  • Is there a document that lists parts of the skin and common SSS values for them?

I’m working with a cartoonish character, and I think introducing more realism to its skin could give me better results, but I still don’t need it to be very accurate.
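For context, this is roughly how I am setting the parameters from Python at the moment (a sketch; the values are placeholders I made up, and the socket names assume the Principled BSDF as it appears in Blender 2.8x–3.x):

import bpy

# Create a material and grab the default Principled BSDF node
mat = bpy.data.materials.new("SkinTest")
mat.use_nodes = True
bsdf = mat.node_tree.nodes["Principled BSDF"]

# Placeholder values -- exactly the numbers I am unsure about
bsdf.inputs["Subsurface"].default_value = 1.0
bsdf.inputs["Subsurface Radius"].default_value = (1.0, 0.2, 0.1)      # per-channel radii (Vector input)
bsdf.inputs["Subsurface Color"].default_value = (0.8, 0.3, 0.3, 1.0)  # candidate subdermal colour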

If digital values are mere estimates, why not return to analog for AI?

The twentieth-century transition from analog to digital circuitry was driven by the desire for greater accuracy and lower noise. Now we are developing software whose results are approximate and in which noise has positive value.

  • In artificial networks, we use gradients (Jacobian) or second-order models (Hessian) to estimate the next steps in a convergent algorithm and to define acceptable levels of inaccuracy and doubt.[1]
  • In convergence strategies, we deliberately add noise, injecting random or pseudo-random perturbations to improve reliability by essentially jumping out of local minima in the optimization surface during convergence.[2] (A sketch follows this list.)
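As a minimal illustration of the second point, here is a sketch of noise injection in plain gradient descent (NumPy, with a made-up bumpy one-dimensional objective; not taken from any of the cited work):

import numpy as np

rng = np.random.default_rng(0)

def f(x):
    return x**2 + 2 * np.sin(5 * x)        # bumpy objective with several local minima

def grad(x):
    return 2 * x + 10 * np.cos(5 * x)

x_plain = x_noisy = 2.0                     # both runs start in a shallow local basin
lr, steps = 0.01, 3000

for t in range(steps):
    sigma = 20.0 * (1 - t / steps)          # annealed noise: strong early, zero at the end
    x_plain -= lr * grad(x_plain)                            # noise-free: stays in its basin
    x_noisy -= lr * (grad(x_noisy) + sigma * rng.normal())   # perturbed: can hop between basins

print(f(x_plain), f(x_noisy))               # the perturbed run usually ends in a lower basin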

What we accept and deliberately introduce in current AI systems are the very things that drove electronics to digital circuitry.

Why not return to analog circuitry for neural nets and implement them with operational amplifier matrices instead of matrices of digital signal processing elements?

The values of artificial-network learning parameters can be maintained using integrated capacitors charged via D-to-A converters, so that the learned states benefit from digital accuracy and convenience while forward propagation benefits from analog advantages:

  • Greater speed[3]
  • Orders of magnitude fewer transistors to represent network cells
  • Natural thermal noise[4]

An academic article or patent search for analog artificial networks reveals much work over the last forty years, and the research trend has continued. Computational analog circuits are well developed and provide a basis for neural arrays.

Could the current obsession with digital computation be clouding the common view of AI architectural options?

Is hybrid analog the superior architecture for artificial networks?

 


Footnotes

[1] The PAC (probably approximately correct) Learning Framework relates the acceptable error $\epsilon$ and acceptable doubt $\delta$ to the sample size required for learning for specific model types. (Note that $1 - \epsilon$ represents accuracy and $1 - \delta$ represents confidence in this framework.)
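For a finite hypothesis class $\mathcal{H}$ and a consistent learner, the standard sample-complexity bound takes the form below (stated from memory as an illustration, not drawn from any source in the reference list):

$$ m \;\ge\; \frac{1}{\epsilon}\left(\ln\lvert\mathcal{H}\rvert + \ln\frac{1}{\delta}\right) $$

so that with probability at least $1 - \delta$ the learned hypothesis has error at most $\epsilon$.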

[2] Stochastic gradient descent has been shown, when appropriate strategies and hyper-parameters are used, to converge more quickly during learning, and it is becoming a best practice in typical real-world applications of artificial networks.

[3] The Intel Core i9-7960X processor runs at turbo speeds of 4.2 GHz, whereas standard fixed-satellite broadcasting operates at 41 GHz.

[4] Thermal noise can be obtained on silicon by amplifying and filtering electron leakage across a reverse-biased Zener diode at its avalanche point. The source of the quantum phenomenon is Johnson–Nyquist thermal noise. Sanguinetti et al. state in ‘Quantum Random Number Generation on a Mobile Phone’ (2014), “A detector can be modeled as a lossy channel with a transmission probability η followed by a photon-to-electron converter with unit efficiency … measured distribution will be the combination of quantum uncertainty and technical noise,” and there is also Caltech’s JTWPA work. Both of these may become standards for producing truly nondeterministic quantum noise in integrated circuits.

References

  • STDP Learning of Image Patches with Convolutional Spiking Neural Networks, Saunders et al., 2018, UMass and HAS
  • General-Purpose Code Acceleration with Limited-Precision Analog Computation, Amant et al., 2014
  • Analog computing and biological simulations get a boost from new MIT compiler, by Devin Coldewey, 2016
  • Analog computing returns, by Larry Hardesty, 2016
  • Why Analog Computation?, NSA Declassified Document
  • Back to analog computing: Columbia researchers merge analog and digital computing on a single chip, Columbia U, 2016
  • Field-Programmable Crossbar Array (FPCA) for Reconfigurable Computing, Zidan et al., IEEE, 2017
  • FPAA/Memristor Hybrid Computing Infrastructure, Laiho et al., IEEE, 2015
  • Foundations and Emerging Paradigms for Computing in Living Cells, Ma, Perli, Lu, Harvard U, 2016
  • A Flexible Model of a CMOS Field Programmable Transistor Array Targeted for Hardware Evolution (FPAA), by Zebulum, Stoica, Keymeulen, NASA/JPL, 2000
  • Custom Linear Array Incorporates Up To 48 Precision Op Amps Per Chip, Ashok Bindra, 2001, Electronics Design
  • Large-Scale Field-Programmable Analog Arrays for Analog Signal Processing, Hall et al., IEEE Transactions on Circuits and Systems, vol. 52, no. 11, 2005
  • A VLSI array of low-power spiking neurons and bistable synapses with spike-timing dependent plasticity, Indiveri G, Chicca E, Douglas RJ, 2006
  • https://www.amazon.com/Analog-Computing-Ulmann/dp/3486728970
  • https://www.amazon.com/Neural-Networks-Analog-Computation-Theoretical/dp/0817639497

Too many nominal values in my dataset

I intend to make an AI that can predict whether a patient is allergic to a medicine.
In this database I have a table medicine with the columns "id", "name", and "component_id"; component is another table with "id" and "name";
and allergy has "component_id" and "patient_id".
I want to make a query that returns
"patient_id", "patient_document", "patient_name", "medicine", and "component_id", i.e., which medicines a patient is allergic to.
The issue is the following:
how can I work with this database if medicine, component, and patient are not categorical data? There might be a thousand patients, a thousand medicines, and a thousand components.
Note that I’m working with the IDs as "categorical" (nominal?) data.

I’m rather new to machine learning, and I haven’t yet found a method for this issue. Most examples use fixed categorical data, like gender, marital status, etc., and I know that the usual practice in those scenarios is to create dummy columns, but I might end up with a thousand columns, which could escalate out of control and doesn’t seem like good practice here (see the sketch below).
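Just to show what I mean by the dummy columns exploding (a sketch with made-up column names; medicine_id would really have ~1000 distinct values):

import pandas as pd

# Toy version of the joined allergy table (in reality ~1000 distinct medicine IDs)
df = pd.DataFrame({
    "patient_id":  [1, 1, 2, 3],
    "medicine_id": [101, 205, 101, 330],
    "allergic":    [1, 0, 0, 1],
})

# One-hot / dummy encoding: one new column per distinct medicine_id
dummies = pd.get_dummies(df["medicine_id"], prefix="med")
print(dummies.shape)   # (4, 3) here, but (n_rows, ~1000) on the real data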
How should I approach this?

I’m using Python as my code tool.

Thanks!

How do I replace numeric values with a string in an R data frame?

I want to replace all numeric values in a column of my data frame with a string value. The following doesn’t seem to work:

df <- within(df, myCol[is.numeric(myCol)] <- 'NOTMISSING')

Even though the df has some values that are NA and others that are numbers, all values are being replaced with NOTMISSING.

I also tried:

df <- within(df, myCol[is_numeric(myCol)] <- 'NOTMISSING')

Any pointers are highly appreciated.

Treating a column containing null values for random forest when they should be null (not missing)

Suppose I have columns like this:

   date_ago  happened_or_not
0       3.0                1
1       1.0                1
2       NaN                0
3       NaN                0
4       3.0                1
5       5.0                1
6       NaN                0
7       NaN                0
8       2.0                1

Now, the first column contains null values where they genuinely should be null. How do I treat this column so I can eventually pass it into RandomForestClassifier? Any idea is appreciated!

I tried to search for this, but everything I found concerns data that is actually missing, whereas in this case it’s not missing but structurally null, and it can’t be passed into RandomForestClassifier as-is.
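One idea I’ve been considering (I’m not sure whether it is the right treatment): add an explicit indicator column and fill the NaNs with a constant, roughly like this (column names from the example above; the fill value is arbitrary):

import pandas as pd
from sklearn.ensemble import RandomForestClassifier

df = pd.DataFrame({
    "date_ago":        [3.0, 1.0, None, None, 3.0, 5.0, None, None, 2.0],
    "happened_or_not": [1,   1,   0,    0,    1,   1,   0,    0,    1],
})

# Encode "structurally null" as its own signal: a flag column plus an arbitrary fill value
X = pd.DataFrame({
    "date_ago_filled":  df["date_ago"].fillna(-1),           # -1 is just a sentinel
    "date_ago_is_null": df["date_ago"].isna().astype(int),   # 1 where the value should be null
})
y = df["happened_or_not"]

clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
print(clf.predict(X[:3]))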

Multiple values in list

I am using SharePoint in Office 365.
Two lists are given:

List 1:
1 | Person1
2 | Person2
3 | Person3
...



List 2:
1 | Activity1 | Person1;Person3
2 | Activity2 | Person2
3 | Activity3 | Person1;Person2;Person3
4 | Activity4 | Person2;Person3

I want to add another column to List 1 that lists all activities where the person from the person column is included in the third column of List 2. So:

1 | Person1 | Activity1;Activity3
2 | Person2 | Activity2;Activity3;Activity4
3 | Person3 | Activity1;Activity3;Activity4

Thanks for helping.

Lexinas