The R package ‘modelIntegration’ aggregates several probability distributions into a single integrated one. Suppose that several independent methods are used to observe a deterministic quantity, and each method represents it as a probability distribution. We thus deal with a family of probability distributions that provide alternative descriptions of the same object. The problem is how to combine the information contained in these prior estimates. This package implements the posterior integration method [Kryazhimskiy, 2013]. For comparison, simple averaging of the input distributions is also implemented.

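The posterior integration of the prior estimates is their normalized pointwise product

\[ q(z) = \frac{p_1(z)\, p_2(z) \cdots p_n(z)}{\sum_{z' \in Z} p_1(z')\, p_2(z') \cdots p_n(z')}, \quad z \in Z, \]

where \(p_1,p_2,\dots,p_n\) are prior distributions on \(Z\) associated with the methods \(1,\dots,n\), and \(Z\) is a finite set with more than one element.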
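Both aggregation rules are easy to state in base R once each prior is a probability vector over the same ordered outcomes. The sketch below is illustrative only: `posterior_integrate` and `average_priors` are hypothetical helpers, not functions exported by modelIntegration. Posterior integration multiplies the priors pointwise and renormalizes; simple averaging takes the arithmetic mean of the priors. The guard against an all-zero product is an added assumption, since the normalization is undefined in that case.

```r
# Hypothetical helpers (not part of modelIntegration).
# priors: list of numeric vectors, each a probability distribution over the
# same ordered outcomes z_1, ..., z_m.
posterior_integrate <- function(priors) {
  stopifnot(length(unique(lengths(priors))) == 1)  # same support for all priors
  prod_p <- Reduce(`*`, priors)   # pointwise product p_1(z) * ... * p_n(z)
  total <- sum(prod_p)
  # Assumed guard: the normalization is undefined if the product is all zeros
  if (total == 0) stop("the priors have no outcome in common")
  prod_p / total                  # renormalize so the result sums to one
}

average_priors <- function(priors) {
  stopifnot(length(unique(lengths(priors))) == 1)
  Reduce(`+`, priors) / length(priors)  # (p_1(z) + ... + p_n(z)) / n
}

p <- list(c(0.75, 0.25), c(0.75, 0.25))
posterior_integrate(p)  # 0.9 0.1
average_priors(p)       # 0.75 0.25
```

With two identical priors, averaging returns the prior unchanged, while the product concentrates mass on the outcome both methods favour.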

Alternatively, prior estimates can be combined using simple averaging:

\[ \bar p(z) = \frac{1}{n}\sum_{i=1}^{n} p_i(z), \quad z \in Z. \]

To explore the basic usage of modelIntegration, we’ll start with the built-in `forest_npp` and `forest_npp90` data frames. These datasets contain probability distribution tables for the net primary production (NPP) of forest ecosystems in seven bioclimatic zones of Russia, reported in [Kryazhimskiy et al., 2015]. Documentation for the datasets is available via `?forest_npp` and `?forest_npp90`.

```r
dim(forest_npp)
## [1] 1131   17
```

```r
colnames(forest_npp)
## [1] "npp" "LEA_Tundra"
## [3] "LEA_Tundra_Northern_Taiga" "LEA_Middle_Taiga"
## [5] "LEA_Southern_Taiga" "LEA_Temperate"
## [7] "LEA_Steppe" "LEA_Deserts"
## [9] "LEA_Total" "DGVM_Tundra"
## [11] "DGVM_Tundra_Northern_Taiga" "DGVM_Middle_Taiga"
## [13] "DGVM_Southern_Taiga" "DGVM_Temperate"
## [15] "DGVM_Steppe" "DGVM_Deserts"
## [17] "DGVM_Total"
```
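To make this table layout concrete without loading the package, here is a toy data frame shaped like `forest_npp`: the first column holds outcome values and each remaining column gives one prior's probability for each value. The column names `ModelA`/`ModelB`, the numbers, and the helper `is_pmf_table` are all hypothetical, and the check assumes each probability column is meant to sum to one.

```r
# Toy table mimicking the layout of `forest_npp` (made-up numbers).
toy <- data.frame(
  npp    = c(100, 200, 300),  # outcome values
  ModelA = c(0.2, 0.5, 0.3),  # probabilities from a first method
  ModelB = c(0.1, 0.6, 0.3)   # probabilities from a second method
)

# Hypothetical helper: every probability column must be non-negative
# and sum to one, up to a floating-point tolerance.
is_pmf_table <- function(tab, tol = 1e-8) {
  probs <- tab[-1]  # drop the outcome column
  all(vapply(probs,
             function(p) all(p >= 0) && abs(sum(p) - 1) < tol,
             logical(1)))
}

is_pmf_table(toy)  # TRUE

# A named list of probability columns, the same shape the examples
# below pass to `pdfs`
priors <- as.list(toy[c("ModelA", "ModelB")])
```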

The main function of the modelIntegration package is `integrate`. It accepts several representations of probability distributions. Discrete distributions are supplied through the `pdfs` argument in a ‘table-based’ format: a list of probability vectors, one per prior. Continuous distributions are supplied as cumulative distribution functions (cdfs) through the `cdfs` argument and are discretized: each outcome value is a bin center, and the bin width is determined from the successive outcome values in the range. The common range of outcome values, shared by all prior distributions, is set in the `vals` argument.
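The discretization of a continuous prior can be sketched as follows. This is one plausible reading of the rule, not modelIntegration's actual code: the hypothetical helper `discretise_cdf` places bin edges halfway between neighbouring values in `vals`, extends the outer bins by half of the adjacent gap, takes the probability of each bin as a difference of cdf values, and renormalizes to absorb the mass that falls outside the range. The normal cdf mirrors the one used in `example2`; the grid `seq(0, 500, by = 10)` is an arbitrary choice.

```r
# Hypothetical helper: discretize a continuous cdf onto the outcome grid `vals`.
# Assumption: each value is a bin center; bin edges lie halfway between
# neighbouring values, and the outer bins extend by half the adjacent gap.
discretise_cdf <- function(cdf, vals) {
  stopifnot(length(vals) >= 2, !is.unsorted(vals, strictly = TRUE))
  n     <- length(vals)
  mids  <- (vals[-1] + vals[-n]) / 2  # midpoints between neighbours
  lower <- c(vals[1] - (vals[2] - vals[1]) / 2, mids)
  upper <- c(mids, vals[n] + (vals[n] - vals[n - 1]) / 2)
  probs <- cdf(upper) - cdf(lower)    # probability mass in each bin
  probs / sum(probs)                  # renormalize mass lost outside the range
}

vals <- seq(0, 500, by = 10)
p_dgvm <- discretise_cdf(function(x) pnorm(x, mean = 202, sd = 52), vals)
sum(p_dgvm)              # 1
vals[which.max(p_dgvm)]  # 200, the grid point closest to the mean
```

Renormalizing is a design choice: it keeps the discretized prior a proper distribution on the finite set of outcomes, at the cost of ignoring tail mass beyond the grid.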

```r
example1 <- integrate(
  vals = forest_npp[, 1],
  pdfs = as.list(forest_npp[c("LEA_Tundra", "DGVM_Tundra")]))
summary(example1)
##        Product  Average
## mean 189.29034 213.6184
## std   42.78502  74.0616
```

```r
example2 <- integrate(
  vals = forest_npp90[, 1],
  pdfs = as.list(forest_npp90["LEA_Tundra"]),
  cdfs = list("DGVM_Tundra" = function(x) pnorm(x, mean = 202, sd = 52)))
summary(example2)
##        Product   Average
## mean 183.73562 212.92005
## std   43.87124  79.16872
```

The two integrated estimates can be accessed with the `product` and `average` functions, respectively. The `statistics` function reports descriptive statistics (mean and standard deviation) for the priors and both integrated distributions.

```r
example <- integrate(c(1, 2), list(c(0.75, 0.25), c(0.75, 0.25)))
product(example)
##   x prob
## 1 1  0.9
## 2 2  0.1

average(example)
##   x prob
## 1 1 0.75
## 2 2 0.25

statistics(example)
##             P1        P2 Product   Average
## mean 1.2500000 1.2500000     1.1 1.2500000
## std  0.4330127 0.4330127     0.3 0.4330127
```
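The reported statistics are consistent with the mean and the population standard deviation of each discrete distribution. The base-R sketch below assumes exactly that and reproduces the numbers for the product and the priors in the toy example; `describe_pmf` is a hypothetical helper, not a package function.

```r
# Hypothetical helper: mean and population standard deviation of a discrete
# distribution with outcomes `x` and probabilities `prob`.
describe_pmf <- function(x, prob) {
  m <- sum(x * prob)                # expected value
  s <- sqrt(sum((x - m)^2 * prob))  # sqrt of the probability-weighted variance
  c(mean = m, std = s)
}

describe_pmf(c(1, 2), c(0.9, 0.1))    # mean 1.1, std 0.3 (the product)
describe_pmf(c(1, 2), c(0.75, 0.25))  # mean 1.25, std 0.4330127 (each prior)
```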

[1] Kryazhimskiy, A.V. (2013). Posterior integration of independent stochastic estimates. IIASA Interim Report. IR-13-006.

[2] Kryazhimskiy, A.V. (2016). Posteriori integration of probabilities. Elementary theory. Theory of Probability and its Applications, 60(1): 62-87.

[3] Kryazhimskiy, A., Rovenskaya, E., Shvidenko, A., Gusti, M., Shchepashchenko, D., & Veshchinskaya, V. (2015). Towards harmonizing competing models: Russian forests’ net primary production case study. Technological Forecasting & Social Change, 98: 245-254.