
Updated Verhulst model

Discuss research and forecasts regarding hydrocarbon depletion.

Post by pup55 » Tue 21 Jun 2005, 21:57:30

constrained the fit in some other way


In my spreadsheet I have an "error term": I take the difference between the model and the actual production for each year from 1900-2008, take the absolute value, and then sum over all years. So instead of an RMS, this is a "gross error".

Using MS-Excel "Solver", I set the constraints: actual 2008 equal to predicted 2008, and total cumulative production 1900-2008 equal to predicted cumulative production, just as you said.

Using the above constraints, I minimized the "sum of absolute value of annual errors" to arrive at the above values for n, k, and t-50. I allowed the Q-inf to "float" but it did not float.
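For anyone who wants to replicate the idea outside Excel, here is a minimal sketch of the same constrained fit in Python/SciPy. The rate expression is my reading of Roper's generalized Verhulst curve, and the data file, year span and starting values are placeholders, not the actual spreadsheet:

Code:
import numpy as np
from scipy.optimize import minimize

def verhulst_rate(t, Qinf, n, k, t50):
    # generalized Verhulst production rate; t50 is the half-depletion year
    u = (2.0**n - 1.0) * np.exp(-k * (t - t50))
    return Qinf * k * u / (n * (1.0 + u)**((n + 1.0) / n))

years = np.arange(1900, 2005)                  # placeholder span
actual = np.loadtxt("production.txt")          # hypothetical Gb/year series

def gross_error(p):                            # sum of absolute annual errors
    return np.abs(verhulst_rate(years, *p) - actual).sum()

constraints = [
    # final-year production and cumulative production must match the data
    {"type": "eq", "fun": lambda p: verhulst_rate(years[-1], *p) - actual[-1]},
    {"type": "eq", "fun": lambda p: verhulst_rate(years, *p).sum() - actual.sum()},
]
p0 = np.array([2462.0, 2.0, 0.06, 2007.0])     # rough starting guesses
res = minimize(gross_error, p0, constraints=constraints, method="SLSQP")
print(res.x)                                   # fitted Qinf, n, k, t50
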

If you PM me with your "real" email address, I will send you my spreadsheet in the morning and you can see what I did.


I think there is a mistake


I was never really able to get equation (6) to work, so this does not surprise me. We should probably email Roper and let him know.

Post by Shiraz » Tue 21 Jun 2005, 22:31:57

Using the above constraints, I minimized the "sum of absolute value of annual errors" to arrive at the above values for n, k, and t-50. I allowed the Q-inf to "float" but it did not float.


Ok, this explains it. It is traditional to use the square of the difference rather than the absolute value of the difference. This ensures that the resulting fit pays more attention to the hardest-to-explain data, as is appropriate.

If you plot the models generated by each of our parameter sets, you will note that your model exhibits an *extremely* accurate fit between 1930 and 1965, and this allows the curve to be a worse fit elsewhere and still be the "best" fit. With least-squares regression, a perfect fit at any single data point is worth less, so you will note that the curve generated by my parameters is a slightly worse fit over the 1930-1965 period, but a better fit in most other periods. Also, I think you'll find Roper's methodology explicitly requires least-squares regression.
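A toy numeric illustration of the difference (my example, not Shiraz's): with squared errors, a single large residual dominates the total far more than it does with absolute errors.

Code:
import numpy as np
r = np.array([0.1, 0.1, 0.1, 3.0])   # three small residuals and one large one
print(np.abs(r).sum())               # L1 total 3.3: the large residual is ~91% of it
print((r**2).sum())                  # L2 total 9.03: the large residual is ~99.7% of it
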

Post by khebab » Tue 21 Jun 2005, 23:18:47

Here are a few properties of the generalized Verhulst curve:

[image of the curve properties not reproduced]

I tried to compute the inflexion point positions using Maple, but the resulting formulas are quite heavy!

Post by khebab » Wed 22 Jun 2005, 01:00:05

I used the nonlinear regression functions of the Matlab Statistics Toolbox to perform the fit (function nlinfit). The 95% confidence intervals on the parameters are also estimated (function nlparci), as well as the confidence interval on the predicted curve (function nlpredci). I considered two cases:
1- a fit using all the data from 1900 to 2004
2- a fit using the data from 1900 to 1966 and from 1982 to 2004 (I excluded the data from 1967 to 1981, which I treated as outliers).
For both cases, I fixed Qinf at 2461.82 Gb (BP) and at 4173 Gb (USGS).
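For anyone without the Statistics Toolbox, roughly the same workflow can be sketched in Python with SciPy. This is my approximation of the procedure, not khebab's code; the rate parametrization, file name and starting values are assumptions:

Code:
import numpy as np
from scipy.optimize import curve_fit

def rate(t, k, n, t_half, Qinf):
    # assumed generalized Verhulst production rate
    u = (2.0**n - 1.0) * np.exp(-k * (t - t_half))
    return Qinf * k * u / (n * (1.0 + u)**((n + 1.0) / n))

years = np.arange(1900, 2005)
prod = np.loadtxt("world_production.txt")     # hypothetical Gb/year series
# For "case 2", drop 1967-1981 first, e.g. mask = (years < 1967) | (years > 1981).

for Qinf in (2461.82, 4173.0):                # BP and USGS URR values from the post
    f = lambda t, k, n, t_half: rate(t, k, n, t_half, Qinf)
    popt, pcov = curve_fit(f, years, prod, p0=[0.06, 2.0, 2005.0], maxfev=20000)
    ci95 = 1.96 * np.sqrt(np.diag(pcov))      # crude large-sample 95% half-widths
    print(Qinf, popt, ci95)
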

Case 1:

Code:
Sum of Square Error=  151
qinf= 2461.82 Gb: PO date= 1998.58+/-  1.41 PO value=  25.96+/-   0.38 Gb/year
k=   0.06+/-  0.00 n=   2.25+/-  0.42 t_half= 107.43+/-  1.07

Sum of Square Error=  141.932
qinf= 4173 Gb: PO date= 2003.89+/-  2.13 PO value=  26.35+/-   0.57 Gb/year
k=   0.06+/-  0.00 n=   6.22+/-  0.89 t_half= 143.30+/-  2.56

Figure 8

Case 2:

Code:
Sum of Square Error=  22.5
qinf= 2461.82 Gb: PO date= 2006.65+/-  1.05 PO value=  27.37+/-   0.45 Gb/year
k=   0.05+/-  0.00 n=   1.26+/-  0.19 t_half= 108.76+/-  0.48

Sum of Square Error= 19.35
qinf= 4173 Gb: PO date= 2014.15+/-  1.62 PO value=  28.47+/-   0.61 Gb/year
k=   0.05+/-  0.00 n=   3.98+/-  0.40 t_half= 139.92+/-  1.22

Figure 9
Case 2 produces a better estimation (smaller SSE), with a peak in 2006 +/- 1 year, and once again the increase in Qinf (from BP to USGS) does not produce a big difference in the maximum production rate (from 27.37 to 28.47 Gb).

Post by Shiraz » Wed 22 Jun 2005, 02:00:06

Thank you, Khebab.

If I may offer a number of criticisms...

Case 2 produces a better estimation (SSE smaller)


!!! Yes, you excluded outliers. This does not constitute a reason to prefer the model, because you have excluded the very data causing the high error components. Further, as you do not appear to have taken the mean of the errors, in case 1 you are actually summing more error terms. I don't know if you meant to imply that because this curve had a lower SSE it had better predictive validity, but if so, I suggest you remove all but two data points and voila, zero error... if you get my drift.

Secondly, thank you for these graphs, because they demonstrate to some extent the point I was trying to make earlier - that you can't make anything of the shape of the right side of the curve. If you want to see what I mean by this, run a model 3 with a Qinf of something ridiculous, like 10 gig (or even 100 gig) or so. You should find that the peak date will still be quite soon, but the predicted future curve is extremely flat, with production almost constant for the next 50 years. Of course, if Qinf were 10 gig, we would *not* expect that. Rather, because of the statistical implications of summing semi-independent regions, we'd expect a more symmetrical curve than the best-fit Verhulst will produce.

Better yet: take the data from 1900 to 1965 only. Fix Qinf at 2461, and look at the shape of the best-fit Verhulst. See how it flattens? This is what I mean about the meaninglessness of the right side of the curve.

Sorry if I seem like a bit of a pitbull on this. I really just want to help.

As I said before, I'm not averse to Verhulst modelling of oil production. But I do think it needs to be theoretically driven. What I mean is that we need to say, before we start, that we think the curve is, at best expectation, approximately symmetrical (as Hubbert did), and use that theoretical inference to guide the modelling (hold n constant). I do think that, if you don't take symmetry as a guide, Verhulst modelling becomes next to useless (even though it is obvious for some regions, e.g. Ukraine, that you can get *much* better fits to the data if you let n vary).

Post by khebab » Wed 22 Jun 2005, 12:13:01

Shiraz wrote: !!! Yes, you excluded outliers. This does not constitute a reason to prefer the model, because you have excluded the very data causing the high error components. Further, as you do not appear to have taken the mean of the errors, in case 1 you are actually summing more error terms. I don't know if you meant to imply that because this curve had a lower SSE it had better predictive validity, but if so, I suggest you remove all but two data points and voila, zero error... if you get my drift.

I think we have to put the least-squares method (LSM) in a larger context. A least-squares estimate is equivalent to a maximum-likelihood estimate under the assumption of a Gaussian distribution for the noise (i.e. for the regression residuals).

Question: are the residuals distributed according to a gaussian pdf?

Figure 10
The residual distribution is clearly non-Gaussian, so LSM is clearly not the best estimation technique. Robust LS algorithms exist which basically weight the data according to the current residual dispersion; reweighted least squares, for instance, is an iterative method available in Matlab (function robustfit). In case 2 above, I essentially imposed a weight of 0 for the period 1967-1981 and 1 elsewhere.
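As a hedged illustration of the same idea in Python (not the Matlab robustfit call): SciPy's least_squares accepts robust loss functions that automatically down-weight large residuals instead of zeroing out 1967-1981 by hand. The rate form, file name and starting values are assumptions carried over from the earlier sketches.

Code:
import numpy as np
from scipy.optimize import least_squares

def rate(t, k, n, t_half, Qinf):
    u = (2.0**n - 1.0) * np.exp(-k * (t - t_half))
    return Qinf * k * u / (n * (1.0 + u)**((n + 1.0) / n))

years = np.arange(1900, 2005)
prod = np.loadtxt("world_production.txt")      # hypothetical Gb/year series
Qinf = 2461.82                                 # fixed URR, as in the BP case above

def residuals(p):
    k, n, t_half = p
    return rate(years, k, n, t_half, Qinf) - prod

# Huber-type loss down-weights large residuals (e.g. 1967-1981) automatically
fit = least_squares(residuals, x0=[0.06, 2.0, 2005.0], loss="huber", f_scale=1.0)
print(fit.x)

The loss and f_scale choices here are arbitrary; the point is only that robust losses formalize the down-weighting described above.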
Shiraz wrote: Secondly, thank you for these graphs, because they demonstrate to some extent the point I was trying to make earlier - that you can't make anything of the shape of the right side of the curve. If you want to see what I mean by this, run a model 3 with a Qinf of something ridiculous, like 10 gig (or even 100 gig) or so. You should find that the peak date will still be quite soon, but the predicted future curve is extremely flat, with production almost constant for the next 50 years. Of course, if Qinf were 10 gig, we would *not* expect that. Rather, because of the statistical implications of summing semi-independent regions, we'd expect a more symmetrical curve than the best-fit Verhulst will produce.

The lowest value of Qinf for which nlinfit gives a valid curve was Qinf = 1475 Gb. Below that value it becomes very unstable (because the slope of the right side approaches -infinity).

Figure 11
Shiraz wrote: Better yet: take the data from 1900 to 1965 only. Fix Qinf at 2461, and look at the shape of the best-fit Verhulst. See how it flattens? This is what I mean about the meaninglessness of the right side of the curve.

Here we go! :)
Figure 12
Good point: clearly this model is biased toward the least-depleted curve. If only a small portion of Qinf has been extracted, a slow depletion comes out of the fit.
Shiraz wrote: As I said before, I'm not averse to Verhulst modelling of oil production. But I do think it needs to be theoretically driven. What I mean is that we need to say, before we start, that we think the curve is, at best expectation, approximately symmetrical (as Hubbert did), and use that theoretical inference to guide the modelling (hold n constant). I do think that, if you don't take symmetry as a guide, Verhulst modelling becomes next to useless (even though it is obvious for some regions, e.g. Ukraine, that you can get *much* better fits to the data if you let n vary).

I agree, but I'm just not sure that symmetry is a realistic constraint if we look at depletion for individual countries ( http://www.dieoff.org/42Countries/42Countries.htm).

Post by Shiraz » Thu 23 Jun 2005, 01:44:31

I wrote:If you want to see what I mean by this, run a model 3 with a Qinf of something ridiculous, like 10 gig (or even 100 gig) or so.


Blush... Ooops

Actually meant 10 thousand gig or 100 thousand gig.

Never mind - luckily the next bit (the 1900-1965 data) makes the same point...

Also...

Khebab wrote:I agree, but I'm just not sure that symmetry is a realistic constraint if we look at depletion for individual countries ( http://www.dieoff.org/42Countries/42Countries.htm).


Well yes, this is exactly what I'm getting at. Which will be closer to a symmetrical curve - the curve for any one of those 42 countries, or the curve for any single producing field or region within one of those countries? Of course we cannot say for sure, but it is more likely that summing a sample of producing fields will give a more symmetrical curve than any single field.

Now of course, looking at those 42 country curves it's easy to say to yourself: well, however strong the effect is, it's clearly not strong enough. Well, that's where we're lucky. We want to look at global production. As before, I suggest that the sum of production from those 42 countries is likely to be more symmetrical than the individual country curves typically are.

The biggest thing complicating this is global political disturbances, of course, and I guess this opens a whole lot more discussion. Can political disturbances be viewed merely as the proximate effect of a global production which has deviated too far from the symmetrical mean, for example? Or are they truly exogenous events, meaning that our expectation should be for a symmetrical curve only from now into the future, and that any further exogenous political disturbance will 'shock' the future curve away from symmetry? And what about time-dependent variables that affect all regions, like technology? Do they reasonably justify a shift in expectation away from symmetry?

I don't have definitive answers to these questions, but the point I'm mainly trying to make is that an assumption of symmetry is the most likely to produce the best predictions. In the case of the US lower 48, the assumption of symmetry worked very well for Hubbert, and this is often attributed to the large number of small producers working in a free market. No other region fills these criteria like the lower 48. But what about the world as a whole? There's a great variety of local conditions in which oil is extracted, and the notion that some 90% of global oil reserves are controlled by governments doesn't bode well for the argument from analogy. But if the forces coming from governments were partially independent (deriving from local political necessities, for example), would this perhaps be enough to justify the expectation that summing the individual production curves should yield a marginally more symmetrical curve? In any case, I think that where a Verhulst fit suggests a deviation from symmetry in future production, this is quite arbitrary and not to be taken at face value...

Post by EnergySpin » Mon 04 Jul 2005, 02:19:29

Hello everyone,
I just spent the whole day playing with the Verhulst model (as detailed by Roper, but after correcting the typographical error in his Equation 6). I got the data from a previous post in the thread but censored everything after 2005. I posted in a previous thread today what I think is the problem, not only with the data BUT with the models used to make predictions based on the data. Let me add my 2c ...
Actual Measurements
It was pointed out by others that the actual measurements have a couple of "humps" (i.e. in the 60s). It is due to these "outliers" that alternative models to the Verhulst were proposed (i.e. the Riccati stochastic differential equation). The Verhulst model does NOT provide for such data (i.e. it assumes that the whole depletion process is rather homogeneous in time). Hence, if we want to use it, the only sensible way is to "smooth" these humps out. We will be using an approximate model, but it is a start. So I decided to run a comparison of how the Verhulst model fares when the "original" and the "smoothed" data are used.
Data Smoothing
I used a model-free smoother, specifically the LOWESS (LOcally WEighted Scatterplot Smoothing) of William Cleveland (Journal of the American Statistical Association 74(368): 829-836, 1979), which performs a local linear regression at every point based on a neighbourhood of size f. The result of this exercise is a smoothed version of the original dataset.
For completeness, I list the two datasets (original and smoothed) so that people can replicate the results (more about that later). I used the implementation of the smoother found in the freely available R language (http://www.r-project.org), with f=0.22 (a typical value of the neighbourhood parameter suggested by Cleveland).
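For those who prefer Python to R, an equivalent smoothing step can be sketched with statsmodels' lowess, where frac plays the role of Cleveland's f (the input file name and column layout are assumptions):

Code:
import numpy as np
from statsmodels.nonparametric.smoothers_lowess import lowess

data = np.loadtxt("production.txt")                    # assumed columns: year, Gb/year
smoothed = lowess(data[:, 1], data[:, 0], frac=0.22)   # returns sorted (year, value) pairs
for year, value in smoothed:
    print(f"{year:.0f} {value:.1f}")
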

Original
1900 0.20
1901 0.24
1902 0.27
1903 0.31
1904 0.35
1905 0.38
1906 0.42
1907 0.45
1908 0.49
1909 0.53
1910 0.56
1911 0.60
1912 0.64
1913 0.67
1914 0.71
1915 0.75
1916 0.78
1917 0.82
1918 0.85
1919 0.89
1920 0.93
1921 0.96
1922 1.00
1923 1.00
1924 1.29
1925 1.57
1926 1.86
1927 2.14
1928 2.43
1929 2.71
1930 3.00
1931 3.20
1932 3.40
1933 3.60
1934 3.80
1935 4.00
1936 4.20
1937 4.40
1938 4.60
1939 4.80
1940 5.00
1941 5.20
1942 5.40
1943 5.60
1944 5.80
1945 6.00
1946 6.20
1947 6.40
1948 6.60
1949 6.80
1950 7.00
1951 7.30
1952 7.60
1953 7.90
1954 8.20
1955 8.50
1956 8.80
1957 9.10
1958 9.40
1959 9.70
1960 10.00
1961 10.32
1962 10.64
1963 10.96
1964 11.29
1965 11.61
1966 12.62
1967 13.55
1968 14.76
1969 15.93
1970 17.54
1971 18.56
1972 19.59
1973 21.34
1974 21.40
1975 20.38
1976 22.05
1977 22.89
1978 23.12
1979 24.11
1980 22.98
1981 21.73
1982 20.91
1983 20.66
1984 21.05
1985 20.98
1986 22.07
1987 22.19
1988 23.05
1989 23.38
1990 23.90
1991 23.83
1992 24.01
1993 24.11
1994 24.50
1995 24.86
1996 25.51
1997 26.34
1998 26.86
1999 26.40
2000 27.36
2001 27.31
2002 27.17
2003 28.12
2004 29.29
2005 29.83
Smoothed
1900. 0.2
1901. 0.2
1902. 0.3
1903. 0.3
1904. 0.3
1905. 0.4
1906. 0.4
1907. 0.5
1908. 0.5
1909. 0.5
1910. 0.6
1911. 0.6
1912. 0.6
1913. 0.7
1914. 0.7
1915. 0.7
1916. 0.8
1917. 0.8
1918. 0.9
1919. 1.
1920. 1.1
1921. 1.3
1922. 1.4
1923. 1.6
1924. 1.7
1925. 1.9
1926. 2.1
1927. 2.2
1928. 2.4
1929. 2.7
1930. 2.9
1931. 3.1
1932. 3.3
1933. 3.6
1934. 3.8
1935. 4.
1936. 4.2
1937. 4.4
1938. 4.6
1939. 4.8
1940. 5.
1941. 5.2
1942. 5.4
1943. 5.6
1944. 5.8
1945. 6.
1946. 6.2
1947. 6.5
1948. 6.7
1949. 6.9
1950. 7.2
1951. 7.4
1952. 7.7
1953. 8.
1954. 8.2
1955. 8.5
1956. 8.8
1957. 9.1
1958. 9.4
1959. 9.8
1960. 10.3
1961. 11.
1962. 11.7
1963. 12.3
1964. 12.9
1965. 13.4
1966. 14.
1967. 14.6
1968. 15.2
1969. 15.9
1970. 16.6
1971. 16.9
1972. 17.3
1973. 17.8
1974. 18.3
1975. 18.8
1976. 19.3
1977. 19.8
1978. 20.3
1979. 21.1
1980. 21.5
1981. 21.7
1982. 21.9
1983. 22.1
1984. 22.3
1985. 22.6
1986. 22.8
1987. 23.
1988. 23.2
1989. 23.4
1990. 23.7
1991. 23.9
1992. 24.3
1993. 24.6
1994. 24.9
1995. 25.3
1996. 25.6
1997. 26.
1998. 26.3
1999. 26.6
2000. 27.
2001. 27.3
2002. 27.7
2003. 28.
2004. 28.4
2005. 28.8

Software and Fitting Method
As I said, I used the Verhulst equation, which is given by:
P = \frac{(2^n - 1)\, Q_{\mathrm{inf}}\, e^{-thl/\tau}}{n\,\tau\,\bigl[1 + (2^n - 1)\, e^{-thl/\tau}\bigr]^{(1+n)/n}}
(Sorry guys, I wish I could upload the image from Mathematica)
Fittings were done using the NonlinearRegress algorithm of Mathematica 4.2. I used Mathematica for the following reasons:
- the ability to do fixed-precision arithmetic (the precision of the measurements is finite, btw; 16 bits would be unrealistic)
- the ability to code the derivatives of the Verhulst in a symbolic manner (this had a BIG impact)
- the ability to control the numerical tolerance of the matrix computations AND the number of iterations/convergence criteria.
I do not trust Excel for stuff like that, and Wolfram Research has extensively tested the package for years. In addition, I have published at least two papers using this implementation of Levenberg-Marquardt and know how it works.
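To give a flavour of the "symbolic derivatives" point outside Mathematica, here is a small SymPy sketch (my own, not EnergySpin's notebook) that derives the exact parameter Jacobian of the rate written above once and turns it into a numerical function a least-squares solver can use; thl is treated here as the shifted time variable of the posted formula:

Code:
import sympy as sp

thl, Qinf, n, tau = sp.symbols("thl Qinf n tau", positive=True)
u = (2**n - 1) * sp.exp(-thl / tau)
P = Qinf * u / (n * tau * (1 + u)**((n + 1) / n))        # the rate posted above

params = (Qinf, n, tau)
jac = [sp.diff(P, p) for p in params]                    # exact partial derivatives
P_num = sp.lambdify((thl,) + params, P, "numpy")         # numeric rate function
jac_num = sp.lambdify((thl,) + params, jac, "numpy")     # analytic Jacobian for a solver
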
Results
Well, here comes the surprise ... the results of the fittings are ALL OVER the place and, to make things worse, there are many plausible models that give a wide range of predictions, both for the timing of the peak and for the URR and the rate of depletion. It makes a HUGE difference how long you let the algorithm run (which was somewhat surprising; the models I have used in the past did not have that sensitivity), and the CI of the resulting curves is also wide. In addition, smoothing the data results in predictions with better behaviour across the whole range of data, not just the last few data points. In fact, when one overlays the curve estimated on the smoothed data with the original data, there is pretty good agreement from 1900 till 2004. The years 60-65 have not invalidated the fit. In addition, convergence of Levenberg-Marquardt applied to the smoothed data takes half the iterations compared to the original data.
I am listing the results, including not only the R-squares (which do not make much sense in nonlinear regression) but also the three curvature measures of nonlinearity of Bates and Watts. Just eyeballing the data ... it seems that the peak will likely be around 2008-2009. However, due to the sensitivity of the results to the intrinsic control parameters of the algorithms, one should take this (as well as the other predictions) with (many!) grains of salt. What is also fascinating is that the estimate of the URR seems to be sensitive to the control values of the algorithm, even though the corresponding 99.99% CIs are very narrow (not shown). I wonder if we will ever find out the URR ... maybe when we have drawn the last drop of the stupid fluid from the Earth. If anyone wants it, PM me and I will send the Mathematica notebook (which also graphs the 99.99% CIs for the predictions).
The results-Fit using smoothed data
Iter Rsq Qinf Peak MaxI ParamsEf 99.99%CI
18 0.993014 1736.22 1999.39 0.15 3.01 0.45
19 0.994286 2019.7 2000.8 0.13 3.88 0.45
20 0.995333 2303.44 2001.86 0.12 4.46 0.45
21 0.995775 2870.82 2004.24 0.11 5.96 0.45
22 0.996581 3435.62 2005.93 0.09 6.86 0.45
23 0.996675 4562.66 2009.33 0.09 10.42 0.45
24 0.997133 5119.14 2010.57 0.08 11.03 0.45
The results - Fit using the Original Data
Iter Rsq Qinf Peak MaxI ParamsEf 99.99%CI
45 0.95501 1679.74 1993.21 0.3 6.92 0.45
46 0.976933 2630.73 2000.24 0.22 12.13 0.45
47 0.982572 3728.76 2002.94 0.19 15.54 0.45
48 0.983016 5636.9 2009.11 0.19 26.96 0.45


Interpretation
Comparing the two datasets, we see that using the original (i.e. non-smoothed) data the estimates of the peak and the URR are rather inaccurate (notice that the URR is anywhere from 1679 to 5636; setting the # of iterations to <45 or >48 gives even wilder estimates). The smoothed dataset seems to give narrower estimates for both the peak and the URR (which is what was expected). The two models seem to agree that around 2009 is when TSWHTF. However, the Qinf (or URR) is what will determine the rate of descent ... and which of the various scenarios will materialize.
Now ... if I were to do a formal comparison ... I would use the Bayesian HPD intervals for the parameters ... but it is getting kind of late and I have to go to work tomorrow.

Post by EnergySpin » Mon 04 Jul 2005, 02:22:01

Sorry for the syntax ... I was drinking Kahlua with Bailey's and a little bit of vodka and it took a toll on my brain ... I hope to kill my liver before the Peak :razz:

Post by pup55 » Mon 04 Jul 2005, 09:18:47

the results of the fittings are ALL OVER the place and, to make things worse, there are many plausible models that give a wide range of predictions, both for the timing of the peak and for the URR and the rate of depletion


Welcome, EnergySpin

I think we found this out when we were doing the original modelling and manually putting in the variables: a given set of historical data would give many different plausible "right hand side" predictions.

At the time, we just attributed it to user bias, etc., but your work above confirms it may be due to the modelling method, e.g. the Verhulst function. However, I figure any of these modelling functions may do the same thing if left to their own devices.

Interestingly, if you just plot the historical curve and then let MS-Excel Solver do the calculations without preassigning variables, the model sometimes blows up. You have to put in assumptions for k and n that are "close to" the original function, as defined by you, which, again, adds some user bias. I think Excel just automates the trial-and-error process you go through when you put in the variables manually.

I think Roper's choice of the Verhulst function was based on his review of the depletion curves of various industrial minerals, as found in his paper "Where Have All the Metals Gone", in which he examined the curves of various industrial minerals that had already passed peak using various functions, and settled on the Verhulst curve because it fit the reality better than anything else he had tried. So there was a practical reason for doing this.

I would ask one question, namely: what would happen to your calculations if you did not let Q-inf float with the rest of the variables? This would correspond to a situation in real life in which reserves were sufficiently transparent, widely agreed upon, etc. Maybe it would be possible in your software to put a confidence band on this value. I would assume this would give a more limited range of possible peak outcomes.
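One cheap way to probe this question in the same spirit (a sketch under the same assumed rate form and placeholder data as the earlier snippets, not anyone's actual code): pin Qinf at a few candidate URRs, refit the remaining parameters, and watch how the implied peak year moves.

Code:
import numpy as np
from scipy.optimize import curve_fit

def rate(t, k, n, t_half, Qinf):
    u = (2.0**n - 1.0) * np.exp(-k * (t - t_half))
    return Qinf * k * u / (n * (1.0 + u)**((n + 1.0) / n))

years = np.arange(1900, 2005)
prod = np.loadtxt("world_production.txt")        # hypothetical Gb/year series

grid = np.arange(1900, 2101)                     # future years to locate the peak
for Qinf in (2000.0, 2461.82, 3000.0, 4173.0):   # candidate fixed URRs in Gb
    fit = lambda t, k, n, t_half: rate(t, k, n, t_half, Qinf)
    popt, _ = curve_fit(fit, years, prod, p0=[0.06, 2.0, 2005.0], maxfev=20000)
    peak_year = grid[np.argmax(rate(grid, *popt, Qinf))]
    print(f"Qinf={Qinf:7.1f} Gb  ->  peak around {peak_year}")
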

Post by khebab » Mon 04 Jul 2005, 10:49:55

Welcome, EnergySpin! Nice job!

I didn't know that Mathematica had fitting capabilities. I think the idea of smoothing the data is a good one. The 70s bump does not influence the general trend, and the data within that bump can be considered outliers.

I will probably PM you to get your Mathematica file. The possibility of mixing a nonlinear fitting technique with symbolic computation of the derivatives to get the Jacobian is very interesting.

Well, here comes the surprise ... the results of the fittings are ALL OVER the place and, to make things worse, there are many plausible models that give a wide range of predictions, both for the timing of the peak and for the URR and the rate of depletion. It makes a HUGE difference how long you let the algorithm run (which was somewhat surprising; the models I have used in the past did not have that sensitivity), and the CI of the resulting curves is also wide

It's not a surprising result, as explained by pup55 and Shiraz. We need to inject more prior information into the model. For instance, assumptions are usually made about the value of the URR, which constrains the solution curves to have the same area. It would be interesting to perform a new fit using a fixed URR (USGS, BP or ASPO).
What is also fascinating is that the estimate of the URR seems to be sensitive to the control values of the algorithm, even though the corresponding 99.99% CIs are very narrow (not shown). I wonder if we will ever find out the URR

The URR has a huge impact on the shape of the right side. You can probably find a bunch of solutions that fit the left side perfectly and have very different behaviour on the right side depending on the value of the URR. A good solution could be to perform a Bayesian fit, as you suggested, using a prior probability distribution that constrains the URR to remain within a valid range. Another solution could be to make assumptions about the expected depletion rate.

I also suggest you use the data from 1900 to 1965 posted by Shiraz in the Riccati thread.

Very nice post, I'm looking forward to your next posts!

Post by EnergySpin » Mon 04 Jul 2005, 14:45:38

Hello guys,
Comments on a couple of points:
At the time, we just attributed it to user bias, etc., but your work above confirms it may be due to the modelling method, e.g. the Verhulst function. However, I figure any of these modelling functions may do the same thing if left to their own devices.

Actually, it is even more complicated than that. The two eminent theoreticians of nonlinear regression were Bates and Watts, who published extensively in the late 70s-80s, before the era of Monte Carlo based techniques. B&W derived elegant formulas by approximating the Hessian of the chi-square function (I do not want to sound pedantic, but the underlying theory is not well known) and gave measures to assist the practitioner in understanding the results (i.e. the curvature measures of nonlinearity). There are two such measures: the parameter-effects curvature and the intrinsic curvature. The first measures (roughly) how much of the nonlinear behaviour is attributable to the values of the parameters, and the latter the intrinsic nonlinearity of the model (i.e. how well the Hessian is approximated by the linear/quadratic approximation of the algorithm). With different data (i.e. try to rerun your simulations after eliminating one or two observations) the predictions will be drastically different. Even letting the algorithm work an extra iteration or two throws us off the mark. Unfortunately, this is a feature of the time series we are working with and much less a feature of the Verhulst model per se. Without smoothing ... the results in terms of reproducibility etc. are even worse.
To complicate matters, estimation using the Verhulst (at least in its current parametric form) will probably never fly (i.e. the estimated parameters are multicollinear). To illustrate, generate an artificial dataset based on the Verhulst, add random noise, and pass it through your solver. You will find that even in the case of known parameters ... you end up with estimates of the various parameters that are off the mark (usually lower for the URR, lower half-lives, etc.). In the artificial-dataset case, the curvature measures of nonlinearity are better behaved than for the real-world dataset, but there is room for improvement.
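A quick way to reproduce this artificial-data experiment (my sketch, not EnergySpin's notebook; the "true" parameter values, noise level and rate parametrization are arbitrary assumptions):

Code:
import numpy as np
from scipy.optimize import curve_fit

def rate(t, Qinf, n, k, t_half):
    u = (2.0**n - 1.0) * np.exp(-k * (t - t_half))
    return Qinf * k * u / (n * (1.0 + u)**((n + 1.0) / n))

rng = np.random.default_rng(0)
years = np.arange(1900, 2005)
true = (2500.0, 2.0, 0.06, 2007.0)            # arbitrary "known" Qinf, n, k, t_half
noisy = rate(years, *true) + rng.normal(0.0, 0.3, years.size)

popt, _ = curve_fit(rate, years, noisy, p0=[2000.0, 1.5, 0.05, 2010.0], maxfev=20000)
print(true)
print(popt)   # per EnergySpin's observation, Qinf in particular can land off the mark
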
BTW, DO NOT USE EXCEL - it has been shown again and again that its statistical routines are unreliable (there was a big fuss a few years ago in the statistical literature about the pitfalls of the various statistical packages). Mathematica is immune, but the inferences are more unstable if one:
- uses machine-precision arithmetic
- uses high precision and accuracy in the data (i.e. in the artificial-dataset case, using higher accuracy in the artificial dataset resulted in catastrophic performance of the estimation procedure). What I do is fix the accuracy of the input data at 2 or 3 digits and then use a much higher tolerance in the numerical calculations AND arbitrary-precision arithmetic AND symbolic derivatives
- lets the system run forever ... one needs to restrain the fitting process to run for only a few iterations ... otherwise it tends to locate curves with unrealistic parameters (i.e. the best curve in terms of least squares gives a URR close to fifty thousand Gb !!!)
So to summarize:
1) The dataset we are dealing with is crappy. It is not just the quality of the data, but also the fact that the raw data is not of the form Signal + N(0,s). The error term each year is the sum of the errors of the previous years, hence one of the important assumptions of NLS (uncorrelated errors) is violated.
2) The Verhulst does not lend itself to a straightforward NLS exercise. Even when you generate data from known versions of the process, the estimation procedure is characterised by extremely high CMNL (curvature measures of nonlinearity) values, and one should try to rectify that.
3) Possible rectifications:
a) reparametrization of the Verhulst (I think this is where the money is). This is addressed later.
b) variance-stabilizing transformations of the data (albeit my smoothing did stabilize the variance, and in the artificial datasets I used the variance was pretty much stable)
c) doing away with the Verhulst altogether, even though:
- it has a sound theoretical basis in ecosystem biology dealing with the utilization of finite resources
- the URR is a parameter of the equation (in other modelling approaches, one would probably have to integrate over realizations of the process to get an idea of the URR)
d) Kalman filtering approaches or Markov chain approaches using the current form of the Verhulst equation
e) working with the stochastic differential version of the Verhulst and using Brownian motion tools to estimate the parameters. A related thread would be to use the equations in Roper's paper that deal with variable extraction rates and then apply the Expectation-Maximization algorithm to calculate the integrals of the depletion rate at each point.
If you think these points are worth pursuing, PM me to see if we can work something out (for the next couple of months I am horribly busy with my certification exams in Internal Medicine). It is a very interesting applied-statistics problem, on top of its obvious importance for all of us :roll:

A good solution could be to perform a Bayesian fit, as you suggested, using a prior probability distribution that constrains the URR to remain within a valid range. Another solution could be to make assumptions about the expected depletion rate.

Valid points, both of them ... The theory of nonlinear regression and Bayesian assessment of the results was worked out by Bates and Watts. Using locally non-informative priors, they derived formulas for the posterior distributions of the various parameters. These formulas are listed in the publications I reference at the end of this post. When I tried to visualize them mentally (4D space!), the results were extremely complicated. Given the intrinsic multicollinearity of Verhulst-based parameter estimation, I do not think further progress can be made with the current parametrization. One way out would be the following exercises:
1) Expand the VF in a finite power series in the vicinity of n = 1, 2, 3 ... j (using Maple or Mathematica) to get n out of the exponents. A low-order series approximation can give results that are within 10% of the theoretical values for n 3-4 times the size of the point of expansion. Playing with artificial datasets ... the inference on the parameters seems to be improved, but the garden-variety solver would be inadequate. I suspect that a two-step procedure combining gradient descent and symbolic expansion at the estimated value of n would be necessary.
2) Reparametrize the Verhulst so that instead of Qinf, n, τ, t1/2 we estimate the terms Qinf/(n*τ), Exp[τ/t1/2], 1/τ, n. Then, using the asymptotic formulas for the HPDs, Bayesian integration, and the pdf transformation theorem, extract the probability density values of the parameters we have to use in our response to PO (i.e. Qinf, n, τ, t1/2).
3) Do a full Bayesian analysis with proper (not necessarily conjugate) priors inside BUGS or your favourite Monte Carlo software.
I do not know which of (1)-(3) is best ... but even with an approach as simplistic as the Verhulst equation it is not going to be easy. Not to mention the next step, which would be a Lotka-Volterra simulation of the effect of PO on global population and infrastructure (to see how many people have to be executed in the next ASPO newsletter :wink: )

My 2 c on the whole business of depletion modeling:
1) We as a human species control the dynamics of depletion from now on. What is not mentioned (or not understood) in any of the sites/papers/posts I have read is that all the parameters are conditional on the concept of business as usual (i.e. continuation of extraction etc.). If we stop pumping oil tomorrow ... then irrespective of geology, depletion will not continue and the peak (as well as oil) becomes irrelevant. Even the 1-2% depletion rate is not particularly relevant, because that figure is calculated on the basis of a "business as usual" modus operandi; the more relevant figure is the input to the transformation of the energy infrastructure ACROSS the globe (and that figure can be increasing every year in the face of decreasing global consumption). Leaving the oil industry to market forces would be folly at this point. I'm particularly worried that the big players will try to do something foolish, i.e. embark on a resource war, and then we are hosed. This is why the collapse of the Vikings in Greenland is a relevant historical precedent (i.e. failure to change one's ways in the face of changing external constraints).
2) The whole Olduvai business. I like Duncan's work ... simple but not simplistic as a guiding principle. There is an underlying assumption, though, that it is business as usual and that no one can affect things. In mathematical terms, Duncan's energy-per-capita ratio is basically a composite measure calculated in terms of a Lotka-Volterra evolution law (predator = human, prey = energy). The Lotka-Volterra approximation is a highly nonlinear system, with many interesting regimes of stability between predator and prey, and as with the PO/Verhulst approach it assumes that predation continues even in the face of imminent disaster (i.e. a peak in the energy extraction rate). However, if the predation ratio is decreased and alternative energy forms are found, then a die-off is not unavoidable. Even if one has to use the "prime food" to "create a secondary food source" (i.e. oil to fuel the transition to renewables), disaster is not guaranteed (in fact I don't know whether this has been studied; it would be a highly nonlinear coupled system of differential equations). There are outcomes that are not dismal, but hard political and personal choices have to be made (i.e. move the parameters of the equations into favourable subspaces). However, this will never occur if fatalism prevails; when you are depressed, there can be no way out. And the bad thing is that people who are aware tend to see whatever they want to see and disregard the theory if it does not agree with their gloomy view. I believe that this is the worst hypocrisy (or maybe a human response?): decide that nothing can be done NOW and create a fantasy world with concentration camps, semi-automatic rifles and nuclear depopulation. These might be plausible scenarios if nothing is done ... but basic infrastructure guaranteeing the survival of 6 billion people while descending is not energetically very expensive. A limit on energetically expensive personal choices needs to be put in place (i.e. carbon quotas etc.), but outside the market system. This is a war-based mentality, but it is war and extinction we are facing. It is the fat that has to go, particularly here in NA and then in the EU.
3) The other danger ... is the use of mathematics to refute PO (i.e. homing in on the difficult mathematics to deny the existence of the phenomenon, or push it into the future). The burden falls on the technically/mathematically literate to defuse this bomb. It will not be long before an economist uses NLS to create an unrealistic (but still mathematically plausible) fantasy world of 50,000 Gb to avoid the unavoidable deconstruction of the market system. It is either the market system or us, unfortunately (and I was a big believer until I delved into the physically implausible nature of Cobb-Douglas production functions).
Anyway, the technical references (anyone can PM me if they want them in digital format, i.e. PDF/DjVu):
- R. Dennis Cook, Miriam L. Goldberg, "Curvature for Parameter Subsets in Nonlinear Regression", Annals of Statistics 14(4): 1399-1418 (1986)
- Douglas M. Bates, Donald G. Watts, "Relative Curvature Measures of Nonlinearity", Journal of the Royal Statistical Society B 42(1): 1-25 (1980)
- Douglas M. Bates, Donald G. Watts, Nonlinear Regression Analysis and Its Applications, John Wiley and Sons, ISBN 0-471-81643-4 (1988)

Post by khebab » Mon 04 Jul 2005, 15:09:30

Whoa! there are a lot of good ideas in your post! 8O

It is clear that the PO estimation problem is ill-posed and could benefit from the application of a more powerful Stochastic/Markovian/Bayesian framework (Monte-Carlo, Particle Filtering, etc.).

Post by WebHubbleTelescope » Mon 04 Jul 2005, 17:46:49

EnergySpin wrote: Results
Well, here comes the surprise ... the results of the fittings are ALL OVER the place and, to make things worse, there are many plausible models that give a wide range of predictions, both for the timing of the peak and for the URR and the rate of depletion. It makes a HUGE difference how long you let the algorithm run (which was somewhat surprising; the models I have used in the past did not have that sensitivity), and the CI of the resulting curves is also wide.


When you say "let the algorithm run", I assume you have some sort of numerical integration going on. Non-linear differential equations are renowned for instability on the size of integration step.

On the other hand, I am seeing lots of references to closed-form equations in this discussion. These should not have that problem.

Post by pup55 » Mon 04 Jul 2005, 17:51:28

I guess I would ask one more question:

Every article on PO contains the sentence "M. K. Hubbert, petroleum geologist, somehow used mathematical modelling/the central limit theorem to predict the peak of US oil production in 1970 and suggested that the world would peak at about Y2K".

So we have spent some time and effort figuring out what this line of thinking was about, and have come to the general conclusions that the use of the logistic curve is not an acceptable modelling technique, that the data set is too rough to approximate the right-hand side of the curve based on numerical analysis of the left-hand side, and that actual energy consumption going forward can be influenced by human-driven policy decisions and therefore need not take the shape of any function.

So, what should we make of Hubbert? Did he just get lucky, or were the tools at his disposal sufficiently capable of predicting the future?

A second, related question: What to make of PO theory in general?

Post by WebHubbleTelescope » Mon 04 Jul 2005, 18:56:01

pup55 wrote:I guess I would ask one more question:


Simple, really. He used intuition: he looked at oil discoveries, noticed a peak in the lower 48 in the early 30's, and extrapolated outward by calibrating against the finds that had already started depleting. Unless he came upon some bottomless pits, it would come out about right.

One of the interesting properties of the simplest rate laws, i.e. when the rate of extraction is proportional to the amount remaining, is that they give you the same half-life independent of the amount originally in the reservoir. That is why I think he could calibrate so well against the early data.
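In symbols (a standard first-order decay result, added here for completeness):

\frac{dQ}{dt} = -kQ \;\Rightarrow\; Q(t) = Q_0\,e^{-kt} \;\Rightarrow\; t_{1/2} = \frac{\ln 2}{k}

so the half-life depends only on the rate constant k and not on the initial amount Q_0.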

I think we have all seen graphs of the discovery curves and notice how they slide over top of the production curves.

Post by EnergySpin » Mon 04 Jul 2005, 19:17:54

When you say "let the algorithm run", I assume you have some sort of numerical integration going on. Non-linear differential equations are renowned for instability on the size of integration step.

Not really ... we all use formulas in closed form. However, there is no closed-form procedure for estimating the parameters that appear in these formulas. It is the estimation process that is nonlinear; closed-form (and exact) solutions exist only for:
- linear regression
- polynomial regression (utilizing a basis of orthogonal polynomials)
- classes of log-linear models
- certain nonlinear problems that can be reparameterized in a linear form
For the problem we are considering, we are applying various closed-form formulas; however, the estimation problem is very difficult, and at least some of the parameters of interest (i.e. the URR) are difficult to predict. The estimation process uses tools from nonlinear optimization theory (things like Newton, gradient descent, and Levenberg-Marquardt, by far the best), which are approximate, iterative algorithms and which, at least in the Mathematica environment, use a variable step-selection procedure to approximate the solutions, i.e. the values of the parameters to be estimated. I have to say that the Verhulst is a tough cookie and, in line with the NLS estimation field ... one has to spend a lot of time working with it. BTW, the estimation problem for the Michaelis-Menten equation (another, much simpler, nonlinear equation describing how enzymes work) generated close to a hundred papers in the last century.
Back to us: I think the peak will be sometime between 2007-2013, but the number we need for policymaking, i.e. the URR, will not be known until the peak is fairly obvious (if you have post-peak data ... then estimation becomes much easier). However, by refining the approach to depletion modelling we have the option to generate and plan post-peak scenarios now, since human consumption (Qinf, τ, n, t1/2) as well as geology (the URR) control the shape of the curve.

Post by EnergySpin » Mon 04 Jul 2005, 21:18:16

Excellent points; I will try to address them one by one in the setting of PO and the Olduvai collapse.
So we have spent some time and effort figuring out what this line of thinking was about, and have come to the general conclusions that the use of the logistic curve is not an acceptable modelling technique, that the data set is too rough to approximate the right-hand side of the curve based on numerical analysis of the left-hand side, and that actual energy consumption going forward can be influenced by human-driven policy decisions and therefore need not take the shape of any function....
A second, related question: What to make of PO theory in general?

There is a theory of depletion modelling, and Dr Roper has a web site explaining some of its mathematical aspects (it can be found at http://arts.bev.net/roperldavid/minerals/DepletTh.htm ). The general theory was also worked out by Lotka in the 20s, who was doing some of the early modelling of ecosystems, and by Volterra, who was working on modelling chemical reaction channels. The idea is that you have a finite reservoir with a fixed carrying capacity (which measures the size of the reservoir) and a predatory species (chemical, biological or whatever) which uses the reservoir to grow. An integro-differential conservation law describes the transformation of one species into the other and can take different functional forms (this point is often overlooked).

The simplest mathematical system that can be built along these lines to describe the combined dynamics of the predator and prey (reactant-product, oil-humanity!) is the original Lotka-Volterra system, which you can read about at: http://mathworld.wolfram.com/Lotka-Volt ... tions.html
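For reference, the standard textbook form of the Lotka-Volterra system mentioned above, with x the prey (resource) and y the predator and the Greek letters the usual positive rate constants:

\frac{dx}{dt} = \alpha x - \beta x y, \qquad \frac{dy}{dt} = \delta x y - \gamma y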

Other possible mathematical formalisms that focus on either the prey or the predator include the Verhulst, the logistic map, and Gompertzian laws. These can be derived for different integro-differential conservation dynamics of the Lotka-Volterra system. As you can see from the Wolfram web site, changes in the dynamics of the prey lead (with a latency) to corresponding changes in the dynamics of the predator. This is where the Olduvai collapse derives its legitimacy from, and this is why people make such bold statements as: 5 years after the peak we will have the same energy we had 5 years before the peak, etc. There are certain caveats in making those predictions, and some of them are as non-transparent as the actual reserve data (hehe):
1) one needs to know the functional or parametric form of the predator - prey dynamics to make these predictions (it is possible 5 years post peak to have even less energy available than 5 years before the peak and vice versa)
2) the predator prey dynamics stay the same during build-up and collapse
3) the mathematical formalism is a faithful representation of the actual system during build-up and collapse.
Let me start with (3), because it is fascinating and also occurs in my research field (systems biology/mathematical modelling of biomedical systems). These growth-depletion equations are macroscopic laws that collectively describe the average behaviour of the system. In particular:
A) the continuum assumption clearly does not hold for small human populations (i.e. try to take the limit to zero!), and at small numbers of predator and prey the system should be modelled using the stochastic differential equation version, which can display far more complicated dynamics
B) the dynamics are assumed to stay the same throughout the cycle of oscillations of predator and prey. Changes in the parameters at any part of the cycle can influence the frequency of the oscillations, the time to peak, the depletion rate, etc. They will not affect the carrying capacity (or the URR) - that was fixed by geology. If there is any change in policy (or a new source of food), then the dynamics become far more complicated over the whole history of resource discovery and exploitation.
I would like to digress and relate points (A) and (B) to civilization collapse (also explored by the "Limits to Growth"/"Club of Rome" World3 models)
Duncan and the Olduvai proponents implicitly assume that long before the dynamics switch to the stochastic regime, depopulation and the reversal of civilization will lead to an inability to use the remaining finite resources, and to us switching to a different lifestyle (predating on the scraps afforded by agriculture in a devastated world). They go on to predict that repeated attempts at civilization, as population and expertise recover, will fail (akin to damped oscillations in the LV system). All these statements are predicated on a "business as usual" model, i.e. a world where the depletion rate constants stay the same (we will consume as much as we technically can!) and we do not switch to other energy forms. This is the most powerful political statement we can derive from Duncan's theory: business as usual leads to collapse, but this can be tackled given the necessary political and personal will. The flipside is the media/economists' position of "Oh, let's explore more". This microscopic behaviour (i.e. individual company behaviour) is faithfully reproduced by all physically plausible Lotka-Volterra systems; these models represent the collective behaviour of many individual actors in search of resources/food in a finite world. More exploration will only work if we are LONG BEFORE the peak. In the ascending part of the curve there is plenty more to be discovered. However, notice that exploration is limited by the URR ... one cannot discover what does not exist, and an exploration frenzy will lead to an earlier peak and then decline. The peak is an inevitable consequence of the finiteness of the resource AND the initial delay in utilizing it at the maximum rate. If a fairy refilled all our oil reservoirs overnight, the new dynamics would look like an exponential decay curve, akin to the discharge of a capacitor. The fairy would have saved us the lag of exploration, but alas, no such fairy exists.
However, the rate of decline has to do with consumption ... and we can affect that. We have a vested interest in slowing it as much as possible so we can switch over to renewables (actually, any form of energy derived from the sun). Oil should be used as a feedstock for renewables while we reduce our consumption (a product of population x individual energy use). Once the transition has been made ... then oil becomes irrelevant. Since renewables are by definition renewable ... no oscillatory behaviour in the predator-prey system is possible and no collapse (unless we become prey to another predator, i.e. a random epidemic). However, growth is also not possible unless our energy prey becomes more abundant (this is the theory behind the energy-based currencies that steady-state and environmental economics and the Technocrats of the 30s were exploring). A fixed carrying capacity of renewables leads to a fixed relation between material standard of living and population. An infinite energy source would only make this relation a function of whatever is the next limiting resource. But no one really knows the capacity of the renewables - the theoretical potential is equal to the total input of solar radiation to this planet, but solar energy does other work as well, and we should not destroy the biosphere in our search for energy (our energy-gathering devices, from farms to solar panel arrays, will have to compete with other living beings for space, for example).

Point number 2: even when the differential equations are faithful representations of the actual dynamics (i.e. we do not have to switch to the stochastic regime), a change in the depletion rate constants can affect the overall shape of the curve. If changes in the parameters do happen during the history of the depletion, then many of our closed-form formulas are gross approximations and a hybrid approach is necessary (Roper uses the term variable-rate Verhulst). Note that this applies to both the future and the past, and from the viewpoint of the VR formulation there exists a world of opportunities for modelling. But from a political standpoint, these tools can only help us guide policy and research towards renewables and AWAY from consumption that does not advance these goals. Obviously we cannot put ALL the resources toward this goal, because there would be no energy left to support human beings. But this is the role of politics and the citizen: to prioritize. Mathematically speaking, research, politics and the citizen should strive to put a feedback term in the LV dynamics, limiting consumption and diverting resources to renewables (to build up our next prey). The free market is an ineffectual way of doing this, because it is slow and by definition does not use the global knowledge which is needed to control the dynamics.

Point 1 is relevant to
So, what should we make of Hubbert? Did he just get lucky, or were the tools at his disposal sufficiently capable of predicting the future?
I wrote yesterday and today about the estimation problem in the Verhulst formulation (and NLS in general). Hubbert did his research in the mid-50s and was affiliated with the Technocracy movement, who seemed to comprehend the dynamics of the Lotka-Volterra system. He was also an astute scientist with theoretical and practical knowledge (no ivory-tower philosopher) who looked at the data and made an educated guess. The main statistical tool for nonlinear estimation was first published by Levenberg in 1944 and rediscovered by Marquardt in the 60s. Hubbert quite likely did not know about this research; he probably used a combination of manual curve fitting and maybe even a Newton algorithm, if he had access to a computer (in the mid-50s that was a scarce resource). The lower-48 US data were also transparent (i.e. low noise) and had no "humps", which made for a relatively easy peak prediction (an element of luck was required, but my impression is that in his first prediction he gave a date range for the peak). Hubbert also understood the nature of the oil business and the nature of the American/post-WWII economic mentality (exploit till you drop dead). Hence his conservation law was described by the simple curve he used ... No miracles there, just the expressive capacity of differential-equation conservation laws and human greed. His prediction for the whole world used the parameters estimated from the US - which might have been a stretch, but the US is a pretty big continent with a rich geologic history. The reason we have not yet seen the world peak he predicted (there are others who disagree) is that the dynamics did change due to the 80s recession.
If people had listened to Hubbert and the environmental movement in the 70s, things would have been different, but it is up to us now to try to rectify the situation. It is going to be difficult ... and it must not be left to the free market. Mathematics can help focus our efforts, but it will not do the work for us. It will, however, predict our demise if we do not act NOW.

Post by khebab » Mon 04 Jul 2005, 22:00:33

pup55 wrote:So, what should we make of Hubbert? Did he just get lucky, or were the tools at his disposal sufficiently capable of predicting the future?

Enhanced recovery techniques were not used in the US at that time, so the production curve had a good chance of being symmetric. Now most fields use water or gas injection, so the production curves are steeper (high kurtosis) and asymmetrical (skewness different from zero).

Post by pup55 » Tue 05 Jul 2005, 06:47:10

Thanks, Khebab and EnergySpin, for improving the level of dialogue. I am just being Aristotelian.

http://peakoil.com/fortopic3218.html

You will enjoy the conversation about our Lotka-Volterra model as it applies to oil depletion.
