The methodology will take a bit to explain, so bear with me (or just scroll down to the bottom for the 'meat'). Also, I'm just going to post data, so be sure you have your spreadsheet program open if you want little pictures to go with it.
Have you ever plotted the log of production history? I think you should. What does it look like? Interestingly, a rather simple yet striking behaviour is immediately apparent. This forecast proceeds by interpreting the Ln(Production) chart as a series of straight lines separated by short periods of discontinuity. I fitted a straight line to each subset of the data that appeared to be approximately straight. The data, and the straight-line fits, follow.
- Code: Select all
1901 -1.79 -1.76
1902 -1.70 -1.70
1903 -1.64 -1.63
1904 -1.52 -1.57
1905 -1.54 -1.50
1906 -1.55 -1.43
1907 -1.33 -1.37
1908 -1.25 -1.30
1909 -1.21 -1.23
1910 -1.12 -1.17
1911 -1.07 -1.10
1912 -1.04 -1.03
1913 -0.95 -0.97
1914 -0.90 -0.90
1915 -0.84 -0.84
1916 -0.78 -0.77
1917 -0.69 -0.70
1918 -0.69 -0.64
1919 -0.59 -0.57
1920 -0.37 ---- -0.32
1921 -0.27 ---- -0.24
1922 -0.15 ---- -0.17
1923 0.02 ---- -0.09
1924 0.01 ---- -0.01
1925 0.07 ---- 0.07
1926 0.09 ---- 0.15
1927 0.23 ---- 0.23
1928 0.28 ---- 0.31
1929 0.40 ---- 0.38
1930 0.34 ---- ----
1931 0.32 ---- ----
1932 0.27 ---- ---- 0.27
1933 0.37 ---- ---- 0.35
1934 0.42 ---- ---- 0.43
1935 0.50 ---- ---- 0.52
1936 0.58 ---- ---- 0.60
1937 0.71 ---- ---- 0.69
1938 0.69 ---- ---- ----
1939 0.74 ---- ---- ----
1940 0.77 ---- ---- ----
1941 0.80 ---- ---- ----
1942 0.74 ---- ---- ---- 0.74
1943 0.81 ---- ---- ---- 0.82
1944 0.95 ---- ---- ---- 0.89
1945 0.95 ---- ---- ---- 0.96
1946 1.01 ---- ---- ---- 1.04
1947 1.11 ---- ---- ---- 1.11
1948 1.23 ---- ---- ---- 1.19
1949 1.22 ---- ---- ---- 1.26
1950 1.34 ---- ---- ---- 1.34
1951 1.45 ---- ---- ---- 1.41
1952 1.51 ---- ---- ---- 1.48
1953 1.57 ---- ---- ---- 1.56
1954 1.61 ---- ---- ---- 1.63
1955 1.73 ---- ---- ---- 1.71
1956 1.81 ---- ---- ---- 1.78
1957 1.86 ---- ---- ---- 1.86
1958 1.89 ---- ---- ---- 1.93
1959 1.96 ---- ---- ---- 2.00
1960 2.04 ---- ---- ---- 2.08
1961 2.10 ---- ---- ---- 2.15
1962 2.18 ---- ---- ---- 2.23
1963 2.26 ---- ---- ---- 2.30
1964 2.33 ---- ---- ---- 2.37
1965 2.45 ---- ---- ---- 2.45
1966 2.54 ---- ---- ---- 2.52
1967 2.61 ---- ---- ---- 2.60
1968 2.69 ---- ---- ---- 2.67
1969 2.77 ---- ---- ---- 2.75
1970 2.86 ---- ---- ---- 2.82
1971 2.92 ---- ---- ---- 2.89
1972 2.97 ---- ---- ---- 2.97
1973 3.06 ---- ---- ---- 3.04
1974 3.06 ---- ---- ---- ----
1975 3.01 ---- ---- ---- ---- 3.04
1976 3.09 ---- ---- ---- ---- 3.07
1977 3.13 ---- ---- ---- ---- 3.11
1978 3.14 ---- ---- ---- ---- 3.15
1979 3.18 ---- ---- ---- ---- 3.19
1980 3.13 ---- ---- ---- ---- ----
1981 3.08 ---- ---- ---- ---- ----
1982 3.04 ---- ---- ---- ---- ----
1983 3.03 ---- ---- ---- ---- ---- 3.04
1984 3.05 ---- ---- ---- ---- ---- 3.06
1985 3.04 ---- ---- ---- ---- ---- 3.07
1986 3.09 ---- ---- ---- ---- ---- 3.09
1987 3.10 ---- ---- ---- ---- ---- 3.10
1988 3.14 ---- ---- ---- ---- ---- 3.12
1989 3.15 ---- ---- ---- ---- ---- 3.13
1990 3.17 ---- ---- ---- ---- ---- 3.15
1991 3.17 ---- ---- ---- ---- ---- 3.16
1992 3.18 ---- ---- ---- ---- ---- 3.18
1993 3.18 ---- ---- ---- ---- ---- 3.19
1994 3.20 ---- ---- ---- ---- ---- 3.21
1995 3.21 ---- ---- ---- ---- ---- 3.22
1996 3.24 ---- ---- ---- ---- ---- 3.24
1997 3.27 ---- ---- ---- ---- ---- 3.25
1998 3.29 ---- ---- ---- ---- ---- 3.27
1999 3.27 ---- ---- ---- ---- ---- 3.29
2000 3.31 ---- ---- ---- ---- ---- 3.30
2001 3.31 ---- ---- ---- ---- ---- 3.32
2002 3.30 ---- ---- ---- ---- ---- 3.33
2003 3.34 ---- ---- ---- ---- ---- 3.35
2004 3.38 ---- ---- ---- ---- ---- 3.36
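If you want to reproduce the fitting step outside a spreadsheet, here is a minimal sketch in Python (not what I actually used - I did this in Excel) of fitting the first visually straight stretch, 1901-1919, with x = year - 1900:

```python
import numpy as np

# ln(production) for 1901-1919, the first column of the table above
years = np.arange(1901, 1920)
ln_p = np.array([-1.79, -1.70, -1.64, -1.52, -1.54, -1.55, -1.33, -1.25,
                 -1.21, -1.12, -1.07, -1.04, -0.95, -0.90, -0.84, -0.78,
                 -0.69, -0.69, -0.59])

# Ordinary least squares on the segment, with x = year - 1900
m, b = np.polyfit(years - 1900, ln_p, 1)
print(round(m, 3), round(b, 3))  # 0.066 -1.831, matching the first fitted column
```

Repeat per segment and you recover the fitted columns shown alongside the data.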
Ok, this is probably quite obvious to some, but let's take a brief moment to consider what this means. A straight line on a log plot is exponential growth (or decay). Thus, interpreting the historical production data this way is equivalent to saying that oil production has tended to grow according to a stable exponential trend until something disrupts that trend, at which point, after a short period of adjustment, production resumes growth along some other exponential trend. The first thing that may jump to mind is that short-term forecasts by the IEA, EIA etc. basically follow this principle: exponential growth in consumption, with production fixed to match. Well, this methodology suggests they will be correct, at least until a significant discontinuity arrives. For those of us who have spent inordinate amounts of time searching for a stable methodology for Verhulst fits, this suggests why we have had so much trouble. If the local behaviour of the data is exponential, while the local behaviour of the fitted Verhulst is relatively flat (concave), it is unsurprising that large variations in parameters yield such small variations in goodness of fit. Anyway, let's continue.
We now have six straight lines and five periods of discontinuity. Each straight line is defined by m and b such that y = mx + b, with x = year - 1900. The lines are:
- Code: Select all
m 0.066 0.079 0.084 0.074 0.038 0.015
b -1.831 -1.894 -2.429 -2.376 0.159 1.774
If you want, you can plot the exponential of these lines and see how they map onto ordinary production.
The five periods of discontinuity are:
circa 1919-192? (beginning of "Roaring twenties")
circa 1929-1932 (Great Depression)
circa 1937-1943 (WWII related)
circa 1973-1975 (First "Oil Crisis")
circa 1979-1983 (Second "Oil Crisis")
I think these periods of discontinuity are pretty apparent, and I find it remarkable that each maps well to a significant geopolitical event. The exception is perhaps the first, which also happens to be the only positive shock. In separate analyses I took the first shock to be 1919-1920 or 1919-1923; for the forecast below I actually used 1919-1923. Either way, it seems strange to me that WWI produces no obvious negative shock, and that its aftermath should produce such a positive one, but I don't know my history that well. Oil production certainly seems to behave differently between the two world wars. Maybe someone knows why.
Anyway, we now have a framework from which we would like to generate a forecast. We suspect that oil production will continue along an exponential trend until, at some point in the future, there is some discontinuity (probably big enough to be called a geopolitical shock), after which we may forecast that production will continue along some new exponential trend (until some further discontinuity). Several questions arise. How long until the next discontinuity? How large will it be? What new exponential trend is likely to follow the shock? If we can answer these questions, we have a forecast that proceeds indefinitely into the future.
The method I have used to resolve these questions is rather shaky - believe me, I know how many long bows I'm balancing atop one another. If anyone has better ideas for resolving these questions, I'd love to hear them. The method I used follows.
Whatever method is used to generate future exponential trends, I suggest that it intrinsically make use of the fact that oil is a non-replenishable resource. This is the core logic behind conventional use of the Verhulst curve, and it seems we may be able to make use of that now. I generated a Verhulst curve to serve as a comparison function, whereby the difference between the comparison function and actual production is taken to be related to the probability of a shock. Follow? Production grows exponentially until it travels too far from a baseline Verhulst curve. As it gets further away, the likelihood of a shock increases, and the shock brings current production back into line with the comparison curve. Ok, continuing.
I fitted a Verhulst to the historic data. Actually, I prefer to fit the log of a Verhulst to the log of the historic data, because the latter exhibits relative homoskedasticity - the variance is pretty similar as we travel along the curve. It's not perfect in this case, but it's better than fitting to the raw curve. I mostly use least-squares error. The fit resolved the following parameters:
U=1583
(1/k)=14.68
n=0.96
T1/2=1995
Looking at this fit, I believe we can easily see that it is inappropriate. The difference between the data and the fit is at its greatest in 2004. Logically, this is an artifact of fitting locally exponential data with a locally concave curve - the same artifact discussed earlier. Thus, I rejected this fit as heavily biased.
Unsure of how to proceed, I chose to take U as an exogenous variable. Following Campbell and others, I used the ballpark figure of 2000 Gb for U. The choice of U will significantly alter the outcome of the model. I then fitted the remaining variables to log production - log Verhulst, as before. The parameters were:
U=2000
(1/k)=14.53
n=1.84
T1/2=2003
Before continuing, note that U will no longer actually equal ultimate production as in the standard Verhulst model. Rather, it is merely a parameter of the comparison curve. The actual ultimate will almost always be a fair amount greater than U, because the exponential growth trend mostly triggers negative shocks by being greater than the comparison function (at least in the model that follows). Actual ultimate seems to come out about 5-10% greater than U. I usually take the liberty of assuming that Campbell et al. are 10-20% too pessimistic, so a U of 2000 is fine with me here (it implies an actual ultimate of 2100-2200). I can also run this model with any other numbers, so make suggestions. What would be better is a non-biased method for estimating U. Any ideas? ("I know what we need, a magic bullet!") Continuing.
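For reference, here is the comparison curve in code. I'm writing the Verhulst in Richards (generalised logistic) form, since that is one common parameterisation that takes exactly a U, 1/k, n and T1/2 - if your preferred form differs, substitute accordingly:

```python
import math

def verhulst_rate(t, U=2000.0, inv_k=14.53, n=1.84, t_half=2003.0):
    """Annual production implied by a Richards-form Verhulst curve.
    Defaults are the U = 2000 comparison fit; the exact functional
    form here is an assumption, not gospel."""
    k = 1.0 / inv_k
    x = math.exp(-k * (t - t_half))
    Q = U / (1.0 + n * x) ** (1.0 / n)          # cumulative production
    return (k / n) * Q * (1.0 - (Q / U) ** n)   # rate dQ/dt

# The curve at 2003 comes out near 27.5 (same units as the production
# series), in the same ballpark as the actual figure exp(3.34) ~ 28
print(round(verhulst_rate(2003), 1))
```

A sanity check on the form: with n = 1 it collapses to the ordinary logistic, whose peak rate is kU/4 at T1/2.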
Ok. Now let's take a look at the Z score for the difference between the log of production and the log of the fitted Verhulst. Data are:
- Code: Select all
1901 -0.16
1902 -0.06
1903 -0.05
1904 0.28
1905 -0.33
1906 -0.91
1907 0.18
1908 0.25
1909 0.07
1910 0.26
1911 0.11
1912 -0.22
1913 -0.07
1914 -0.16
1915 -0.23
1916 -0.31
1917 -0.12
1918 -0.61
1919 -0.38
1920 0.71
1921 1.00
1922 1.34
1923 2.09
1924 1.58
1925 1.45
1926 1.15
1927 1.70
1928 1.56
1929 1.91
1930 1.02
1931 0.32
1932 -0.53
1933 -0.31
1934 -0.41
1935 -0.28
1936 -0.18
1937 0.29
1938 -0.39
1939 -0.52
1940 -0.78
1941 -1.03
1942 -1.96
1943 -1.88
1944 -1.32
1945 -1.80
1946 -1.85
1947 -1.61
1948 -1.13
1949 -1.66
1950 -1.30
1951 -0.87
1952 -0.93
1953 -0.94
1954 -1.06
1955 -0.65
1956 -0.46
1957 -0.53
1958 -0.77
1959 -0.62
1960 -0.51
1961 -0.43
1962 -0.24
1963 -0.11
1964 0.05
1965 0.56
1966 0.80
1967 0.95
1968 1.22
1969 1.43
1970 1.80
1971 1.88
1972 1.95
1973 2.26
1974 1.97
1975 1.30
1976 1.60
1977 1.59
1978 1.40
1979 1.45
1980 0.85
1981 0.20
1982 -0.31
1983 -0.61
1984 -0.66
1985 -0.87
1986 -0.66
1987 -0.78
1988 -0.63
1989 -0.66
1990 -0.61
1991 -0.73
1992 -0.76
1993 -0.81
1994 -0.75
1995 -0.68
1996 -0.52
1997 -0.31
1998 -0.17
1999 -0.29
2000 -0.01
2001 0.01
2002 0.01
2003 0.32
2004 0.68
I thought this was worth a special look. If you plot this up, you can see the discontinuities very clearly; the five shocks at the times mentioned earlier stand out. We can take the Z value at the start of a shock, subtract the Z value at the end of the shock, and take the absolute value to get a shock magnitude. Following this procedure, we get shock magnitudes of delta Z =:
2.51
2.45
2.26
0.99
2.10
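In code, with my reading of the start/end years (the exact endpoints you pick shift the numbers by a few hundredths, which is why these come out slightly different from the values above):

```python
# Z values at the shock endpoints, read off the Z-score series above
z = {1919: -0.38, 1923: 2.09, 1929: 1.91, 1932: -0.53, 1937: 0.29,
     1942: -1.96, 1973: 2.26, 1975: 1.30, 1979: 1.45, 1983: -0.61}

# (start, end) year of each of the five shocks - endpoint choice is mine
shock_windows = [(1919, 1923), (1929, 1932), (1937, 1942),
                 (1973, 1975), (1979, 1983)]

magnitudes = [abs(z[a] - z[b]) for a, b in shock_windows]
print([round(d, 2) for d in magnitudes])  # [2.47, 2.44, 2.25, 0.96, 2.06]
```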
Do you remember that one of the things we needed to proceed with the forecast was the size of the shock, when it arrived? Well, I used these numbers to approximate the size of future shocks. I know this is a very dodgy procedure with only five data points, but hey, it's the best I could think of. Better suggestions welcome. In practice, what I actually did was a maximum-likelihood estimation to generate parameters for a two-parameter Weibull distribution. I chose Weibull because I needed a one-directional distribution, and Weibull hit the highest likelihood amongst the bunch I tried. With five data points it's going to be impossible to distinguish between distributions anyway. The parameters were resolved as:
alpha 5.21
beta 2.26
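For what it's worth, here is a sketch of that MLE step (via SciPy's weibull_min with the location fixed at zero - not the routine I actually used, and with only five points the fit is loose):

```python
import numpy as np
from scipy.stats import weibull_min

shocks = np.array([2.51, 2.45, 2.26, 0.99, 2.10])  # the five delta-Z values

# Two-parameter Weibull MLE: pin location at 0, fit shape and scale
shape, loc, scale = weibull_min.fit(shocks, floc=0)
print(round(shape, 2), round(scale, 2))  # lands close to the 5.21 / 2.26 above
```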
So later on, I'm going to use that distribution to generate magnitudes of shocks. Check that one off.
Ok. Let's work out when shocks should occur. If you're on the ball, you probably suspected from the fact that I generated a distribution for shock magnitudes that I was going to use a Monte Carlo procedure later on. Well, you're right. Consistent with this, I wanted a probabilistic method of determining whether or not a shock occurs. I wanted to use historic Z values as the baseline assumption for what Z values trigger shocks, but have been unable to make this work yet. I am still working on variations of the model which may make this possible in the future. In any case, I simply did what MC modellers sometimes have to do, and rigged up a loose approximation of the kind of Z value that will trigger a shock: I compared the current Z value with a normally distributed random variable with mean 2 and SD 1/2. A shock is triggered when the current Z value surpasses the random variable.
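That trigger rule is essentially a one-liner; at Z = 2 a shock fires about half the time, tapering off quickly for lower Z:

```python
import random

def shock_triggered(z, rng=random):
    # Shock fires when the current Z value exceeds a draw from N(2, 0.5)
    return z > rng.gauss(2.0, 0.5)

random.seed(42)
trials = 100_000
hits = sum(shock_triggered(2.0) for _ in range(trials))
print(hits / trials)  # about 0.5 by symmetry
```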
Let's look at what we have so far. We have the exponential trend, which we expect to continue until a shock arrives. We have a measure of the likelihood that a shock will arrive in any given time. Also, we have a tentative measure of the magnitude of that shock, were it to arrive. After the shock, we expect oil production to resume an exponential trend. However, we currently have no method of deciding what that trend should be. If you have read this far, then you may suspect that I'll use some dodgy method of approximating this, and you'd be right.
Remember the m and b parameters from the set of straight lines mapped to the historical log of production? Well, if you plot m vs b, you'll see they vaguely allude to a straight line. I obtained the regression equation:
b= -61.93 * m + 2.58
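The regression itself, run on the six (m, b) pairs from earlier (I'm using the rounded values as printed above, so the coefficients come out a touch different from my spreadsheet's):

```python
import numpy as np

# Slopes and intercepts of the six straight-line segments
m = np.array([0.066, 0.079, 0.084, 0.074, 0.038, 0.015])
b = np.array([-1.831, -1.894, -2.429, -2.376, 0.159, 1.774])

slope, intercept = np.polyfit(m, b, 1)
print(round(slope, 1), round(intercept, 2))  # close to -61.93 and 2.58
```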
Ok then. Now we have the timing of the shock and the magnitude of the shock, which yields the first data point after the shock (all shocks were assumed to last one year, for simplicity). And because we know the relationship between m and b, we can also find the new exponential trend.
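Putting the pieces together, one simulated path looks roughly like this. Be warned that sd_resid (the residual standard deviation behind the Z scores) and the Richards-form comparison curve are my stand-ins for spreadsheet internals, and the new-trend step is my reading of "use the m-b relationship": solve y = mx + b and b = -61.93m + 2.58 simultaneously through the post-shock point.

```python
import math
import random

ALPHA, BETA = 5.21, 2.26       # Weibull shape / scale for shock magnitude
B_SLOPE, B_INT = -61.93, 2.58  # regression linking b to m

def comparison_log(t, U=2000.0, inv_k=14.53, n=1.84, t_half=2003.0):
    # Log of the comparison Verhulst (Richards form assumed)
    k = 1.0 / inv_k
    x = math.exp(-k * (t - t_half))
    Q = U / (1.0 + n * x) ** (1.0 / n)
    return math.log((k / n) * Q * (1.0 - (Q / U) ** n))

def simulate(m, log_p, year, end_year, sd_resid=0.15, rng=random):
    """One Monte Carlo path of (year, log production)."""
    path = [(year, log_p)]
    while year < end_year:
        year += 1
        z = (log_p - comparison_log(year)) / sd_resid
        if z > rng.gauss(2.0, 0.5):
            # Shock: knock production back toward the comparison curve...
            dz = rng.weibullvariate(BETA, ALPHA)  # note: scale first in Python
            log_p -= dz * sd_resid
            # ...then pick the new trend line through the post-shock point,
            # honouring b = B_SLOPE * m + B_INT (with x = year - 1900)
            m = (log_p - B_INT) / ((year - 1900) + B_SLOPE)
        else:
            log_p += m  # stay on the current exponential trend
        path.append((year, log_p))
    return path

random.seed(7)
path = simulate(m=0.015, log_p=3.38, year=2004, end_year=2040)
```

Run this 10,000 times and take percentiles of exp(log production) per year to get a table like the one below.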
I'm sure I've forgotten something, but this has gone on long enough, so now we'll just take 10,000 sets of the above-mentioned random numbers and obtain the forecast below. From left to right, the columns are:
YEAR Historic P5 P25 P50 P75 P95 Median
- Code: Select all
1950 3.80
1951 4.28
1952 4.52
1953 4.80
1954 5.02
1955 5.63
1956 6.12
1957 6.44
1958 6.61
1959 7.13
1960 7.66
1961 8.19
1962 8.89
1963 9.54
1964 10.29
1965 11.61
1966 12.62
1967 13.55
1968 14.76
1969 15.93
1970 17.54
1971 18.56
1972 19.59
1973 21.34
1974 21.40
1975 20.38
1976 22.05
1977 22.89
1978 23.12
1979 24.11
1980 22.98
1981 21.73
1982 20.91
1983 20.66
1984 21.05
1985 20.98
1986 22.07
1987 22.19
1988 23.05
1989 23.38
1990 23.90
1991 23.83
1992 24.01
1993 24.11
1994 24.50
1995 24.86
1996 25.51
1997 26.34
1998 26.86
1999 26.40
2000 27.36
2001 27.31
2002 27.17
2003 28.12
2004 29.29
2005 ---- 30.06 29.58 29.24 28.91 28.42 29.28
2006 ---- 30.98 30.19 29.64 29.09 28.30 29.73
2007 ---- 32.11 30.85 29.97 29.09 27.82 30.18
2008 ---- 33.52 31.49 30.08 28.67 26.64 30.65
2009 ---- 34.89 31.85 29.73 27.61 24.57 31.12
2010 ---- 35.44 31.48 28.73 25.97 22.01 31.60
2011 ---- 34.17 29.93 26.98 24.03 19.79 25.00
2012 ---- 31.29 27.79 25.35 22.91 19.41 24.09
2013 ---- 28.39 26.09 24.49 22.89 20.59 24.05
2014 ---- 27.25 25.57 24.41 23.24 21.56 24.24
2015 ---- 27.35 25.70 24.55 23.40 21.75 24.45
2016 ---- 27.65 25.86 24.62 23.37 21.58 24.61
2017 ---- 27.96 25.91 24.50 23.08 21.03 24.66
2018 ---- 28.13 25.77 24.12 22.48 20.12 24.56
2019 ---- 28.01 25.32 23.46 21.60 18.91 24.21
2020 ---- 27.40 24.54 22.56 20.58 17.72 23.17
2021 ---- 26.32 23.53 21.60 19.66 16.87 21.01
2022 ---- 24.92 22.42 20.69 18.95 16.45 20.08
2023 ---- 23.68 21.53 20.04 18.55 16.41 19.69
2024 ---- 22.83 20.94 19.63 18.33 16.44 19.46
2025 ---- 22.28 20.52 19.29 18.07 16.31 19.26
2026 ---- 21.97 20.20 18.97 17.74 15.97 19.05
2027 ---- 21.73 19.86 18.56 17.27 15.40 18.76
2028 ---- 21.31 19.35 17.98 16.62 14.66 18.23
2029 ---- 20.73 18.74 17.35 15.96 13.97 17.52
2030 ---- 20.02 18.06 16.70 15.33 13.37 16.60
2031 ---- 19.18 17.33 16.05 14.76 12.91 15.84
2032 ---- 18.36 16.65 15.46 14.27 12.57 15.24
2033 ---- 17.64 16.08 14.99 13.90 12.34 14.85
2034 ---- 17.08 15.60 14.57 13.54 12.06 14.51
2035 ---- 16.59 15.16 14.17 13.17 11.74 14.19
2036 ---- 16.17 14.75 13.77 12.78 11.37 13.83
2037 ---- 15.72 14.31 13.32 12.34 10.92 13.39
2038 ---- 15.27 13.85 12.87 11.89 10.48 12.91
2039 ---- 14.74 13.35 12.39 11.43 10.04 12.38
2040 ---- 14.19 12.86 11.94 11.01 9.68 11.89
If you're still with me, thanks for journeying. I look forward to hearing all the easy ways I can eliminate the incessant and inherent dodginess from the model. If someone wants me to run other numbers, I can do that too. Alternatively, PM me for the spreadsheet (Excel). Be warned, however, that the spreadsheet is one of the most superior examples of "spaghetti spreadsheeting" that one will ever witness - a dying art in modern times. I will not be held liable for psychiatric bills incurred in trying to decipher it.