Building a general purpose performance-based screen

24th Jul '18 by Nick Ray


THE IDEA

Suppose we want to build a screener using a set of rules to create a portfolio which meets some criteria. (For example: good performance over the next year.)

These rules look like:

metric > value

metric < value.

So they divide the stocks into two sets: the stocks we keep and the stocks we reject - hopefully in a way which meets our defined goals.

In statistics this is called a classification problem and the value is called the 'cutoff'. Are there any guiding principles we can use to choose these cutoff values?

A portfolio usually turns out to contain about 10% super performers, 10% disasters and 80% not doing very much in the middle. So if you can change the ratio of super:disaster, and in particular avoid disasters, you can tip the portfolio into better performance. The mush of middling performers will have quite a small effect on the overall portfolio performance, so we don't worry too much about which of these stocks get included or rejected by the rules.

It turns out that this mush in the middle is quite handy. If we try to maximise the ratio super:disaster for each rule individually we introduce a lot of data mining errors, because the best ratio is likely to result when 'super' and 'disaster' are both quite small numbers - which encourages very high cutoffs. And the exact value of the cutoff is very sensitive to exactly what happened in the period we are using to test our theory. This is a classic example of over-fitting to historical data, and will almost always give poor results.

So instead of doing that, let's try to maximise the value (super - disaster) for each metric instead. This way we pass as many stocks through each rule as possible, subject to trying to keep a big excess of super stocks versus disaster stocks.
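To make the contrast concrete, here is a minimal Python sketch comparing the two objectives on a tiny made-up data set. The function names, the labels and the numbers are all invented for illustration; they are not taken from the data behind this article.

```python
def count_passing(values, labels, cutoff):
    """Count 'super' and 'disaster' stocks that pass the rule metric > cutoff."""
    passed = [lab for v, lab in zip(values, labels) if v > cutoff]
    return passed.count("super"), passed.count("disaster")


def best_cutoff_by_difference(values, labels, candidates):
    """Cutoff that maximises (super - disaster) among the stocks that pass."""
    def score(cutoff):
        n_super, n_disaster = count_passing(values, labels, cutoff)
        return n_super - n_disaster
    return max(candidates, key=score)


def best_cutoff_by_ratio(values, labels, candidates):
    """Cutoff that maximises super:disaster among the stocks that pass."""
    def score(cutoff):
        n_super, n_disaster = count_passing(values, labels, cutoff)
        if n_super == 0:
            return 0.0
        # Zero disasters gives an unbounded ratio, which is why this
        # objective is drawn towards extreme cutoffs.
        return n_super / n_disaster if n_disaster else float("inf")
    return max(candidates, key=score)


# Hypothetical metric values and one-year outcomes for ten stocks.
values = [2, 4, 6, 8, 10, 12, 14, 16, 18, 20]
labels = ["disaster", "disaster", "middle", "disaster", "super",
          "middle", "super", "disaster", "super", "super"]
candidates = [0, 5, 9, 13, 17]

print(best_cutoff_by_difference(values, labels, candidates))
print(best_cutoff_by_ratio(values, labels, candidates))
```

On this toy data the difference objective settles on a cutoff of 9, which passes six stocks, while the ratio objective jumps to 17, which passes only two because that is where the disaster count first hits zero.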

As a consequence, we will need more rules to reduce the number of stocks that pass to a sensible portfolio size, because each rule prunes away relatively fewer stocks.

Originally I calculated cutoffs for the metrics that I've kept data for. For example, here is a plot of an analysis for the "ROE %" metric.

The minimum of the function (shown by the dotted blue line) locates the…


andrewdb

25th Jul '18

I remember your article on value rank showing that a very high score by itself did not correlate with future outperformance.
I.e. if a stock is v.v.v. cheap, that is usually for a reason.

There are studies that show momentum effects tend to persist but, as with value, I would have thought that for very high momentum (say Momentum Rank over 90) there is a mean reversion effect over a year?
I.e. at some point Icarus loses his wings.

The same - I thought - would apply to quality.

I.e. screens benefit by excluding stocks that are too optimised to one criterion?

Nick Ray

This is the probability delta plot for Momentum Rank. It shows no sign of slowing down at higher values - if anything the slope is getting steeper. (Note how close the minimum is to the median again.) It is worth mentioning though that I do not include stocks with a Mkt Cap much below £90M or with wide spreads, so the worst of the "Momentum Trap" type stocks are not included in the data. This might be why my market median is at 65 rather than 50 as you would expect.

Clearly very high momentum cannot go on forever. However it is very hard to see on the 1-year time-frame. If it is there at all it does not kick in at 90 but probably somewhere between 98-99. This is right in the tail of the distribution where it is very noisy.

With the approach in this article though, instead of looking for super performers in just one or two metrics (e.g. very high Q and M) we look for better-than-average across as many metrics as we can. So it would be no good having M>95 but a negative Op Mgn for example.

iwright7

Nick,

Any chance please of a similar graph for QM scores. Thanks, Ian

Not quite sure what you mean by QM scores. Here's the same thing for Quality Rank:

Nick - Many thanks - I mean QM Ranking, if it is possible to graph it? Ian

Ah OK. I don't collect that one so I can't produce a graph for it, although I suspect it is not going to be too different from the Q and M graphs.

Howard Adams

Hi Nick

Excellent post and screener. Many thanks.

I subscribe to UK, Europe including UK and US. And, I get Focus All Regions.

Other readers might like to note if you run Nick's screen for Focus All Regions you will get X hits (1 when I ran it).

But if you run it separately for Focus US (2 hits), Focus UK (0 hits) and Focus Europe (1 hit), you get more. Thus, if you subscribe to several regions, running at region level will give you additional hits not present in the Focus All Regions run.

Regards

Howard

DWit199

Nick, could you explain in a little more detail how you calculate the probability delta plots and what the axes mean?

Very briefly, it is the difference between two ECDF (empirical cumulative distribution function) plots, one for high-performing stocks and one for low-performing stocks. The nice thing about ECDFs is that they capture something about the whole distribution, whereas working with just the mean and variance alone will make some inaccurate simplifications.

The idea was born in the discussion on this article, which looked at the distribution of returns depending on the value of the Stock Rank. Eventually I became interested in the way that these distribution functions cross over when you might expect that the high stock rank curve would remain better than the low stock rank curve over the whole length (see my comment #26 in the thread).

Anyway, eventually I reversed the question and asked what was the distribution for a given metric on the x-axis when looking at the top and bottom deciles of outcomes. The plots in this thread are difference plots: the ECDF of the metric for the low-percentile outcomes is subtracted from the ECDF for the high-percentile outcomes. The minima of these plots are the points where, cumulatively going forwards from that value, there is the greatest excess of "good" stocks over "bad" stocks (as defined by whatever outcome you used). That is why they work as a cutoff. Similarly the maxima work in the opposite direction, so a maximum on the function would indicate a cutoff you should be below rather than above.
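A small Python sketch may help show why the minimum works as a cutoff. It takes delta as the "good" ECDF minus the "bad" ECDF and checks that, for equal-sized groups, the excess of good over bad stocks above a cutoff is just minus delta times the group size. The two samples are invented metric values, not real data, and the helper name ecdf is only illustrative.

```python
from bisect import bisect_right


def ecdf(sample):
    """Return a function x -> fraction of sample values <= x."""
    ordered = sorted(sample)
    n = len(ordered)
    return lambda x: bisect_right(ordered, x) / n


# Hypothetical metric values (think ROE %) for the stocks that ended up
# in the top ("good") and bottom ("bad") outcome groups. Made up.
good = [6, 9, 12, 15, 18, 21, 24, 30]
bad = [-10, -4, 0, 3, 5, 8, 14, 20]

f_good, f_bad = ecdf(good), ecdf(bad)

grid = list(range(-10, 31, 5))                  # metric values to evaluate
delta = [f_good(x) - f_bad(x) for x in grid]    # the "probability delta"

# The cutoff is the grid point where delta is lowest.
cutoff = grid[delta.index(min(delta))]

# Excess of good over bad stocks passing the rule metric > cutoff.
n = len(good)
excess = sum(g > cutoff for g in good) - sum(b > cutoff for b in bad)

print(cutoff, excess, -min(delta) * n)
```

Here the minimum falls at a cutoff of 5, where all eight good stocks pass but only three bad ones do, giving an excess of 5. Any other grid point gives a smaller excess, which is the same "super minus disaster" quantity being maximised in the article.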

The main - maybe only - reason any of this turned out to be interesting was that after doing the donkey work the solution was so indicative of something a bit more interesting. The symmetry in the functions results in the median occupying a special place and allowing us to generalise rather than compute the solution for future metrics. Instead of a bunch of "magic numbers" we have the beginnings of an understanding of what is happening.

The main caveat is that (as with all things mathematical) the conclusion is only true so far as you are happy to accept the conditions which get you to the conclusion. So if you want to filter by maximising the difference, this method works and it is stable. But you could argue quite effectively that you should be optimizing something else entirely - in which case none of this helps at all!

26th Jul '18

Sorry Nick, I'm probably being a bit dim but I still don't fully understand the charts. Could you walk through an example for me? If I could recreate the chart I would have a better understanding of your message.
I have a spreadsheet with data on 300 shares, Names in column A, ROCE at the beginning of the year in column B, Total Return for the following 12 months in column C. How do I get to one of your charts from that data?

That is a bit of a tall order. I'll try to see whether you can approximate this in a spreadsheet but it's going to be heavy going. I think the nearest thing to an ECDF is the RANK function but it then gets very messy because I need to filter by various conditions and then align the two ECDFs that are created, and I'm not sure if spreadsheet functions are up to it.

OK. Here's a spreadsheet version. I've used OpenOffice so you might need to adjust to suit the syntax of your preferred spreadsheet. It is a bit hairy!

So here is a sheet with data in the format you described (Ticker, ROCE, yearly return):

You can include the same stock more than once if you have multiple years of data.

You need to sort this sheet on column C as in the example above with lowest values first.

Now in a second sheet you want these formulae:

The rows from 17 to 36 are all identical - you don't need to type them in individually.

Set up the value in C1 so that it points at your data in 'Sheet1'.

You should get a sheet which looks a bit like this:

And if you plot the "metric" column against the "delta" column you get a plot similar to the ones I presented, hopefully with a minimum near the median value:

In case anyone is struggling to read the formulae or would like to cut-and-paste, here is a table version of them. Although it looks a bit ugly here, it should cut and paste as a table into a spreadsheet with a bit of luck!

	data	="Sheet1.$b$2:Sheet1.$b$28069"

	minval	=PERCENTILE(INDIRECT($C$1);0.025)
	maxval	=PERCENTILE(INDIRECT($C$1);0.975)
	median	=MEDIAN(INDIRECT($C$1))

	goodstart	=(2/3)*COUNT(INDIRECT($C$1))
	goodend	=COUNT(INDIRECT($C$1))+1
	badstart	=2
	badend	=(1/3)*COUNT(INDIRECT($C$1))

	good	=ADDRESS('goodstart';2;1;;"Sheet1") & ":" & ADDRESS('goodend';2;1;;"Sheet1")
	bad	=ADDRESS('badstart';2;1;;"Sheet1") & ":" & ADDRESS('badend';2;1;;"Sheet1")

step	metric	goodecdf	badecdf	delta
0	=$C$3+$A16*($C$4-$C$3)	=PERCENTRANK(INDIRECT($C$12);$B16)	=PERCENTRANK(INDIRECT($C$13);$B16)	=C16-D16
=A16+(1/20)	=$C$3+$A17*($C$4-$C$3)	=PERCENTRANK(INDIRECT($C$12);$B17)	=PERCENTRANK(INDIRECT($C$13);$B17)	=C17-D17
=A17+(1/20)	=$C$3+$A18*($C$4-$C$3)	=PERCENTRANK(INDIRECT($C$12);$B18)	=PERCENTRANK(INDIRECT($C$13);$B18)	=C18-D18
=A18+(1/20)	=$C$3+$A19*($C$4-$C$3)	=PERCENTRANK(INDIRECT($C$12);$B19)	=PERCENTRANK(INDIRECT($C$13);$B19)	=C19-D19
=A19+(1/20)	=$C$3+$A20*($C$4-$C$3)	=PERCENTRANK(INDIRECT($C$12);$B20)	=PERCENTRANK(INDIRECT($C$13);$B20)	=C20-D20
=A20+(1/20)	=$C$3+$A21*($C$4-$C$3)	=PERCENTRANK(INDIRECT($C$12);$B21)	=PERCENTRANK(INDIRECT($C$13);$B21)	=C21-D21
=A21+(1/20)	=$C$3+$A22*($C$4-$C$3)	=PERCENTRANK(INDIRECT($C$12);$B22)	=PERCENTRANK(INDIRECT($C$13);$B22)	=C22-D22
=A22+(1/20)	=$C$3+$A23*($C$4-$C$3)	=PERCENTRANK(INDIRECT($C$12);$B23)	=PERCENTRANK(INDIRECT($C$13);$B23)	=C23-D23
=A23+(1/20)	=$C$3+$A24*($C$4-$C$3)	=PERCENTRANK(INDIRECT($C$12);$B24)	=PERCENTRANK(INDIRECT($C$13);$B24)	=C24-D24
=A24+(1/20)	=$C$3+$A25*($C$4-$C$3)	=PERCENTRANK(INDIRECT($C$12);$B25)	=PERCENTRANK(INDIRECT($C$13);$B25)	=C25-D25
=A25+(1/20)	=$C$3+$A26*($C$4-$C$3)	=PERCENTRANK(INDIRECT($C$12);$B26)	=PERCENTRANK(INDIRECT($C$13);$B26)	=C26-D26
=A26+(1/20)	=$C$3+$A27*($C$4-$C$3)	=PERCENTRANK(INDIRECT($C$12);$B27)	=PERCENTRANK(INDIRECT($C$13);$B27)	=C27-D27
=A27+(1/20)	=$C$3+$A28*($C$4-$C$3)	=PERCENTRANK(INDIRECT($C$12);$B28)	=PERCENTRANK(INDIRECT($C$13);$B28)	=C28-D28
=A28+(1/20)	=$C$3+$A29*($C$4-$C$3)	=PERCENTRANK(INDIRECT($C$12);$B29)	=PERCENTRANK(INDIRECT($C$13);$B29)	=C29-D29
=A29+(1/20)	=$C$3+$A30*($C$4-$C$3)	=PERCENTRANK(INDIRECT($C$12);$B30)	=PERCENTRANK(INDIRECT($C$13);$B30)	=C30-D30
=A30+(1/20)	=$C$3+$A31*($C$4-$C$3)	=PERCENTRANK(INDIRECT($C$12);$B31)	=PERCENTRANK(INDIRECT($C$13);$B31)	=C31-D31
=A31+(1/20)	=$C$3+$A32*($C$4-$C$3)	=PERCENTRANK(INDIRECT($C$12);$B32)	=PERCENTRANK(INDIRECT($C$13);$B32)	=C32-D32
=A32+(1/20)	=$C$3+$A33*($C$4-$C$3)	=PERCENTRANK(INDIRECT($C$12);$B33)	=PERCENTRANK(INDIRECT($C$13);$B33)	=C33-D33
=A33+(1/20)	=$C$3+$A34*($C$4-$C$3)	=PERCENTRANK(INDIRECT($C$12);$B34)	=PERCENTRANK(INDIRECT($C$13);$B34)	=C34-D34
=A34+(1/20)	=$C$3+$A35*($C$4-$C$3)	=PERCENTRANK(INDIRECT($C$12);$B35)	=PERCENTRANK(INDIRECT($C$13);$B35)	=C35-D35
=A35+(1/20)	=$C$3+$A36*($C$4-$C$3)	=PERCENTRANK(INDIRECT($C$12);$B36)	=PERCENTRANK(INDIRECT($C$13);$B36)	=C36-D36
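For anyone more comfortable with code than with spreadsheet formulae, the same steps can be sketched in Python. This is a rough port rather than an exact replica: it uses a plain step-function ECDF where the sheet uses PERCENTRANK, which may rank and interpolate slightly differently, so the numbers will not match to the last decimal. The function names and the synthetic rows are invented for illustration.

```python
from bisect import bisect_right


def percentile(sorted_vals, p):
    """Linear-interpolated percentile (p in [0, 1]) of a pre-sorted list."""
    pos = p * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (pos - lo) * (sorted_vals[hi] - sorted_vals[lo])


def ecdf_at(sorted_sample, x):
    """Fraction of the (pre-sorted) sample that is <= x."""
    return bisect_right(sorted_sample, x) / len(sorted_sample)


def delta_table(rows, steps=20):
    """rows: (ticker, metric, yearly_return). Returns a list of (metric, delta)."""
    by_return = sorted(rows, key=lambda r: r[2])       # lowest returns first
    third = len(by_return) // 3
    bad = sorted(r[1] for r in by_return[:third])      # bottom third by return
    good = sorted(r[1] for r in by_return[-third:])    # top third by return

    # Evaluate between the 2.5th and 97.5th percentile of the metric,
    # like the minval and maxval cells in the sheet.
    metric = sorted(r[1] for r in rows)
    lo, hi = percentile(metric, 0.025), percentile(metric, 0.975)

    table = []
    for i in range(steps + 1):
        x = lo + (i / steps) * (hi - lo)
        table.append((x, ecdf_at(good, x) - ecdf_at(bad, x)))
    return table


# Synthetic rows in which the return rises steadily with the metric.
rows = [(f"T{i}", i, i - 15) for i in range(30)]
table = delta_table(rows)
cutoff, lowest = min(table, key=lambda t: t[1])
print(round(cutoff, 2), lowest)
```

Plotting the first element of each pair against the second should give the same shape of curve as the "metric" versus "delta" chart. With real data the split into thirds, the number of steps and the percentile range are all choices you may want to vary.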

Thanks Nick. I will work through those formulae and see if I can understand it.

27th Jul '18

... so are you taking these individual rules and then optimising a combination of them for selecting a set of ~N stocks that have the best performance over the last 12 months

... and then seeing what actually happens over the next 12 months

?

What the spreadsheet here does is to separate a year's performance into three parts: good (the top third), bad (the bottom third) and the third in the middle, which we ignore. (You could use the top and bottom decile instead, but that won't work well with only 300 data points. In practice it does not matter too much how you define the good and bad portions.)

So for each of good and bad we look at some metric at the beginning of the period (in this case ROCE) and see how likely it is that stocks with each value of that metric fall into the good or bad group. By using the whole distribution and comparing the shape of the good and the bad version we can see at what value of the metric we get the best separation of good and bad. That's all there is to it. If you don't use metrics which provide some measure of performance you will not get sensible results though.

As far as optimising a combination is concerned, one of the aims here was to try to choose cutoffs so that you can combine rules easily without having to explicitly optimise a combination together. So although I stress that you need a reasonable number of rules, they are each optimised independently.
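As a sketch of what combining independently chosen rules might look like, the Python below gives each metric its own cutoff and keeps only the stocks that pass every rule. It assumes the median is a good enough stand-in for each metric's cutoff, which the plots in this thread suggest but do not guarantee. The tickers, metric names and values are hypothetical.

```python
from statistics import median


def median_cutoffs(stocks, higher_is_better):
    """One cutoff per metric, each computed on its own from all stocks."""
    return {m: median(s[m] for s in stocks.values()) for m in higher_is_better}


def screen(stocks, higher_is_better):
    """Keep the tickers that beat the cutoff on every metric."""
    cutoffs = median_cutoffs(stocks, higher_is_better)
    passed = []
    for ticker, metrics in stocks.items():
        ok = all(
            metrics[m] > cutoffs[m] if up else metrics[m] < cutoffs[m]
            for m, up in higher_is_better.items()
        )
        if ok:
            passed.append(ticker)
    return passed


# Hypothetical stocks. BBB has very high momentum but a negative margin.
stocks = {
    "AAA": {"roe": 20, "op_mgn": 15, "momentum_rank": 80},
    "BBB": {"roe": 5, "op_mgn": -3, "momentum_rank": 97},
    "CCC": {"roe": 12, "op_mgn": 8, "momentum_rank": 60},
    "DDD": {"roe": 15, "op_mgn": 12, "momentum_rank": 75},
    "EEE": {"roe": 2, "op_mgn": 1, "momentum_rank": 30},
}
rules = {"roe": True, "op_mgn": True, "momentum_rank": True}

print(screen(stocks, rules))
```

Only AAA survives here. BBB is rejected despite having the highest momentum rank, because it is below the median on the other two metrics. No joint optimisation takes place: adding or dropping a rule leaves the other cutoffs unchanged.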

There is a bit of a philosophy here that not everyone will buy into. But to me it seems there is a "club" of decent stocks that keep turning up no matter what screen you use. And a lot of them drop out of screens and then pop back in again six or twelve months later. So it is like a fuzzy set of stocks. This club is actually quite big - maybe 100 stocks or more. Once you've basically found the club, to some extent it does not matter too much which subset you choose to put in your portfolio. The NAPS system also basically picks from the same fuzzy set and deliberately tries to impose sector diversification on its choices. Using "rank in sector" instead of "rank in market" acts in a similar way here.

As for seeing what happens over the next 12 months - well in the sense that we are trying to screen for stocks with good performance going forwards, yes!
