The Data Scientist

# Confidence intervals for ICOs and cryptocurrency prices


## Probability distributions for ICO pricing

Valuing ICOs is not an easy task, since there is no standard methodology yet. I have presented one such methodology in another post, which is still a work in progress. As I have mentioned in the past, tokenomics is a very important part of any ICO.

In this post I am taking a data-driven approach to estimating how likely different prices are for a token.

TokenData provides data on the return of various ICOs. I used their data in order to fit a probability distribution over returns for various ICOs. The methodology was as follows:

1. Find a reasonable time period over which the fit seems to be good.
2. Find the right distribution based on the log-likelihood or derived metrics (e.g. AIC).
3. Use the Kolmogorov-Smirnov test and graphical means to test for the fit.
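Steps 2 and 3 can be sketched roughly as follows with SciPy. This is a minimal illustration, not the exact code behind this post: the `returns` array is synthetic stand-in data, the candidate list is only a subset of what was tried, and pinning the location parameter at 0 (`floc=0`) is an assumption made here to keep the fits stable.

```python
import warnings

import numpy as np
from scipy import stats

# Synthetic stand-in for the return multiples (NOT the TokenData figures).
rng = np.random.default_rng(0)
returns = stats.nakagami.rvs(0.6, scale=5.0, size=100, random_state=rng)

# A few candidate families, under their SciPy names.
candidates = {
    "nakagami": stats.nakagami,
    "exponweib": stats.exponweib,
    "chi2": stats.chi2,
    "expon": stats.expon,
}


def fit_and_score(data, dist):
    """Fit by maximum likelihood with loc pinned at 0, then compute AIC and a KS test."""
    params = dist.fit(data, floc=0)
    loglik = float(np.sum(dist.logpdf(data, *params)))
    n_free = len(params) - 1  # loc is fixed, so it is not an estimated parameter
    aic = 2 * n_free - 2 * loglik
    ks = stats.kstest(data, dist.cdf, args=params)
    return {"params": params, "aic": aic, "ks_stat": ks.statistic, "ks_p": ks.pvalue}


scores = {}
for name, dist in candidates.items():
    try:
        with warnings.catch_warnings():
            warnings.simplefilter("ignore")  # some fits emit optimiser warnings
            result = fit_and_score(returns, dist)
    except Exception:
        continue  # skip families whose fit fails on this sample
    if np.isfinite(result["aic"]):
        scores[name] = result

# Lowest AIC first; a small KS p-value is evidence against the fit.
for name, r in sorted(scores.items(), key=lambda kv: kv[1]["aic"]):
    print(f"{name:10s} AIC={r['aic']:8.2f}  KS p-value={r['ks_p']:.3f}")
```

One caveat: when the parameters are estimated from the same sample that is then tested, the Kolmogorov-Smirnov p-value tends to be optimistic, so it is best read alongside the graphical checks.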

The total dataset consisted of 100 ICOs. The variable modelled was the return over the original price. Here is a plot of that:

The data covered periods ranging from a few months up to 4 years. In the world of startups and ICOs, 4 years can be a very long time. Businesses that survive that long will likely have high returns. Furthermore, many investors might not be interested in timespans that long. Therefore, I also tried some cut-offs (this comes back to point 1 above).
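As a rough sketch of what trying a cut-off could look like: the snippet below assumes that each ICO has an age in years and a return multiple, and that a cut-off keeps only ICOs no older than the threshold. Both the data and the helper `apply_cutoff` are hypothetical and only illustrate the filtering step.

```python
import numpy as np

# Hypothetical records: age of each ICO in years and its return multiple.
# Values are randomly generated for illustration, not taken from TokenData.
rng = np.random.default_rng(1)
age_years = rng.uniform(0.25, 4.0, size=100)
return_multiple = rng.lognormal(mean=0.0, sigma=1.0, size=100)


def apply_cutoff(age, values, max_age_years):
    """Keep only the values whose ICO is no older than max_age_years (assumed rule)."""
    mask = age <= max_age_years
    return values[mask]


# Compare how many datapoints survive each cut-off before refitting.
for cutoff in (1.0, 2.0, 3.0, 4.0):
    subset = apply_cutoff(age_years, return_multiple, cutoff)
    print(f"cut-off {cutoff:.0f}y: n={subset.size:3d}  median return={np.median(subset):.2f}")
```

Each subset can then be passed through the same fitting and testing steps, keeping in mind that smaller samples make any goodness-of-fit test less powerful.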

An initial attempt at fitting many different distributions over the whole dataset is a bit disappointing. The most likely candidates include: Erlang, Nakagami, exponentiated Weibull, Exponentiated Power, Chi-square and others. However, none of these pass a Kolmogorov-Smirnov test: the test's null hypothesis is that the data follow the fitted distribution, and it is rejected in all cases (p-value<0.1).

Choosing a cut-off at 2 years (leaving us with 34 datapoints), however, provides interesting results. In this case, the following distributions seem to provide a good fit, in the sense that the Kolmogorov-Smirnov test does not reject them (p-value>0.05): Nakagami, Gauss hypergeometric, and exponentiated Weibull. A comparison between real data and the fitted distributions is shown below.

## Fitting distributions on ICO returns: conclusions and lessons learned

So, what can we learn from this? First of all, it is satisfying to see that there are distributions that seem to fit the overall ICO returns, even if no fit is perfect. The nature of the distributions themselves is interesting. None of them is very popular and, as far as I am aware, they have never been used in this context. This requires us to come up with a theory as to why they seem to fit the data well.

Based on that model we can calculate pseudo-confidence intervals for a token’s forecasted price. The figure below shows a fictional example of a token price forecast and the associated pseudo-confidence intervals.

Two things to note here:

1. The intervals are 55%, instead of the traditional 95%. Because of the shape of the distribution and the data, 95% intervals tend to be extremely wide. A 55% interval is not conventional, but it still presents the investor with the most likely scenario.
2. I am using the term “pseudo-confidence intervals” because these are not confidence intervals per se. This model does not take time into account. A proper model would be a forecasting model that uses the Nakagami (or some other distribution) as its noise component.
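To make the first point concrete, here is a small sketch of how such intervals could be read off a fitted distribution. The Nakagami parameters and the ICO price are hypothetical, the interval is assumed to be central (equal-tailed), and the return is assumed to be the ratio of the current price to the ICO price.

```python
from scipy import stats

# Hypothetical fitted Nakagami parameters (shape, loc, scale); not the values from this post.
nu, loc, scale = 0.6, 0.0, 5.0
fitted = stats.nakagami(nu, loc=loc, scale=scale)


def central_interval(dist, level):
    """Equal-tailed interval that contains `level` of the probability mass."""
    tail = (1.0 - level) / 2.0
    return dist.ppf(tail), dist.ppf(1.0 - tail)


ico_price = 0.10  # hypothetical ICO price in USD

for level in (0.55, 0.95):
    lo, hi = central_interval(fitted, level)
    # Scale the return multiples by the ICO price to get a price range.
    print(
        f"{level:.0%} interval: return {lo:.2f}x to {hi:.2f}x, "
        f"price {ico_price * lo:.3f} to {ico_price * hi:.3f} USD"
    )
```

With a right-skewed distribution like this one, the upper end of the 95% interval sits far above the bulk of the data, which is why the narrower 55% band is easier to act on.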

In spite of its shortcomings, the model is a step in the right direction. If the returns over time converge to a specific value, then a static probabilistic model could be used as is to derive probabilities of different pricing outcomes. However, it is more likely that we will need data sampled at different points in time in order to create an improved model.

In any case, this is only the beginning of more interesting models to come.
