To properly discuss statistical experiments for Internet marketing, it is important to begin with a brief overview of what exactly an experiment is. For our purposes, an experiment is when we take websites, randomly assign changes to them, and look for changes in results. Because the changes were assigned at random, any differences we detect can be attributed to the changes we made. Experiments involving people work the same way. Researchers cannot control every individual trait of their subjects, so random assignment is what makes the groups comparable. For instance, if a big drug company is testing the effect of a headache medicine, it could take 50 people and randomly assign each of them to one of two groups: 25 receive the new medicine and 25 receive nothing (the “control group”). If the people in the group getting the medicine then have fewer headaches, the researchers can conclude that the medicine is in fact the cause of the change. In similar fashion, that is what we do at FRUITION to determine how changes on a website are viewed by the search engines (including Facebook Graph Search).
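The random assignment step above can be sketched in a few lines. This is a minimal illustration, not FRUITION's actual tooling: `randomly_assign` is a hypothetical helper name, and the 50 participants are simply numbered 0 to 49. The same function works for websites if you pass in a list of site URLs instead.

```python
import random


def randomly_assign(subjects, seed=None):
    """Split subjects into two equal-sized groups at random (hypothetical helper).

    Returns (treatment, control). With an odd count, the control group
    receives the extra subject.
    """
    rng = random.Random(seed)  # seeded generator so the split can be reproduced
    shuffled = list(subjects)
    rng.shuffle(shuffled)  # every ordering is equally likely
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


# 50 hypothetical participants, numbered 0..49
treatment, control = randomly_assign(range(50), seed=42)
print(len(treatment), len(control))  # 25 25
```

Shuffling the whole list and cutting it in half guarantees equal group sizes, which a coin flip per subject would not.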
Practically, many SEO firms already have a pretty good understanding of what works and what doesn’t for a website, based on their experience. Experience is a great thing; however, we know that Google is constantly changing small pieces of its algorithm. To be effective in SEO, it is critical to be able to incorporate these changes quickly.
For the fastest SEO results we look at correlations, as we have discussed over the last several blog posts. We need merely to download a dataset and look at the relationships among the variables. Establishing causation requires another step, which is a bit tricky. To determine the variables that actually cause an improvement in a website’s rank, we would need to conduct a study of our own. We would begin by selecting a number of websites we can control. Then we divide the websites into two groups. The first group is not changed at all. In the second group, the variable we suspect is important is changed in a substantial way. Then we check whether the modified sites show a larger change in rank than the untouched sites. For each subsequent variable, we reset the modified sites and do it all over again. Because the only systematic difference between the two groups is the change we made, any difference that shows up this way can be attributed to a causal relationship. In this way we can identify the variables that Google is actually using in its algorithm, as well as the lag between making a change and Google’s index updating.
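One round of the procedure above (one variable, changed sites versus untouched sites) can be summarized as a difference in average rank improvement. The sketch below assumes we recorded each site’s search position before and after the change; the function names and the rank numbers are hypothetical, made up for illustration only.

```python
from statistics import mean


def rank_improvements(before, after):
    """Positions gained by each site (positive = moved up in the results)."""
    return [b - a for b, a in zip(before, after)]


def treatment_effect(treated_before, treated_after, control_before, control_after):
    """Mean rank improvement of changed sites minus that of untouched sites."""
    treated = rank_improvements(treated_before, treated_after)
    control = rank_improvements(control_before, control_after)
    return mean(treated) - mean(control)


# Hypothetical ranks for one variable's round (e.g. a rewritten title tag)
effect = treatment_effect(
    treated_before=[10, 12, 8, 15], treated_after=[6, 6, 3, 8],
    control_before=[9, 11, 14, 7], control_after=[8, 9, 14, 6],
)
print(effect)  # 4.5
```

A positive effect says the changed sites gained more positions on average, but on its own it does not say whether the gap is larger than ordinary ranking noise. That is the question the significance test answers.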
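The statistical test we use in this case is called analysis of variance (ANOVA). Consider the table above. The model row measures the variation between the groups (the changed sites versus the unchanged sites), while the error row measures the variation within each group, the random movement in rank that happens even without our intervention. The sums of squares in each row, divided by their degrees of freedom, give the mean squares, which are estimates of variance. We then divide the mean square for the model by the mean square for the error to get an F value. This F value is compared to the F distribution (traditionally by looking it up in a table of known values) to obtain a p-value. In this case the p-value is 0.0283, which means that if the change had no effect at all, there would be only a 2.83% chance of seeing a difference between the groups at least this large. Because that falls below the conventional 5% threshold, we would conclude that the variable most likely does affect rank. Strictly speaking, this is not the same as a 97.17% chance that the variable caused the change: a p-value describes how surprising the data would be under “no effect,” not the probability that the variable matters.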
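To make the rows of the ANOVA table concrete, here is a minimal one-way ANOVA computed by hand. The data are hypothetical positions gained by four changed and four unchanged sites; they are not the data behind the table above, so the resulting F value differs from it.

```python
from statistics import mean


def one_way_anova(groups):
    """Return the rows of a one-way ANOVA table for a list of groups."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    group_means = [mean(g) for g in groups]

    # Model (between-group) variation: how far each group mean sits from the grand mean
    ss_model = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Error (within-group) variation: scatter of each site around its own group mean
    ss_error = sum((v - m) ** 2 for g, m in zip(groups, group_means) for v in g)

    df_model = len(groups) - 1
    df_error = len(all_values) - len(groups)
    ms_model = ss_model / df_model  # mean square = sum of squares / degrees of freedom
    ms_error = ss_error / df_error
    return {
        "ss_model": ss_model, "df_model": df_model, "ms_model": ms_model,
        "ss_error": ss_error, "df_error": df_error, "ms_error": ms_error,
        "F": ms_model / ms_error,
    }


changed = [4, 6, 5, 7]  # hypothetical positions gained by modified sites
unchanged = [1, 2, 0, 1]  # hypothetical positions gained by untouched sites
table = one_way_anova([changed, unchanged])
print(table["df_model"], table["df_error"], round(table["F"], 2))  # 1 6 34.71
```

The function accepts any number of groups, so the same code covers experiments with more than one treatment group.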
The mathematics involved in ANOVA are not especially complex, but with large samples, or when many tests are needed, we let programs such as R or SAS crunch the actual calculations and focus on the resulting p-values.
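The step that replaces the printed F table is turning an F value into a p-value: the upper-tail area of the F distribution with (model, error) degrees of freedom. Statistical packages do this for you (in Python, for example, `scipy.stats.f_oneway` returns both the F statistic and the p-value). The standard-library sketch below shows roughly what happens underneath, using the regularized incomplete beta function evaluated by a continued fraction (modified Lentz’s method). The function names are our own and are meant for illustration, not as a replacement for a tested statistics library.

```python
import math


def _beta_cont_frac(a, b, x, max_iter=200, eps=3e-14):
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step of the fraction
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd step of the fraction
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log(1.0 - x))
    # Use the fraction directly where it converges fast, else the symmetry relation
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cont_frac(a, b, x) / a
    return 1.0 - front * _beta_cont_frac(b, a, 1.0 - x) / b


def f_p_value(f_value, df_model, df_error):
    """P(F >= f_value) for an F distribution with (df_model, df_error) degrees of freedom."""
    if f_value < 0:
        raise ValueError("F must be non-negative")
    x = df_error / (df_error + df_model * f_value)
    return reg_inc_beta(df_error / 2.0, df_model / 2.0, x)


# Sanity check: with 2 model degrees of freedom the tail has a closed form
p = f_p_value(3.0, 2, 10)
print(abs(p - (1 + 2 * 3.0 / 10) ** (-10 / 2)) < 1e-9)  # True
```

A result below 0.05 is what would be reported as significant at the conventional 5% level.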