You’ve probably noticed irregular, hefty spikes of traffic in your Google Analytics data. If you do some investigating, you may find that a lot of it shows up as referral traffic from legitimate-sounding websites like darodar, semalt, or buttons-for-websites. What you may not know is that much of this traffic actually comes from spambots: programs located on another server that send out data. And here’s the kicker: these bots never actually visit your site. They send Google’s servers the signal that they visited your site, but they never even get near your server.
Before I continue any further, I should note two things. First, never go to the sites that the spam comes from. Many are malicious and may try to infect your computer. Second, what we’re going to do is define filters for a Google Analytics view. This new view should be in addition to the default, unfiltered view. Don’t delete the unfiltered one; it can still be useful.
The primary tool for getting rid of the spam data in Google Analytics is the View Filter. Filters come first because they’re easy to implement and do a pretty good job.
To create a new filter, do the following:
- Open your Google Analytics property, and go to the ‘Admin’ tab up top. Once you click that, you should see three columns. The one we want is on the right: View.
- Click the View dropdown. You will see a searchable list of available views.
- Click ‘Create New View’.
You should now be looking at the new view creation screen.
- Name your view (something general like “Filtered” or something more specific as you see fit) and select your time zone.
- Click ‘Create View’. This will create the view and send you back to the Admin panel.
- Look to the right-hand column to ensure that the new view is selected in the dropdown.
The first addition to this view is the simplest to implement.
- Open ‘View Settings’.
- All we’ll be doing here is checking a box near the bottom: ‘Exclude all hits from known bots and spiders’.
- Once that’s done, click ‘Save’.
- In the left-hand menu, click ‘Filters’.
This will take you to the Filters screen, where you can manage filters and create new ones. Click the big red ‘New Filter’ button, which will take you to the Add Filter to View screen.
The first filter we’ll set up will be the easiest: it will block Semalt crawl hits from showing up in this view. Semalt is a legit site with a webcrawler, much like Google’s. However, unlike Google’s crawler, Semalt’s leaves page hits in its wake, throwing off your Analytics data.
- Since we’re making a filter to hide Semalt traffic, we can name it simply ‘Hide Semalt’.
- Leave the Filter Type as ‘Predefined’.
- Under ‘Select filter type’, choose ‘Exclude’.
- Under ‘Select source or destination’, choose ‘Traffic from the ISP domain’.
- Under ‘Select expression’, select ‘That contain’.
- Enter ‘semalt.com’ into the input field.
- Finally, click the ‘Save’ button to wrap it all up.
Now that we’ve created a simple Filter, we can move on to the more complex one. This one will hide data from several domains that send out spam signals. Unfortunately, it has to be kept up manually. This puts the pressure on us, adding yet another thing to do for the sake of clean data. Fortunately, once you’ve done it once, modifying the Filter is simple and quick.
- Get back to the Add Filter to View screen via Filters > New Filter.
- Give this anti-spam filter a name.
- Change the type to ‘Custom’. A new set of options will appear.
- Leave ‘Exclude’ marked, and in the drop-down next to it, scroll down and select ‘Campaign Source’.
Putting things into the ‘Filter Pattern’ field is where it gets a little wonky. Essentially, all we are putting into this field is a list of domains to ignore (much like semalt.com in the previous filter). But because of the type of data this input will be matching against, it must be formatted in a special way, somewhat humorously called a “Regular Expression”. Google’s Analytics help pages have notes on regular expressions if you want the full details.
The simplest rules to remember are these:
- Put a \ before a period. So a domain will look like “example\.com”.
- Put a vertical bar between items in your list. In this case, that means between the different domains you want hidden from the view. For example: “example\.com|internet\.net”. In case you’re having trouble finding it, the vertical bar shares a key with the backslash on most US keyboards (hold Shift and press backslash).
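If you keep your list of spam domains somewhere as plain text, a few lines of Python can apply those two rules for you. This is only a minimal sketch: the function name `build_filter_pattern` is made up for illustration, and it assumes your entries contain no special characters other than periods.

```python
def build_filter_pattern(domains):
    """Turn a list of plain domains into a Filter Pattern string."""
    # Rule 1: put a backslash before every period.
    escaped = [domain.replace(".", r"\.") for domain in domains]
    # Rule 2: put a vertical bar between the items.
    return "|".join(escaped)


pattern = build_filter_pattern(["example.com", "internet.net"])
print(pattern)  # example\.com|internet\.net
```

The printed string is what you would paste into the ‘Filter Pattern’ field.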
A good starting list is this one, provided by Georgi Georgiev from Analytics-Toolkit.com:
darodar\.|semalt\.|buttons-for-website|blackhatworth|ilovevitaly|prodvigator|cenokos\.|ranksonic\.|adcash\.|simple-share-buttons\.|social-buttons\.
- Copy and paste this into the ‘Filter Pattern’ field, and then save the Filter.
This will hide all traffic from those domains so long as you are in this view. And if (more like when) you notice a new domain sending you spam traffic, you can add a new domain to the list. Just be sure to follow the formatting rules.
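If you want to sanity-check a pattern before pasting it into Analytics, you can try it against a few source names locally. The sketch below uses Python’s `re` module on the starting list above. It assumes the filter does a partial, case-insensitive match, which `re.search` with `re.IGNORECASE` approximates; Google’s regex engine is not identical to Python’s, so treat this as a rough check only. The helper name `is_spam_source` and the sample source names are made up for illustration.

```python
import re

# The starting list from above, split across lines for readability.
SPAM_PATTERN = (
    r"darodar\.|semalt\.|buttons-for-website|blackhatworth|ilovevitaly|"
    r"prodvigator|cenokos\.|ranksonic\.|adcash\.|simple-share-buttons\.|"
    r"social-buttons\."
)

# Assumption: partial, case-insensitive matching.
spam_re = re.compile(SPAM_PATTERN, re.IGNORECASE)


def is_spam_source(source):
    """Return True if any item in the pattern appears anywhere in the source."""
    return spam_re.search(source) is not None


# Hypothetical source values, as they might appear in a referral report.
for source in ["forum.darodar.com", "buttons-for-website.com", "google", "news.example.org"]:
    print(source, is_spam_source(source))
```

The first two sources should come back as spam and the last two should not. When a new spam domain turns up, add it to the pattern with another vertical bar and run the check again.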
One final note: these filters are not retroactive, so any data collected before the view was created will still be unfiltered. There is a way to clean up data from the past using Advanced Segments, but that’s a whole different topic. This article by Ben Travis from viget.com has a good overview of different tactics for getting this spam out of your data. While the effectiveness of some tactics is debatable, Advanced Segments are widely endorsed as effective.
Additional tactics exist, but we are definitely on the back foot in this fight. Until Google gives us some help, or the spammers move on to new tactics, we have to remain very vigilant.