Please visit Search Engine Land for the full article.
Google Analytics (GA) has done more than any other platform to bring the practice of data analytics to the center of organizations.
By offering a free-to-use, intuitive solution to businesses of any size, it has offered the promise of full transparency into customer behavior.
Moreover, as part of the broader marketing analytics movement, it has helped shape the language we use daily. Our handy guide explains some of the most frequently heard, but at times confusing, terms GA has brought into everyday parlance in the marketing world.
Pitch decks and strategy sessions abound with references to “data-driven decisions” nowadays, which is a healthy trend for businesses overall. Beyond the buzzword status this phrase has attained, it is true that businesses that integrate analytics into the decision-making process simply get better results.
Google reports that business leaders are more than twice as likely to act on insights taken from analytics.
As Google continues to improve its offering, with Optimize and Data Studio available to everyone, and an ever more impressive list of paid products via the Analytics 360 suite, marketers need to understand the data in front of them.
Unfortunately, there are some common misunderstandings of how Google collects, configures, processes, and reports data.
Below are some of the most commonly misunderstood metrics and features within the core Google Analytics interface.
By avoiding these pitfalls, you will enable better decisions based on data you can trust.
What is it?
Bounce rate is a simple, useful metric based on single-page sessions. A bounce is recorded when a user enters on one URL and leaves the site from the same URL, without interacting with that page or visiting any others on the site.
It is calculated as a percentage, by dividing the aggregate number of single-page sessions by the total number of entries to that page. Bounce rate can also be shown on a site-wide level to give an overview of how well content is performing.
As such, it makes for a handy heuristic when we want to glean some quick insights into whether our customers like a page or not. The assumption is that a high bounce rate is reflective of a poorly performing page, as its contents have evidently not encouraged a reader to explore the site further.
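The calculation described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not GA's own code; the function name and the figures are made up for the example.

```python
def bounce_rate(single_page_sessions: int, entries: int) -> float:
    """Return the bounce rate for a page as a percentage of its entries."""
    if entries == 0:
        # No entries means there is nothing to divide by; report zero.
        return 0.0
    return 100.0 * single_page_sessions / entries


# Hypothetical figures: 420 single-page sessions out of 1,200 entries.
rate = bounce_rate(420, 1200)
print(f"{rate:.1f}%")  # prints "35.0%"
```

The same function gives a site-wide figure if you pass in totals for the whole site rather than for one page.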
Why is it misunderstood?
Bounce rate is at times both misunderstood and misinterpreted.
A ‘bounce’ occurs when a user views one page on a site and a single request is sent to the Analytics server. Therefore, we can say that Google uses the quantity of engagement hits to classify a bounced session. One request = bounced; more than one request to the server = not bounced.
This can be problematic, given that any interaction will preclude that session from counting as a bounce. Some pages contain auto-play videos, for example. If the start of a video is tracked as an event, this will trigger an engagement hit. Even if the user exits the page immediately, they will still not be counted as a bounced visit.
Equally, a user may visit the page, find the exact information they wanted (a phone number or address, for example), and then carry out their next engagement with the brand offline. Their session could be timed out (this happens by default after 30 minutes on GA and then restarts), before they engage further with the site. In either example, this will be counted as a bounced visit.
That has an impact on the Average Time on Page calculations, of course. A bounced visit has a duration of zero, as Google calculates this based on the time between visiting one page and the next – meaning that single-page visits, and the last page in any given session, will have zero Time on Page.
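To make the hit-counting logic concrete, here is a toy model in Python. It assumes a session is simply a list of hits with timestamps; the `Hit` class, the function names and the sample sessions are all hypothetical, and the real processing in GA is more involved than this.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Hit:
    kind: str         # "pageview" or "event"
    page: str         # page path the hit was sent from
    timestamp: float  # seconds since the session started


def is_bounce(hits: list[Hit]) -> bool:
    # One request to the server = bounced; more than one = not bounced.
    return len(hits) == 1


def time_on_page(hits: list[Hit]) -> list[tuple[str, float]]:
    # Time on a page is the gap between its pageview and the next pageview.
    views = [h for h in hits if h.kind == "pageview"]
    durations = [
        (current.page, following.timestamp - current.timestamp)
        for current, following in zip(views, views[1:])
    ]
    if views:
        # The last page has no following pageview, so it is recorded as zero.
        durations.append((views[-1].page, 0.0))
    return durations


# A reader who finds a phone number and leaves: one hit, counted as a bounce.
reader = [Hit("pageview", "/contact", 0.0)]
# An auto-play video fires an event, so the session is not a bounce.
autoplay = [Hit("pageview", "/video", 0.0), Hit("event", "/video", 1.0)]
# A two-page visit: only the first page gets a non-zero duration.
browse = [Hit("pageview", "/", 0.0), Hit("pageview", "/blog", 45.0)]

print(is_bounce(reader), is_bounce(autoplay), is_bounce(browse))
print(time_on_page(browse))
```

Note that the auto-play session escapes the bounce count even though its only page still records zero time on page.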
Advances in user-based tracking (as opposed to cookie-based) and integration with offline data sources provide cause for optimism; but for now, most businesses using GA will see a bounce rate metric that is not wholly accurate.
All of this should start to reveal why and how bounce rate can be misinterpreted.
First of all, a high bounce rate is not always a problem. Often, users find what they want by viewing one page and this could actually be a sign of a high-performing page. This occurs when people want very specific information, but can also occur when they visit a site to read a blog post.
Moreover, a very low bounce rate does not necessarily mean a page is performing well. It may suggest that users have to dig deeper to get the information they want, or that they quickly skim the page and move on to other content.
With the growing impact of RankBrain, SEOs will understandably view bounce rate as a potential ranking factor. However, it has to be placed in a much wider context before we can assume it has a positive or negative impact on rankings.
How can marketers avoid this?
Marketers should never view bounce rate as a measure of page quality in isolation. There really is no such thing as a ‘good’ or ‘bad’ bounce rate in a universal sense, but when combined with other metrics we can get a clearer sense of whether a page is doing its job well.
Tools like Scroll Depth are great for this, as they allow us to see in more detail how a consumer has interacted with our content.
We can also make use of Google Tag Manager to adapt the parameters for bounce rate and state, for example, that any user who spends longer than 30 seconds on the page should not be counted as a bounce. This is useful for publishers who tend to receive a lot of traffic from people who read one post and then go elsewhere.
This is commonly known as ‘adjusted bounce rate’ and it helps marketers get a more accurate view of content interactions. Glenn Gabe wrote a tutorial for Search Engine Watch on how to implement this: How to implement Adjusted Bounce Rate (ABR) via Google Tag Manager.
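The effect of such a threshold can be illustrated with a small Python sketch. It is not the Tag Manager setup itself, only the arithmetic behind the idea; the session data, the function names and the way time on page is measured are assumptions made for the example.

```python
# Each hypothetical session: (pages viewed, seconds spent on the entry page).
sessions = [(1, 5.0), (1, 95.0), (3, 40.0), (1, 12.0)]


def standard_bounce_rate(data):
    # Every single-page session counts as a bounce.
    bounces = sum(1 for pages, _ in data if pages == 1)
    return 100.0 * bounces / len(data)


def adjusted_bounce_rate(data, threshold=30.0):
    # A single-page session only counts as a bounce if the visitor
    # stayed no longer than the threshold (30 seconds here).
    bounces = sum(
        1 for pages, seconds in data if pages == 1 and seconds <= threshold
    )
    return 100.0 * bounces / len(data)


print(standard_bounce_rate(sessions))  # 75.0
print(adjusted_bounce_rate(sessions))  # 50.0
```

In this made-up sample, the reader who spent 95 seconds on a single post no longer counts against the page.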
Bounce rate can be a very useful metric, but it needs a bit of tweaking for each site before it is truly fit for purpose.
What is it?
Channels are sources of traffic and they reflect the ways that users find your website. As a result, this is one of the first areas marketers will check in their GA dashboard to evaluate the performance of their different activities.
There are many ways that people can find websites, so we tend to group these channels together to provide a simpler overview of traffic.
Google provides default channel groupings out of the box, which will typically look as follows:
You can find this by navigating this path: Admin > Channel Settings > Channel Grouping.
Anything that sits outside of these sources will fall into the disconcertingly vague ‘(Other)’ bucket.
From Google’s perspective, this is a reasonably accurate portrayal of the state of affairs for most websites. However, this is applied with broad brush strokes out of necessity and it shapes how marketers interpret very valuable data.
Why is it misunderstood?
Default channel groupings are often misunderstood in the sense that they are taken as the best solution without conducting further investigation.
Vague classifications like ‘Social’ and ‘Referral’ ignore the varying purposes of the media that fall under these umbrellas. In the case of the former, we would at the very least want to split out our paid and organic social media efforts and treat them separately.
We want channel groupings to provide a general overview, but perhaps it needn’t be quite so general.
Leaving these groupings as they are has a significant impact, particularly when it comes to the eternal riddle of channel attribution. If we want to understand which channels have contributed to conversions, we need to have our channels correctly defined as a basic starting point.
How can marketers avoid this?
Make use of custom channel groupings that accurately reflect your marketing activities and the experience your consumers will have with your brand online. It is often helpful to group campaigns by their purpose; prospecting and remarketing, for example.
Custom channel groupings are a great option because they alter the display of data, rather than how it is filtered. You can modify the default channel groupings if you feel confident about the changes you plan to make, but this will permanently affect how data is processed in your account. Always add a new view to test these updates before committing them to your main account dashboard.
For most, custom channel groupings will be more than sufficient.
Through the use of regular expressions (known commonly as regex), marketers can set up new rules. Regex is not a particularly complex language to learn and follows a clear logic, but it does take a little bit of getting used to. You can find a great introductory guide to regex here. These rules will allow you to create new channels or alter the pre-defined groupings Google provides.
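As a rough illustration of how regex-based rules sort traffic into channels, here is a Python sketch. The channel names, sources, mediums and patterns are hypothetical, and Python's `re` module may not match the regex flavor used inside GA exactly, so treat it as a way to reason about rules rather than something to paste into your account.

```python
import re

# Hypothetical rules: (channel name, source pattern, medium pattern).
# Rules are checked in order and the first match wins.
RULES = [
    ("Paid Social", re.compile(r"facebook|twitter|linkedin"), re.compile(r"^(cpc|paid-social)$")),
    ("Organic Social", re.compile(r"facebook|twitter|linkedin"), re.compile(r"^(social|referral)$")),
    ("Remarketing", re.compile(r".*"), re.compile(r"^remarketing$")),
]


def classify(source: str, medium: str) -> str:
    """Return the first channel whose source and medium patterns both match."""
    for name, source_pattern, medium_pattern in RULES:
        if source_pattern.search(source) and medium_pattern.search(medium):
            return name
    # Anything that matches no rule falls into the catch-all bucket.
    return "(Other)"


print(classify("facebook.com", "cpc"))      # Paid Social
print(classify("twitter.com", "referral"))  # Organic Social
print(classify("newsletter", "email"))      # (Other)
```

Because the first matching rule wins, the order of the rules matters: more specific channels should sit above broader ones.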
Your new channel groupings will be applied to historical data, so you can easily assess the difference they make. These alterations will prove especially invaluable when you compare attribution models within GA.
What are they?
The array of segmentation options available is undoubtedly one of Google Analytics’ most powerful advantages. Customer segments allow us to view very specific behavioral patterns across demographics, territories and devices, among many others. We can also import segments created by other users, so there is a truly vast selection of options at our disposal.
By clicking on ‘+ New Segment’ within your GA reports, you will be taken to the Segment Builder interface:
Google provides a very handy preview tool that shows us what percentage of our audience is included under the terms we are in the process of defining. This will always begin at 100% and decrease as our rules start to home in on particular metrics and/or dimensions:
This is where it starts to get tricky, as the segment builder can start to produce unexpected results. A seemingly sound set of rules can return a preview of 0% of total users, much to the marketer’s chagrin.
Why are they misunderstood?
The underlying logic in how Google processes and interprets data can be complex, even inconsistent at times.
When we set up a set of rules, they will be treated sequentially. A session will need to pass the first condition in order to reach the second round, and so on. We therefore need to consider very carefully how we want our experiments to run if we want them to be sound.
To take a working example, if I want to see how many sessions have included a visit to my homepage and to my blog, I can set up an advanced condition to cover this. I filter by sessions and include a condition for Page exactly matching the blog URL and Page exactly matching the homepage:
This creates what seems like a valid segment in the preview.
Logically, I should be able to take this up one level to see what proportion of users meet these conditions. Within the GA hierarchy, users are a superset of sessions, which are in turn a superset of hits.
However, this is not how things play out in reality. Just by switching the filter from ‘Sessions’ to ‘Users’, the segment is rendered invalid:
Why does this occur?
Google uses a different logic to calculate each, which can of course be quite confusing.
In the former example, Google’s logic allowed room for interpretation, so the AND condition loosely meant that a session could include visits to each page at different times. As such, some sessions meet the requisite conditions.
In the latter example, the AND rule means that a user must meet both conditions simultaneously to be included. This is impossible, as they cannot be on two pages at once.
We can still arrive at the same results, but we cannot do so using the AND condition. By removing the second condition and adding a new filter in its place, we can see the same results for Users that we received for Sessions:
In other words, we need to be very specific about what exactly we mean if we want accurate results from segments created for users, but not quite so explicit with sessions.
It is better to err on the safe side overall, as the logic employed for Users was rolled out more recently and it is here to stay. The complexity is multiplied when a segment contains filters for users and for sessions, so it is essential to maintain some consistency in how you set these up.
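The difference between the two readings of AND can be shown with a toy model in Python. This is a simplified sketch of the behavior described above, not a reproduction of how GA actually evaluates segments; the page paths, user data and function names are invented for the example.

```python
HOME, BLOG = "/", "/blog"

# Hypothetical data: user id -> list of sessions, each a list of page paths.
users = {
    "u1": [["/", "/blog"], ["/pricing"]],
    "u2": [["/blog"], ["/"]],
    "u3": [["/pricing"]],
}


def session_matches(session):
    # Session scope, read loosely: both pages appear somewhere in the session.
    return HOME in session and BLOG in session


def user_matches_strict(sessions):
    # User scope, read strictly: one hit must be both pages at the same time,
    # which can never be true.
    return any(page == HOME and page == BLOG for s in sessions for page in s)


def user_matches_two_filters(sessions):
    # Two separate filters: each one is satisfied by some hit of the user.
    pages = {page for s in sessions for page in s}
    return HOME in pages and BLOG in pages


matching_sessions = sum(session_matches(s) for ss in users.values() for s in ss)
strict_users = [u for u, ss in users.items() if user_matches_strict(ss)]
filtered_users = [u for u, ss in users.items() if user_matches_two_filters(ss)]

print(matching_sessions)  # 1
print(strict_users)       # []
print(filtered_users)     # ['u1', 'u2']
```

The strict reading returns nobody, which mirrors the 0% preview, while splitting the rule into two filters brings matching users back.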
How can marketers avoid this?
By understanding the hierarchy of User – Session – Hit, we can start to unpick Google’s inner workings. If we can grasp this idea, it is possible to debug custom segments that don’t work as expected.
The same idea applies to metrics and dimensions too, where some pairings logically cannot be met within the same segment. Google provides a very comprehensive view of which pairings will and will not work together, which is worth checking out.
Although it does involve quite a bit of trial and error at first, custom segments are worth the effort and remain one of the most powerful tools at the analyst’s disposal.
From 2011, search marketers began to lose visibility over the organic keywords that consumers were using to find their websites, as Google gradually switched all of its searches over to secure search using HTTPS.
As it did so, the organic keyword data available to marketers in Google Analytics, and other analytics platforms, was slowly replaced by “(not provided)”. By 2014, the (not provided) issue was estimated to impact 80-90% of organic traffic, representing a massive loss in visibility for search marketers and website owners.
Marketers have gradually adjusted to the situation, and most have developed rough workarounds or ways of guessing what searches are bringing customers to their site. Even so, there’s no denying that having complete visibility over organic keyword data once more would have a massive impact on the search industry – as well as benefits for SEO.
One company believes that it has found the key to unlocking “(not provided)” keyword data. We spoke to Daniel Schmeh, MD and CTO at Keyword Hero, a start-up which has set out to solve the issue of “(not provided)”, and ‘Wizard of Moz’ Rand Fishkin, about how “(not provided)” is still impacting the search industry in 2017, and what a world without it might look like.
Content produced in association with Keyword Hero.
“(not provided)” in Google Analytics: How does it impact SEO?
“The “(not provided)” keyword data issue is caused by Google the search engine, so that no analytics program, Google Analytics included, can get the data directly,” explains Rand Fishkin, founder and former CEO of Moz.
“Google used to pass a referrer string when you performed a web search with them that would tell you – ‘This person searched for “red shoes” and then they clicked on your website’. Then you would know that when people searched for “red shoes”, here’s the behavior they showed on your website, and you could buy ads against that, or choose how to serve them better, maybe by highlighting the red shoes on the page better when they land there – all sorts of things.”
“You could also do analytics to understand whether visitors for that search were converting on your website, or whether they were having a good experience – those kinds of things.
“But Google began to take that away around 2011, and their reasoning behind it was to protect user privacy. That was quickly debunked, however, by folks in the industry, because Google provides that data with great accuracy if you choose to buy ads with them. So there’s obviously a huge conflict of interest there.
“I think the assumption at this point is that it’s just Google throwing their weight around and being the behemoth that they can be, and saying, ‘We don’t want to provide this data because it’s too valuable and useful to potential competitors, and people who have the potential to own a lot of the search ranking real estate and have too good of an idea of what patterns are going on.’
“I think Google is worried about the quality and quantity of data that could be received through organic search – they’d prefer that marketers spend money on advertising with Google if they want that information.”
Where Google goes, its closest competitors are sure to follow, and Bing and Yandex soon followed suit. By 2013, the search industry was experiencing a near-total eclipse of visibility over organic keyword data, and found itself having to simply deal with the consequences.
“At this point, most SEOs use the data of which page received the visit from Google, and then try to reverse-engineer it: what keywords does that page rank for? Based on those two points, you can sort of triangulate the value you’re getting from visitors from those keywords to this page,” says Fishkin.
However, data analysis and processing have come a long way since 2011, or even 2013. One start-up believes that it has found the key to unlocking “(not provided)” keyword data and giving marketers back visibility over their organic keywords.
How to unlock “(not provided)” keywords in Google Analytics
“I started out as a SEO, first in a publishing company and later in ecommerce companies,” says Daniel Schmeh, MD and CTO of SEO and search marketing tool Keyword Hero, which aims to provide a solution to “(not provided)” in Google Analytics. “I then got into PPC marketing, building self-learning bid management tools, before finally moving into data science.
“So I have a pretty broad understanding of the industry and ecosystem, and was always aware of the “(not provided)” problem.
“When we then started buying billions of data points from browser extensions for another project that I was working on, I thought that this must be solvable – more as an interesting problem to work on than a product that we wanted to sell.”
Essentially, Schmeh explains, solving the problem of “(not provided)” is a matter of getting access to the data and engineering around it. Keyword Hero uses a wide range of data sources to deduce the organic keywords hidden behind the screen of “(not provided)”.
“In the first step, the Hero fetches all our users’ URLs,” says Schmeh. “We then use rank monitoring services – mainly other SEO tools and crawlers – as well as what we call “cognitive services” – among them Google Trends, Bing Cognitive Services, Wikipedia’s API – and Google’s search console, to compute a long list of possible keywords per URL, and a first estimate of their likelihood.
“All these results are then tested against real, hard data that we buy from browser extensions.
“This info will be looped back to the initial deep learning algorithm, using a variety of mathematical concepts.”
Ultimately, the process used by Keyword Hero to obtain organic keyword data is still guesswork, but very advanced guesswork.
“All in all, the results are pretty good: in 50 – 60% of all sessions, we attribute keywords with 100% certainty,” says Schmeh.
“For the remainder, at least 83% certainty is needed, otherwise they’ll stay (not provided). For most of our customers, 94% of all sessions are matched, though in some cases we need a few weeks to get to this matching rate.”
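The certainty threshold Schmeh describes can be pictured with a short Python sketch. This is not Keyword Hero's algorithm, which is not detailed here; it only illustrates the idea of keeping the best candidate keyword when its estimated certainty reaches 83%, and the candidate keywords, scores and function name are all hypothetical.

```python
NOT_PROVIDED = "(not provided)"


def attribute_keyword(candidates, min_certainty=0.83):
    """Pick the most likely keyword, or fall back to (not provided).

    `candidates` maps each possible keyword to an estimated certainty
    between 0 and 1 (hypothetical values for illustration).
    """
    if not candidates:
        return NOT_PROVIDED
    keyword, certainty = max(candidates.items(), key=lambda item: item[1])
    return keyword if certainty >= min_certainty else NOT_PROVIDED


print(attribute_keyword({"red shoes": 1.0}))                            # red shoes
print(attribute_keyword({"red shoes": 0.9, "scarlet sandals": 0.1}))    # red shoes
print(attribute_keyword({"red shoes": 0.55, "scarlet sandals": 0.45}))  # (not provided)
```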
If the issue of “(not provided)” organic keywords has been around since 2011, why has it taken us this long to find a solution that works? Schmeh believes that Keyword Hero has two key advantages: one, they take a scientific approach to search, and two, they have much greater data processing power than was available six years ago.
“We have a very scientific approach to SEO,” he says.
“We have a small team of world-class experts, mostly from Fraunhofer Institute of Technology, that know how to make sense of large amounts of data. Our background in SEO and the fact that we have access to vast amounts of data points from browser extensions allowed us to think about this as more of a data science problem, which it ultimately is.
“Processing the information – the algorithm and its functionalities – would have worked back in 2011, too, but the limiting factor is our capability to work with these extremely large amounts of data. Just uploading the information back into our customers’ accounts would take 13 hours on AWS [Amazon Web Services] largest instance, the X1 – something we could never afford.
“So we had to find other cloud solutions – ending up with things that didn’t exist even a year ago.”
A world without “(not provided)”: How could unlocking organic keyword data transform SEO?
If marketers and website owners could regain visibility over their organic keywords, this would obviously be a huge help to their efforts in optimizing for search and planning a commercial strategy.
But Rand Fishkin also believes it would have two much more wide-reaching benefits: it would help to prove the worth of organic SEO, and would ultimately lead to a better user experience and a better web.
“Because SEO has such a difficult time proving attribution, it doesn’t get counted and therefore businesses don’t invest in it the way they would if they could show that direct connection to revenue,” says Fishkin. “So it would help prove the value, which means that SEO could get budget.
“I think the thing Google is most afraid of is that some people would see that they rank organically well enough for some keywords they’re bidding on in AdWords, and ultimately decide not to bid anymore.
“This would cause Google to lose revenue – but of course, many of these websites would save a lot of money.”
And in this utopian world of keyword visibility, marketers could channel that revenue into better targeting the consumers whose behavior they would now have much higher-quality insights into.
“I think you would see more personalization and customization on websites – so for example, earlier I mentioned a search for ‘red shoes’ – if I’m an ecommerce website, and I see that someone has searched for ‘red shoes’, I might actually highlight that text on the page, or I might dynamically change the navigation so that I had shades of red inside my product range that I helped people discover.
“If businesses could personalize their content based on the search, it could create an improved user experience and user performance: longer time on site, lower bounce rate, higher engagement, higher conversion rate. It would absolutely be better for users.
“The other thing I think you’d see people doing is optimizing their content efforts around keywords that bring valuable visitors. As more and more websites optimized for their unique search audience, you would generally get a better web – some people are going to do a great job for ‘red shoes’, others for ‘scarlet sandals’, and others for ‘burgundy sneakers’. And as a result, we would have everyone building toward what their unique value proposition is.”
Daniel Schmeh adds that unlocking “(not provided)” keyword data has the ability to make SEO less about guesswork and more substantiated in numbers and hard facts.
“Just seeing simple things, like how users convert that use your brand name in their search phrase versus those who don’t, has huge impact on our customers,” he says. “We’ve had multiple people telling us that they have based important business decisions on the data.
“Seeing thousands of keywords again is very powerful for the more sophisticated, data-driven user, who is able to derive meaningful insights; but we’d really like the Keyword Hero to become a standard tool. So we’re working hard to make this keyword data accessible and actionable for all of our users, and will soon be offering features like keyword clustering – all through their Google Analytics interface.”
To find out more about how to unlock your “(not provided)” keywords in Google Analytics, visit the Keyword Hero website.
Retrieving the Google Analytics metrics you care about is now as easy as asking for them using natural language.
The post Get the Google Analytics Data You Need Using Natural Language Queries by @MattGSouthern appeared first on Search Engine Journal.