If you’re familiar with SEO, you may have heard of Regex, but do you know what it is and how to use it to create comprehensive SEO reports? In this blog post, we’ll cover everything you need to know about Regex and how it can enhance your reporting. From understanding the basics of Regex to implementing it in your SEO strategy, this guide will provide you with all the information you need to improve your SEO reporting with Regex.
What is Regular Expression?
A regular expression, or regex, is a sequence of characters that defines a search pattern. In SEO, regex is often used to identify patterns in webpage data such as URLs, page titles, and meta tags, and to find duplicates or broken links. It can also be used to clean up report data.
To make this concrete, let's start with a simple regex example.
To match both adapter and adaptor in a list of words, you can use
adapt[eo]r
This matches both spellings, because [eo] accepts either e or o in that position.
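As a quick check, the pattern can be tried with Python's built-in re module. This is only a small sketch, and the word list is made up for illustration.

```python
import re

# Hypothetical word list, used only for illustration.
words = ["adapter", "adaptor", "adapt", "adaptar", "adapters"]

# [eo] matches exactly one character: either "e" or "o".
pattern = re.compile(r"adapt[eo]r")

# fullmatch requires the whole word to match the pattern.
matches = [w for w in words if pattern.fullmatch(w)]
print(matches)  # ['adapter', 'adaptor']
```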
Why regular expressions are used in SEO
Regular expressions can be used in SEO to filter data in analytics, allowing you to quickly focus on the information you need. This makes a webmaster's work easier, because SEO reports can be customized to specific requirements. For example, you can use regular expressions to filter analytics data by specific keywords or phrases, which can help you identify trends or issues that need to be addressed.
The earlier example was a basic one. Here is an example of how to use regex in analytics.
Suppose you need to revamp a large website that uses several different URL structures.
Let the URL structures be
As an SEO expert, you need to identify the URLs that contain an underscore (“_”) and change their structure.
A regex filter can do that.
Here is how.
Go to
Google Analytics > Behavior > Site Content > All Pages
In the right corner of the right-side column, just beside the search bar, click advanced.
Make sure Include and Landing Page are selected in the dropdown menus, and in the third box select Matching RegExp.
Use this expression:
^[a-z0-9]+(?:_[a-z0-9]+)+$
Then click Apply.
That’s it. The report now lists only the URLs that contain an underscore (_).
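To see what the filter keeps, the same expression can be run outside Google Analytics. The sketch below uses Python's re module and a made-up list of slugs. Because the pattern as written expects the string to begin with a letter or digit, the sample values have no leading slash; that is an assumption about the data, not something the report guarantees.

```python
import re

# Hypothetical slugs (URL paths with the leading "/" removed).
slugs = ["seo_tips", "seo-tips", "regex_for_seo_reports", "contact", "about_us"]

# Same expression as in the filter above.
underscore_slug = re.compile(r"^[a-z0-9]+(?:_[a-z0-9]+)+$")

# Keep only the slugs that contain at least one underscore-separated part.
with_underscores = [s for s in slugs if underscore_slug.search(s)]
print(with_underscores)  # ['seo_tips', 'regex_for_seo_reports', 'about_us']
```

Slugs that use hyphens, or that have no separator at all, are left out, which is the behavior the filter relies on.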
For a better understanding, let's break the expression down.
Token | Description |
^ | Start of the string |
[a-z0-9]+ | One or more lowercase letters or digits |
(?: | Start of a non-capturing group |
_ | An underscore |
[a-z0-9]+ | One or more lowercase letters or digits |
)+ | End of the group, repeated one or more times |
$ | End of the string |
The graphical representation of the above regular expression
Image Courtesy: ihateregex.io
This is just one example of how regex filters can be used in Google Analytics.
There are many other regex patterns you can use to get a detailed report.
Here are a few:
Regex Expression | Description |
word1|word2|word3 | to get the pages with specific words in it. |
^/(folder1|folder2)/ | to filter the pages of only specific folders |
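The two patterns in the table can be sketched in Python as follows. The page paths, words, and folder names are placeholders chosen for illustration.

```python
import re

# Hypothetical page paths; words and folder names are placeholders.
pages = ["/blog/regex-guide", "/shop/shoes", "/blog/seo-audit", "/about", "/news/seo-update"]

# Alternation: pages that contain any of the listed words.
has_word = re.compile(r"regex|audit")

# Anchored folder filter: only paths that start with /blog/ or /news/.
in_folder = re.compile(r"^/(blog|news)/")

word_pages = [p for p in pages if has_word.search(p)]
folder_pages = [p for p in pages if in_folder.search(p)]

print(word_pages)    # ['/blog/regex-guide', '/blog/seo-audit']
print(folder_pages)  # ['/blog/regex-guide', '/blog/seo-audit', '/news/seo-update']
```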
Regular Expression Basics
Some important regex tokens you should know
Caret (^)
The caret (^) marks the start of the string when it appears at the beginning of an expression. But not every caret in an expression means the same thing.
If ^ is the first character inside [], it acts as a “not” operator.
For example, [^pqrs] matches any single character except p, q, r, or s.
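A short Python sketch of the negated set, using an arbitrary sample word:

```python
import re

# [^pqrs] matches any single character that is NOT p, q, r, or s.
not_pqrs = re.compile(r"[^pqrs]")

# findall returns every character in the sample that is outside the set.
kept = not_pqrs.findall("parrots")
print(kept)  # ['a', 'o', 't']
```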
Dollar ($)
If the caret marks the start of the string, the dollar sign marks the end of it.
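The two anchors can be compared side by side. This is a minimal sketch in Python with made-up paths.

```python
import re

# Hypothetical page paths.
paths = ["/blog/seo", "/shop/blog", "/blog"]

# ^ anchors the pattern to the start of the string.
starts_with_blog = [p for p in paths if re.search(r"^/blog", p)]

# $ anchors the pattern to the end of the string.
ends_with_blog = [p for p in paths if re.search(r"/blog$", p)]

print(starts_with_blog)  # ['/blog/seo', '/blog']
print(ends_with_blog)    # ['/shop/blog', '/blog']
```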
Square Brackets []
Square brackets match any one of the characters listed inside them.
For example, [abcdef] matches a single a, b, c, d, e, or f.
Dot (.)
The dot is called the wildcard character in regular expressions because it matches any single character.
For example,
c.t will match cat, cut, or any other three-character string that starts with c and ends with t. Note that the dot must be outside square brackets; inside them, as in [c.t], it is just a literal dot. If you want to allow more characters in between, use curly brackets.
With the same example,
c.{1,2}t means c, followed by any one or two characters, followed by t.
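The wildcard is easier to see with a few test words. The sketch below uses Python's re module, and the word list is invented for illustration. The patterns are written without square brackets, since a dot inside brackets is a literal dot.

```python
import re

# Hypothetical test words.
words = ["cat", "cut", "ct", "coat", "count", "c.t"]

# c.t: "c", exactly one character of any kind, then "t".
three_chars = [w for w in words if re.fullmatch(r"c.t", w)]

# c.{1,2}t: "c", one or two characters of any kind, then "t".
one_or_two = [w for w in words if re.fullmatch(r"c.{1,2}t", w)]

print(three_chars)  # ['cat', 'cut', 'c.t']
print(one_or_two)   # ['cat', 'cut', 'coat', 'c.t']

# Inside brackets the dot is literal: [c.t] matches one "c", ".", or "t".
print(re.fullmatch(r"[c.t]", "cat"))  # None
```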
Plus (+)
In regex patterns, + means one or more repetitions of the preceding item.
For example, take the expression we discussed at the beginning.
^[a-z0-9]+(?:_[a-z0-9]+)+$
[a-z0-9]+ means one or more characters from the set a-z and 0-9. In other words, at least one such character must be present at that position.
Star (*)
The star (*) is similar to plus, with one small difference: instead of one or more repetitions, it means zero or more repetitions. So the preceding item does not have to appear at all.
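The difference between plus and star shows up when the repeated part is missing. A small Python sketch with invented sample strings:

```python
import re

# Hypothetical sample strings.
samples = ["seo", "seo123", "123"]

# [0-9]+ needs at least one digit after "seo".
with_plus = [s for s in samples if re.fullmatch(r"seo[0-9]+", s)]

# [0-9]* also accepts "seo" with no digits at all.
with_star = [s for s in samples if re.fullmatch(r"seo[0-9]*", s)]

print(with_plus)  # ['seo123']
print(with_star)  # ['seo', 'seo123']
```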
Curly Brackets {}
Curly brackets specify how many times the preceding item is repeated. A range is written with a comma, as {min,max}.
For example:
fo{2,4}d would match food, foood, and fooood, but not fod.
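A repetition range can be checked in Python like this. The range is written with a comma, and the word list is made up for illustration.

```python
import re

# Hypothetical test words with one to five "o" characters.
words = ["fod", "food", "foood", "fooood", "foooood"]

# o{2,4}: between two and four repetitions of "o".
matched = [w for w in words if re.fullmatch(r"fo{2,4}d", w)]
print(matched)  # ['food', 'foood', 'fooood']
```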
Question Mark ?
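The question mark means zero or one occurrence of the preceding item; in other words, it makes that item optional. It is mainly used if you want to see queries that contain a spelling mistake.
Example:
Search Engine Optimi?zation will match both Search Engine Optimization and the misspelling Search Engine Optimzation. To match the two regional spellings, Search Engine Optimisation and Search Engine Optimization, use a character set instead: Search Engine Optimi[sz]ation.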
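Since the question mark makes only the preceding character optional, it is worth checking what a pattern really matches before using it in a report. The sketch below uses Python's re module with invented queries and compares an optional character with a character set.

```python
import re

# Hypothetical search queries.
queries = [
    "search engine optimization",
    "search engine optimzation",   # misspelling: missing "i"
    "search engine optimisation",
]

# i? makes the "i" optional, so the misspelling is matched too.
optional_i = re.compile(r"optimi?zation")

# [sz] accepts either "s" or "z" in that position.
s_or_z = re.compile(r"optimi[sz]ation")

print([q for q in queries if optional_i.search(q)])
# ['search engine optimization', 'search engine optimzation']
print([q for q in queries if s_or_z.search(q)])
# ['search engine optimization', 'search engine optimisation']
```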
(?: or non-capturing group
According to the documentation, it is “a non-capturing version of regular parentheses. Matches whatever regular expression is inside the parentheses, but the substring matched by the group cannot be retrieved after performing a match or referenced later in the pattern.”
In other words, it groups part of the pattern, but the text matched by that group is not stored and cannot be referenced later in the pattern.
A non-capturing group is useful when you need to group part of a sequence, for example to apply + to it, but do not need to retrieve that portion afterwards.
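The effect of (?: is easiest to see by comparing it with ordinary parentheses. This is a minimal Python sketch; the slug is a made-up value.

```python
import re

# Hypothetical slug.
slug = "regex_for_seo"

# Ordinary parentheses capture the text matched by the group.
capturing = re.compile(r"^[a-z0-9]+(_[a-z0-9]+)+$")

# (?: ... ) groups the same sub-pattern without capturing it.
non_capturing = re.compile(r"^[a-z0-9]+(?:_[a-z0-9]+)+$")

# A repeated capturing group keeps only its last repetition.
print(capturing.fullmatch(slug).groups())      # ('_seo',)
print(non_capturing.fullmatch(slug).groups())  # ()
```

Both patterns accept the same strings; the only difference is whether the grouped text is stored.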
Conclusion
Regular expressions are a powerful SEO tool that is often overlooked. Once you master the art of creating the right patterns and sequences, you can use them to generate detailed, customized reports that help inform a winning SEO strategy for your project.