How To Use Screaming Frog To Identify Pesky Mixed Content Issues
In case you weren’t aware, Google is pushing to make the web more secure. It’s encouraging websites to switch to HTTPS, and in return they receive a “slight” ranking boost in its search results. Google also recently announced that Chrome will soon flag all non-secure pages with big red “Not secure” writing, so yes, it’s important to hop on board the HTTPS train sometime soon.
As more and more webmasters decide to switch to always-on SSL (HTTPS), some aren’t deploying the protocol correctly. When that happens, Google can’t crawl and index the content properly, which results in a loss of rankings and visitors.
I’ve seen a few cases where webmasters haven’t fixed the mixed content issues across their site. Even though HTTPS has been deployed, the page still shows as non-secure because it loads both HTTP and HTTPS elements in its content.
To help resolve this, I thought I’d drop by and write a blog post on how you can use Screaming Frog to identify mixed content issues.
Rundown of Mixed Content Warnings
If you haven’t yet got to grips with the fundamentals of mixed content, here’s what it is and how it happens. Mixed content occurs when a browser loads a page over an HTTPS connection, but “some” of the resources on that page (including images, videos, style sheets, scripts and more) are requested over plain HTTP. The page itself still arrives over HTTPS, but the browser can’t treat it as fully secure, because those HTTP resources are vulnerable to interception or tampering.
Most modern browsers will display an icon or warning showing that the page hasn’t been loaded over a fully secure connection, and this may put users off purchasing a product or continuing their journey on the site.
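To make the definition concrete, here’s a minimal Python sketch of the rule a browser is effectively applying: the page is HTTPS, but a resource it references is plain HTTP. The function name and the example.com URLs are made up for illustration, and real browsers apply more nuanced rules than this.

```python
from urllib.parse import urlparse


def is_mixed_content(page_url: str, resource_url: str) -> bool:
    """Return True when an HTTPS page references a resource over plain HTTP."""
    page_scheme = urlparse(page_url).scheme
    resource_scheme = urlparse(resource_url).scheme
    # A relative URL has an empty scheme, so it is never flagged here.
    return page_scheme == "https" and resource_scheme == "http"


# Hypothetical URLs, for illustration only.
page = "https://www.example.com/goats/"
print(is_mixed_content(page, "http://www.example.com/images/goat.jpg"))   # True
print(is_mixed_content(page, "https://www.example.com/images/goat.jpg"))  # False
```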
Fixing Mixed Content Warnings
Mixed content warnings happen because an element is calling an image, style sheet, video or whatever else over the HTTP protocol. In other words, the reference starts with http://, something like an image tag whose src is http://www.example.com/images/goat.jpg (a made-up address).
I love goats, okay? Back on topic: to resolve this issue we have two solutions that’ll work. The first is to call the image with an explicit https protocol, so a src of http://www.example.com/images/goat.jpg becomes https://www.example.com/images/goat.jpg.
OR, we can use a relative URL. A relative URL doesn’t specify the protocol or the domain, so it doesn’t have http:// or https:// at the front. Instead, the web browser assumes the resource lives on the same site and requests it using the same protocol as the page itself. For an image, that could be a src of /images/goat.jpg.
If you’re launching a new site, it’s a good idea to use relative URLs from the start, so that when you do switch to HTTPS, the web browser automatically requests those resources over HTTPS instead of HTTP.
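If you’d like to see why relative URLs survive the switch, here’s a small sketch using Python’s standard urljoin, which resolves a relative reference against a page URL in much the same way a browser does. The page addresses and image path are hypothetical.

```python
from urllib.parse import urljoin

# Hypothetical page addresses before and after the switch to HTTPS.
page_before = "http://www.example.com/blog/goats.html"
page_after = "https://www.example.com/blog/goats.html"

# A relative URL: no protocol and no domain.
relative_src = "/images/goat.jpg"

# The same reference picks up whichever protocol the page was loaded with.
print(urljoin(page_before, relative_src))  # http://www.example.com/images/goat.jpg
print(urljoin(page_after, relative_src))   # https://www.example.com/images/goat.jpg
```

The image reference never changes. Only the page address does, and the resolved URL follows it.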
Finding Mixed Content Warnings
Great, we know how to fix these warnings, but how do we find them? You have two options: you can manually go through each individual page on your site and see whether the web browser renders it as non-secure, or you can use a tool like Screaming Frog to automatically crawl your site and do all the hard work for you. Which sounds better?
Ha, thought so. Nothing beats the legendary powers of the frog. The automated process is very, very useful for big sites: Screaming Frog will go through and inspect every link and every page on your site, then list the mixed content errors it finds for you in a spreadsheet.
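To give a feel for the check being made on each page, here’s a rough Python sketch that scans one page’s HTML for resources loaded over http://. This is not how Screaming Frog works internally. It’s only an illustration, the class and function names are my own, and the list of tags is deliberately incomplete.

```python
from html.parser import HTMLParser

# Tags that load a resource through a src attribute (an illustrative, incomplete list).
SRC_TAGS = {"img", "script", "iframe", "video", "audio", "source"}


class InsecureResourceFinder(HTMLParser):
    """Collect (tag, url) pairs for resources referenced over plain HTTP."""

    def __init__(self):
        super().__init__()
        self.insecure = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "link":
            # Only style sheets are treated as loaded resources in this sketch.
            rel = (attributes.get("rel") or "").lower().split()
            url = attributes.get("href") if "stylesheet" in rel else None
        elif tag in SRC_TAGS:
            url = attributes.get("src")
        else:
            url = None  # ordinary <a href> links are ignored
        if url and url.strip().lower().startswith("http://"):
            self.insecure.append((tag, url.strip()))


def find_insecure_resources(html: str):
    finder = InsecureResourceFinder()
    finder.feed(html)
    return finder.insecure


# Made-up markup for illustration.
sample = """
<a href="http://www.example.com/about">About</a>
<img src="http://www.example.com/images/goat.jpg" alt="A goat">
<img src="/images/kid.jpg" alt="A young goat">
<link rel="stylesheet" href="http://www.example.com/style.css">
"""
print(find_insecure_resources(sample))
```

Doing this by hand for every page is exactly the chore the crawler saves you from.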
If you’re a lucky boy like me, you won’t have to fix these yourself and can send them over to another department to get resolved. If you’re interested, you can purchase Screaming Frog for a relatively cheap price. It’s only £149.00 a year, so think of how much time it’s going to save you!
- Once you have your own copy of the frog, launch it and you should see its lovely GUI. It looks fairly complicated, but for good reasons.
- At the top of the page, between the two navigation structures, you’ll see a field that states ‘Enter URL to spider’. This is where you enter the URL you’d like to crawl. If you’re looking to find mixed content errors on www.google.com, that’s the URL you’ll need to enter.
- Once the URL has been entered, click Start. You’ll then see the magic happen: the crawler will navigate through every page on the site, collecting data as it goes, and display that information in its GUI. The crawl will take time to run, but once it’s finished you’ll be bombarded with lots of useful data that’s going to save you a lot of time and possibly make you money.
- Once the crawler has finished, the easiest way to export a list of all the URLs that have mixed content errors and aren’t rendering as secure is to click ‘Reports’ in the main navigation at the top of the page, then click ‘Insecure Content’. Name your file and click save!
- We’ve got the spreadsheet, excellent, grab yourself a beer. Open the spreadsheet and you’ll see lots of fascinating data that’s going to improve your day. The data is separated into eight columns: ‘Type’, ‘Source’, ‘Destination’, ‘ALT text’, ‘Anchor’, ‘Status Code’, ‘Status’ and ‘Follow’.
- Go through each of the URLs in the ‘Source’ column and fix the mixed content issues. I wouldn’t worry too much about the ‘HREF’ rows, as these don’t cause mixed content errors themselves. Watch out primarily for the ‘IMG’ ones.
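If the spreadsheet is a long one, a few lines of Python can do the sorting for you. The sketch below assumes the export is a CSV with the eight column names listed above and that the ‘Type’ values look like ‘IMG’ and ‘HREF’. The sample rows are made up, and a real export may be laid out differently (extra header lines, other type values), so treat it as a starting point rather than a finished tool.

```python
import csv
import io
from collections import defaultdict

# Made-up rows in the column layout described above; a real export may differ.
SAMPLE_REPORT = """Type,Source,Destination,ALT text,Anchor,Status Code,Status,Follow
IMG,https://www.example.com/goats/,http://www.example.com/images/goat.jpg,A goat,,200,OK,true
HREF,https://www.example.com/goats/,http://www.example.com/about,,About,200,OK,true
IMG,https://www.example.com/kids/,http://www.example.com/images/kid.jpg,A kid,,200,OK,true
"""


def group_fixes_by_page(report_file, skip_types=("HREF",)):
    """Map each 'Source' page to the insecure 'Destination' URLs it loads."""
    fixes = defaultdict(list)
    for row in csv.DictReader(report_file):
        # Plain links don't cause mixed content warnings, so skip them.
        if row["Type"].strip().upper() in skip_types:
            continue
        fixes[row["Source"]].append(row["Destination"])
    return dict(fixes)


for page, resources in group_fixes_by_page(io.StringIO(SAMPLE_REPORT)).items():
    print(page)
    for resource in resources:
        print("  fix:", resource)
```

To run it against a real export, you’d swap the io.StringIO sample for an open file handle to the report you saved.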
There you have it, you’re now a technical SEO wizard. How’d ya feel? You’re going to save your team a lot of time and effort, with no need to go through and manually find pesky mixed content warnings!