How we maintain both HTTP and HTTPS mirrors

September 30, 2020
GitHub Pages is an awesome feature that GitHub launched a few years ago to host basic, static HTML websites free of charge. It can be a bit confusing at first, because you must connect one of your repos to serve a given domain.
For example, our littlebizzy/slickstack repo is what powers the https://mirrors.slickstack.io live website, a subdomain. But when you look at that repo on GitHub it is rather large, because it also contains all the scripts and files for the SlickStack project itself. In other words, you can use a repo for both a software project AND also a GitHub Pages website… or not, if you prefer.
Anyway, in our case, we are using GitHub Pages for our public mirrors server. This allows any Ubuntu server that wants to install SlickStack to see all the source files publicly, which makes things very transparent… these files are pulled via wget (over plain HTTP), but they can also be browsed on the web too.
The problem is that wget over HTTPS creates a lot of challenges when trying to keep scripts both secure and simple (for example, a freshly provisioned server may lack up-to-date CA certificates, causing certificate verification to fail), which is why most public “mirrors” servers still use plain HTTP to this day.
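To illustrate the approach, here is a minimal sketch of how an install script might pull a file from the plain-HTTP mirror. The file name ss-functions.txt is hypothetical, used only as an example:

```shell
#!/bin/sh
# Sketch: fetching a script from the plain-HTTP mirror.
# The file name below is hypothetical, for illustration only.
MIRROR="http://mirrors.slickstack.io"
FILE="ss-functions.txt"
URL="${MIRROR}/${FILE}"
echo "${URL}"
# Plain HTTP sidesteps TLS/CA-certificate issues on minimal fresh servers:
# wget -q "${URL}" -O "/tmp/${FILE}"
```

Keeping the actual fetch on plain HTTP means the script works even on a bare server before any certificate packages are installed.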
So for our scripts we wanted to support HTTP, but for SEO reasons we wanted to use HTTPS. What a dilemma!
The answer was to allow BOTH protocols to load, which is actually the default in GitHub Pages settings… however, instead of enabling the “Enforce HTTPS” redirect option in the GitHub settings, we simply added a canonical tag in the header (via the Jekyll SEO plugin) to ensure the HTTPS version was treated as the “canonical” version of the GitHub Pages site.
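A minimal sketch of that Jekyll setup, assuming the jekyll-seo-tag plugin is what emits the canonical tag (the exact _config.yml values here are illustrative, not the project's actual config):

```yaml
# _config.yml — illustrative sketch
url: "https://mirrors.slickstack.io"   # HTTPS origin, so canonical URLs come out as HTTPS
plugins:
  - jekyll-seo-tag
```

With `{% seo %}` placed in the head layout, the plugin then emits a `<link rel="canonical" href="https://mirrors.slickstack.io/..." />` tag on every page, even when the page itself was loaded over plain HTTP.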
Yes, this is a rather janky fix, but it is still 100% understood by Google’s robots, while ensuring that wget can access the HTTP version and that browsers and search engines prioritize the HTTPS version.
In addition, we added the “mirrors.slickstack.io” domain to our Google Search Console (GSC) panel and submitted the HTTPS version of the sitemap: https://mirrors.slickstack.io/sitemap.xml … because sitemaps in GSC greatly affect how Googlebot prioritizes your indexing, we removed the prior sitemap (the HTTP version) so that Google understood our priority was the HTTPS version of the site. We then re-submitted for indexing all the HTTP URLs of the subdomain that still existed in Google SERPs, forcing Googlebot to recrawl the content and discover the canonical tag in the headers. Eventually the HTTPS version was indexed by Google, and the non-HTTPS versions were removed from Google search results. Done!
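For reference, the submitted HTTPS sitemap would look roughly like this (a minimal sketch following the sitemaps.org format; the listed path is illustrative):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <!-- every <loc> uses the HTTPS origin, reinforcing the canonical version -->
  <url>
    <loc>https://mirrors.slickstack.io/</loc>
  </url>
</urlset>
```

The key point is that every `<loc>` entry uses the HTTPS origin, so the sitemap and the canonical tags both point Google at the same protocol.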