Vercel protected environments and Sitecore Search
Tags: Sitecore Search, Vercel, XM Cloud, NextJS, SEO, versioning everything, SitecoreCecSearchModule
Reading Time: 6 Minutes
2024-09-07
Vercel is a really nice offering for hosting NextJS-based solutions. It takes care of building and deployment on an optimized infrastructure with edge and compute regions, so you get the full benefit of Static Site Generation (SSG) and Incremental Static Regeneration (ISR) while still supporting Server-Side Rendering (SSR). With the out-of-the-box integration with multiple source control providers, whenever you push a commit to e.g. GitHub, Vercel will detect it, build your solution, deploy it, clear edge caches, etc.
Through the Sitecore partnership, Vercel can be offered as part of your normal license, so you only have one invoice. This also simplifies adoption for enterprises, which can have more complicated policies for introducing new vendors and platforms.
For professional software development it is essential to have proper testing, and Vercel supports this. Out of the box you get preview environments: whenever someone is working on a feature branch, it is available on its own URL, so you can test that specific change in isolation, catch bugs early in the process, and achieve better quality and faster time to market. You can also assign domains to specific branches for a more traditional test-and-release process.
Multiple environments considerations
No matter how you set up your environments, it is important to consider their availability to the public.
From an SEO perspective, it is crucial that you do not create duplicate content by having identical pages on different URLs. One workaround is to set the canonical URL on each page to the production URL; then even when a search crawler visits your test pages, it will use the production URLs in its index and search results. However, with this method there is a risk that your test pages will affect your production results. Another workaround is to prevent crawling of the site altogether. This can be done with robots.txt (and make sure there are no links on the public web to your test site) or by specifying `noindex` in a meta tag or HTTP header. However, these methods often introduce code and logic in your site implementation that should only run in certain (non-production) environments, and with that comes a risk of bugs and differences when released to production.
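As a sketch of the kind of environment-dependent logic this introduces (assuming a Next.js app on Vercel, where the `VERCEL_ENV` system environment variable is `production`, `preview`, or `development`; the helper name is mine):

```typescript
// Sketch: derive robots directives from Vercel's VERCEL_ENV system
// variable ("production" | "preview" | "development"). Note that this
// is exactly the environment-specific code path the text warns about:
// a bug here can leak noindex into production or expose test pages.
type RobotsDirectives = { index: boolean; follow: boolean };

export function robotsForEnv(vercelEnv: string | undefined): RobotsDirectives {
  // Only the production deployment should be indexable.
  const isProduction = vercelEnv === "production";
  return { index: isProduction, follow: isProduction };
}

// In a Next.js App Router layout this could feed the metadata object:
// export const metadata = { robots: robotsForEnv(process.env.VERCEL_ENV) };
```

The attraction of solving this at the infrastructure level instead, as discussed below, is that this branch disappears from the application code entirely.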
It might also be that you don’t want any outsiders to see the test site, with the content and revolutionary features and messages you are preparing, even if they had or could guess the URL. Traditionally this was achieved by having test servers that were only reachable from the corporate network, either internally or through VPN. With the rise of cloud solutions, many have used IP whitelisting, but the rise of remote and hybrid work has more or less removed this option, or at least made it impossible to manage at a larger scale. That leaves us with some kind of authentication mechanism to get access to the site, and again the risk of implementation differences across environments that can hide bugs.
To have the same code base and code paths on all your deployed environments, you need to solve authentication at the infrastructure level (like the IP whitelisting). With Azure Front Door, Cloudflare, or similar CDN providers, you can add HTTP headers (e.g. `X-Robots-Tag: noindex` instead of meta tags) at the infrastructure level. For authentication, Cloudflare has the Zero Trust offering, and on Azure App Service you can use Easy Auth.
Protect environments with Vercel
Luckily, Vercel offers protection of environments and hereby achieves this at the infrastructure level. You can either restrict access to the users in your Vercel team (and you can add users with the Viewer role at no additional cost), or you can use a password, in which case Vercel presents an authentication page before users can proceed to see your actual site.
Handling integrations when using environment protection
When your test environments are protected, any integrations that need to reach your site must be handled as well; e.g. if the pages are indexed by an internal search engine such as Sitecore Search, it still needs access to the site.
When environment protection is enabled in Vercel, there is also a bypass mechanism that can be used for integrations.
Configure Sitecore Search to use Vercel Protection Bypass
Whether you need to configure Sitecore Search after applying environment protection depends on how your content is being indexed, i.e. which source type you are using:
- API Push: If you are using the ingestion API directly, nothing changes; the Sitecore Search endpoint is still protected by the API keys you are using, so your existing logic will continue to work as before. If your custom logic makes requests to the site itself, you will need to authorize those requests.
- API Crawler: If you are using APIs deployed to your protected environment, you will need to configure the crawler.
- Sitemap crawler: The crawler accesses your site directly and therefore needs access to your environment, so you need to configure the crawler.
Let’s have a look at the individual configurations.
For the API Crawler it is quite simple: the trigger needs to be configured with the HTTP header. If you have any Request Extractors, you need to take care of the headers there as well.
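The header in question is Vercel’s protection bypass header, with the secret generated under the Vercel project’s deployment protection settings. As a sketch of what a request carrying it looks like (the helper name and environment variable are my own placeholders):

```typescript
// Sketch: headers an integration (or the crawler trigger) must send to
// pass Vercel's deployment protection. The x-vercel-protection-bypass
// header carries the secret generated in the Vercel project settings.
export function vercelBypassHeaders(secret: string): Record<string, string> {
  return {
    "x-vercel-protection-bypass": secret,
  };
}

// Usage with a plain fetch against the protected environment:
// await fetch("https://my-branch.example.vercel.app/api/content", {
//   headers: vercelBypassHeaders(process.env.VERCEL_BYPASS_SECRET ?? ""),
// });
```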
For the Sitemap Web Crawler it is a bit more complicated, as the trigger and the subsequent requests are handled differently. Initially I expected that we should configure Browser Authentication to let Sitecore Search authenticate like an ordinary user, but it is actually way simpler to use the protection bypass, and after a hint from Sitecore Support we got this sorted out.
The requests to other sitemaps (when using a Sitemap Index) and to individual pages are handled internally by the Web Crawler. In the Web Crawler settings we can specify headers that will be used for those subsequent requests.
However, those HTTP headers are not added to the initial request for the sitemap, and the sitemap trigger cannot be configured with an additional HTTP header. Fortunately, Vercel also supports passing the key as a query string parameter, so this can be done for the sitemap URL.
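Constructing that sitemap URL can be sketched like this (the domain is a placeholder; the parameter name mirrors the bypass header):

```typescript
// Sketch: append Vercel's protection bypass secret as a query string
// parameter - the only option for the initial sitemap request, where
// no extra headers can be configured on the trigger.
export function sitemapUrlWithBypass(sitemapUrl: string, secret: string): string {
  const url = new URL(sitemapUrl);
  url.searchParams.set("x-vercel-protection-bypass", secret);
  return url.toString();
}

// e.g. sitemapUrlWithBypass("https://test.example.com/sitemap.xml", "s3cret")
// → "https://test.example.com/sitemap.xml?x-vercel-protection-bypass=s3cret"
```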
After this, the crawler should be able to access and index your site, so you can still use content from your test site and verify your entire Sitecore Search implementation.
UPDATE: using automation for Sitecore Search and environment protection
As you might know, I have created SitecoreCecSearchModule, a PowerShell module that allows interaction with Sitecore Search, serializes configuration to disk, and can even push configurations to Sitecore Search based on files.
Having secrets such as the Vercel protection bypass key serialized to disk is not something we want. Similarly, we might have different keys for different environments, and probably no key at all for production, etc.
To handle those issues, I have now updated the module so that it can add and remove query strings for the web crawler trigger, as well as HTTP headers, so the steps mentioned above can be handled automatically. To simplify things, Add-CecConnectorVercelBypassProtection has also been added, which only needs the actual secret as a parameter.
When using the aggregated method Invoke-GetAndWriteAllCecConfiguration, the Vercel bypass query string and headers are removed before writing to disk, so the secret is not exposed. The module’s cmdlets can then be combined to read connectors from disk, add the secret back, and push the configuration to Sitecore Search (if you dare!).
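As a sketch of what such a pipeline could look like: only Add-CecConnectorVercelBypassProtection and Invoke-GetAndWriteAllCecConfiguration are actual cmdlets named above; Read-CecConnector, Push-CecConnector, and the -Secret/-Path parameters are hypothetical placeholders, so check the module’s documentation for the real names.

```powershell
# Hedged sketch, not the module's verbatim API: read serialized
# connectors, re-add the Vercel bypass secret, and push the result to
# Sitecore Search. Read-CecConnector and Push-CecConnector are
# hypothetical placeholder names.
$secret = $env:VERCEL_PROTECTION_BYPASS   # keep the secret out of the serialized files

Read-CecConnector -Path .\search-config |
    Add-CecConnectorVercelBypassProtection -Secret $secret |
    Push-CecConnector
```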
These cmdlets are available from SitecoreCecSearchModule version 0.3.0.