Custom Driver Command Executor

March 10, 2022

Selenium 4 introduced the ability to use browser-specific features in Remote sessions, but doing so in .NET was especially onerous. A new interface in Selenium 4.1, ICustomDriverCommandExecutor, makes this functionality much easier to use.

Simon Stewart once told me something along the lines of, “the best way to get a developer to help with your code is to implement it poorly.” He was talking about putting more care into the user-facing API than into its implementation, because the implementation can be fixed more easily, but I regularly see things that remind me of this sentiment. For instance, when I wrote examples for executing browser-specific functionality with a Remote WebDriver as part of the Sauce Labs Selenium 4 documentation, the ones in .NET were, well, ugly. The Selenium code wasn’t designed to do what I was making it do, but it worked, and I was just happy that I had figured it out.

When Jim Evans, who maintains the .NET code, saw what I had done with it, though, he was (justifiably) appalled. This prompted him to create a much more elegant solution in Selenium 4.1, which makes this great feature much easier to work with.

Let’s start with a quick rundown of what this functionality is and why it’s important. Browser vendors implement special features for their drivers that only work for their browser. Selenium provides access to these features with methods on the local driver classes. Traditionally, local driver classes have been implemented as subclasses of the Remote driver class. So when using a Remote driver, even though the browser on the remote machine would recognize the special commands, the superclass does not have access to the subclass methods that would make it work.
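To make the subclassing problem concrete, here is a minimal sketch. The class names (`RemoteDriverSketch`, `FirefoxDriverSketch`, `SubclassProblemDemo`) are hypothetical stand-ins invented for illustration, not the real Selenium types; the only point is that a method declared on the subclass is not reachable through an object that is merely the superclass.

```csharp
using System;

// Hypothetical stand-ins for illustration; these are NOT the real Selenium classes.
public class RemoteDriverSketch
{
    // The superclass only knows how to send a named command to the remote end.
    public virtual string Execute(string commandName) => "executed " + commandName;
}

public class FirefoxDriverSketch : RemoteDriverSketch
{
    // Browser-specific convenience method, declared only on the subclass.
    public string GetFullPageScreenshot() => Execute("fullPageScreenshot");
}

public static class SubclassProblemDemo
{
    // True only when the object really is the local subclass.
    public static bool HasFullPageScreenshot(RemoteDriverSketch driver) =>
        driver is FirefoxDriverSketch;
}
```

With these types, `new RemoteDriverSketch().GetFullPageScreenshot()` does not compile, even if the browser on the other end of the connection would understand the command. That is the gap the approaches described next are meant to close.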

Ruby addresses this problem by no longer subclassing Remote::Driver, and dynamically pulling in the methods at runtime with metaprogramming. Java implements this functionality with an Augmenter class using Google’s AutoService annotation with a ServiceLoader. Which is to say, it pretty much uses magic. Most of the time it is good to require the user to be explicit when using an implementation that isn’t obvious, so I tend to prefer .NET’s approach here of explicitly specifying that the feature is being added to the superclass, even if it is a little more verbose. The Java Augmenter class and its usage in the RemoteWebDriverBuilder class are really interesting, though, so I do want to write an article about that at some point.

So this is the C# code I wrote for Selenium 4.0 to take a full page screenshot in a Remote session using Firefox:

// Register the Firefox-specific endpoint as a named command on the Remote driver's executor
IHasCommandExecutor hasCommandExecutor = Driver as IHasCommandExecutor;
var addFullPageScreenshotCommandInfo = new HttpCommandInfo(HttpCommandInfo.GetCommand,
    "/session/{sessionId}/moz/screenshot/full");
hasCommandExecutor.CommandExecutor.TryAddCommand("fullPageScreenshot", addFullPageScreenshotCommandInfo);

// Build the command by hand; this endpoint takes no payload, so the parameters are null
SessionId sessionId = ((RemoteWebDriver)Driver).SessionId;
var fullPageScreenshotCommand = new Command(sessionId, "fullPageScreenshot", null);

Driver.Navigate().GoToUrl("https://www.saucedemo.com/v1/inventory.html");
// Execute the command; the response value is the Base64-encoded image
var screenshotResponse = hasCommandExecutor.CommandExecutor.Execute(fullPageScreenshotCommand);
string base64 = screenshotResponse.Value.ToString();
Screenshot image = new Screenshot(base64);

// Save the image to a path relative to the project directory
var parentFullName = Directory.GetParent(Environment.CurrentDirectory)?.Parent?.Parent?.FullName;
image.SaveAsFile(parentFullName + "/Selenium4/Resources/FirefoxFullPageScreenshot.png", ScreenshotImageFormat.Png);

See full example on GitHub

The steps for this are complicated:

1. Find the W3C specified endpoints for the desired command
2. Create HttpCommandInfo instance with the endpoint information
3. Add the command to the Driver via CommandExecutor using IHasCommandExecutor interface
4. Find the W3C specified payload parameters for the command
5. Add these parameters to a Dictionary
6. Get the Driver’s sessionId
7. Create a Command instance with sessionId and parameters
8. Execute that command with the Driver via CommandExecutor using IHasCommandExecutor interface
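It may help to see what “adding a command” amounts to conceptually. The sketch below is a simplified model, not Selenium’s implementation: `EndpointSketch` and `CommandRegistrySketch` are hypothetical names, and the duplicate-name behavior of `TryAddCommand` is an assumption made for the sketch. It shows a command name being mapped to an HTTP method and a URL template, and the session ID being substituted into the template at execution time.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical simplified model; the real HttpCommandInfo and command executor do more.
public sealed class EndpointSketch
{
    public EndpointSketch(string method, string pathTemplate)
    {
        Method = method;
        PathTemplate = pathTemplate;
    }

    public string Method { get; }
    public string PathTemplate { get; }
}

public sealed class CommandRegistrySketch
{
    private readonly Dictionary<string, EndpointSketch> _commands =
        new Dictionary<string, EndpointSketch>();

    // Assumption for this sketch: a name that is already registered is not overwritten.
    public bool TryAddCommand(string name, EndpointSketch endpoint)
    {
        if (_commands.ContainsKey(name)) return false;
        _commands[name] = endpoint;
        return true;
    }

    // Resolve a command name and session ID to the request line that would be sent.
    public string Resolve(string name, string sessionId)
    {
        if (!_commands.TryGetValue(name, out var endpoint))
            throw new KeyNotFoundException("Unknown command: " + name);
        return endpoint.Method + " " + endpoint.PathTemplate.Replace("{sessionId}", sessionId);
    }
}
```

Registering `"fullPageScreenshot"` with `new EndpointSketch("GET", "/session/{sessionId}/moz/screenshot/full")` and then calling `Resolve("fullPageScreenshot", "abc123")` yields `GET /session/abc123/moz/screenshot/full`. The session ID `abc123` is made up for the example.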

You can see in Selenium 4.1 how much easier this code is:

// Cast the Remote driver and register all of Firefox's custom commands in one call
var customCommandDriver = _driver as ICustomDriverCommandExecutor;
customCommandDriver.RegisterCustomDriverCommands(FirefoxDriver.CustomCommandDefinitions);

_driver.Navigate().GoToUrl("https://www.selenium.dev/");

// The full page screenshot command takes no payload
const Dictionary<string, object> parameters = null;

var screenshotResponse = customCommandDriver
    .ExecuteCustomDriverCommand(FirefoxDriver.GetFullPageScreenshotCommand, parameters);

// SaveScreenshot is a helper method that is not shown in this snippet
SaveScreenshot((string) screenshotResponse);

See full example on GitHub

It has half as many steps:

1. Register the custom driver commands for the given driver using ICustomDriverCommandExecutor interface
2. Find the W3C specified payload parameters for the desired command
3. Add these parameters to a Dictionary
4. Execute custom driver command using ICustomDriverCommandExecutor interface

The trickiest part is finding the correct payload parameters for the different commands, so take a look at these examples of other browser-specific methods that can now be used with a Remote driver:

Install and uninstall an add-on (Note: Firefox Only)

var customCommandDriver = _driver as ICustomDriverCommandExecutor;
customCommandDriver.RegisterCustomDriverCommands(FirefoxDriver.CustomCommandDefinitions);

var parentFullName = Directory.GetParent(Environment.CurrentDirectory)?.Parent?.Parent?.FullName;
var localFile = parentFullName + "/Resources/ninja_saucebot-1.0-an+fx.xpi";
var extensionByteArray = File.ReadAllBytes(localFile);
var encodedExtension = Convert.ToBase64String(extensionByteArray);

var installAddon = new Dictionary<string, object> { { "addon", encodedExtension } };
var id = (string)customCommandDriver.ExecuteCustomDriverCommand(FirefoxDriver.InstallAddOnCommand, installAddon);

var removeAddon = new Dictionary<string, object> { { "id", id } };
customCommandDriver.ExecuteCustomDriverCommand(FirefoxDriver.UninstallAddOnCommand, removeAddon);

See full example on GitHub

Set and delete network conditions (Note: Chromium Only)

var customCommandDriver = _driver as ICustomDriverCommandExecutor;
customCommandDriver.RegisterCustomDriverCommands(ChromeDriver.CustomCommandDefinitions);

var networkConditions = new ChromiumNetworkConditions { IsOffline = true };
var offlineNetwork = new Dictionary<string, object> { { "network_conditions", networkConditions } };
customCommandDriver.ExecuteCustomDriverCommand(ChromiumDriver.SetNetworkConditionsCommand, offlineNetwork);

customCommandDriver.ExecuteCustomDriverCommand(ChromiumDriver.DeleteNetworkConditionsCommand, null);

networkConditions = new ChromiumNetworkConditions
{
    Latency = TimeSpan.FromSeconds(1),
    DownloadThroughput = 50000,
    UploadThroughput = 50000
};
var limitedNetwork = new Dictionary<string, object> { { "network_conditions", networkConditions } };
customCommandDriver.ExecuteCustomDriverCommand(ChromiumDriver.SetNetworkConditionsCommand, limitedNetwork);

See full example on GitHub

Switch context to change a browser preference (Note: Firefox Only)

var customCommandDriver = _driver as ICustomDriverCommandExecutor;
customCommandDriver.RegisterCustomDriverCommands(FirefoxDriver.CustomCommandDefinitions);

var chromePayload = new Dictionary<string, object> { { "context", "chrome" } };
customCommandDriver.ExecuteCustomDriverCommand(FirefoxDriver.SetContextCommand, chromePayload);

var js = (IJavaScriptExecutor) _driver;
js.ExecuteScript("Services.prefs.setStringPref('intl.accept_languages', 'de-DE')");

var contentPayload = new Dictionary<string, object> { { "context", "content" } };
customCommandDriver.ExecuteCustomDriverCommand(FirefoxDriver.SetContextCommand, contentPayload);

See full example on GitHub

If you aren’t already taking advantage of these new features in your remote tests, check them out. If you started using the code examples I created in October (or stole them for your company’s own documentation without attribution; you know who you are), please update to the Selenium 4.1 approach shown here.

Follow me if you found this article interesting,
or answer one of these questions in the comments or on Twitter: