How to Detect An Automated Browser – Hendrik Thurau Enterprises

Detecting automated browsers on the server side can be challenging since automation techniques are continuously evolving, and determined attackers may attempt to bypass detection methods. However, there are several techniques and approaches you can use to identify automated browser activity. Here are a few common methods:

User Agent Analysis: Analyze the User-Agent header sent by the browser in the HTTP request. Automated browsers or bots often use default or non-standard User-Agent strings that differ from those of regular browsers. However, keep in mind that User-Agent headers can be easily manipulated or spoofed by attackers.
JavaScript/Cookie Challenges: Use JavaScript challenges or cookie-based tests to detect automation. For example, you can employ techniques like CAPTCHA challenges or cookie-based tracking to identify whether the browser behaves like a real user.
Interaction Analysis: Monitor the user’s interaction patterns and behavior. Automated browsers typically exhibit unnatural or repetitive patterns, such as consistent and rapid requests, minimal time spent on pages, or a lack of mouse movements or clicks, while human users show more random and varied interactions.
IP Address Analysis: Analyze IP addresses to identify patterns associated with known automated activity or suspicious behavior, such as multiple requests originating from the same IP within a short time frame.
Machine Learning and Anomaly Detection: Utilize machine learning algorithms or anomaly detection techniques to detect patterns and anomalies in user behavior, helping to differentiate between human users and automated systems.
Request Rate Limiting: Implement rate limiting mechanisms to restrict the number of requests from a single IP address or user within a specific time period. Unusually high request rates from a single source may indicate automated activity.
Headless Browser Detection: Detect headless browsers, which are commonly used in automation, by looking for specific browser fingerprinting characteristics associated with headless environments, such as missing or incomplete browser features or inconsistent rendering behavior.
JavaScript/CSS Rendering Analysis: Analyze the behavior and rendering capabilities of the browser’s JavaScript and CSS engines. Automated browsers often have limited or non-standard JavaScript and CSS support, which can be used as indicators.
Browser Plugin Detection: Check for plugins or other artifacts commonly left behind by automation tools. Note that Selenium WebDriver and Puppeteer are automation frameworks rather than plugins; browsers driven by them typically report navigator.webdriver as true, which may be indicative of automated activity.
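Referrer Analysis: Examine the Referer header to check whether the browser arrived at your site through suspicious sources, such as domains commonly associated with automation, or requested deep pages directly without any referrer. Keep in mind that a missing referrer is also normal for bookmarks and typed URLs, so treat it as a weak signal.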
Hidden Field or Honeypot Detection: Include hidden form fields or honeypot fields in your forms and monitor if they are filled or interacted with. Automated bots often fill in all form fields, including hidden fields, while humans won’t see or interact with them.
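
As a rough illustration of the honeypot item above, the following sketch checks a submitted form on the server. The field name "website" and the isHoneypotTriggered helper are hypothetical names chosen for this example; the only assumption is that your form contains an input that is hidden from human visitors with CSS.

```javascript
// Hypothetical name of a form field that is hidden from humans with CSS.
const HONEYPOT_FIELD = "website";

// Returns true when the hidden honeypot field was filled in,
// which suggests the form was completed by a script.
function isHoneypotTriggered(formData) {
  const value = formData[HONEYPOT_FIELD];
  return typeof value === "string" && value.trim().length > 0;
}

// Usage: formData is the parsed body of a form submission.
console.log(isHoneypotTriggered({ name: "Ada", website: "" })); // false
console.log(isHoneypotTriggered({ name: "Bot", website: "http://spam.example" })); // true
```

A filled honeypot is a strong signal, but some browser autofill features can also populate hidden fields, so it may be safer to combine it with other checks.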

It’s important to note that no single technique can provide foolproof detection, as attackers are constantly adapting their methods. Employing a combination of multiple detection mechanisms and regularly updating them based on observed patterns and emerging threats can enhance your ability to identify automated browser activity. However, striking a balance between effective detection and minimizing false positives is also crucial to ensure a smooth user experience. Additionally, consider implementing a layered approach that includes both client-side and server-side checks for better accuracy and security.
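
One possible way to combine several signals while limiting false positives is to add up weighted scores instead of blocking on the first hit. The sketch below is only an illustration: the signal names, the weights, and the thresholds are assumptions that you would need to tune against your own traffic.

```javascript
// Illustrative weights (assumptions); tune them against your own traffic.
const SIGNAL_WEIGHTS = {
  suspiciousUserAgent: 0.4,
  honeypotFilled: 0.9,
  noInteraction: 0.3,
  highRequestRate: 0.5,
};

// Sums the weights of all signals that fired and caps the result at 1.
function automationScore(signals) {
  let score = 0;
  for (const [name, weight] of Object.entries(SIGNAL_WEIGHTS)) {
    if (signals[name]) score += weight;
  }
  return Math.min(score, 1);
}

// Maps a score to an action; the cut-off values are assumptions.
function decide(score) {
  if (score >= 0.8) return "block";
  if (score >= 0.4) return "challenge";
  return "allow";
}

// Usage: a weak signal alone lets the request through,
// two signals together trigger an additional challenge.
console.log(decide(automationScore({ noInteraction: true })));
console.log(decide(automationScore({ noInteraction: true, suspiciousUserAgent: true })));
```

Using an intermediate "challenge" action means a borderline visitor gets a chance to prove they are human instead of being blocked outright.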

Example of techniques in JavaScript

The following JavaScript code demonstrates some of the techniques discussed earlier for detecting automated browsers:

				
// Run all checks and combine their results.
// Several helpers are asynchronous, so their results must be awaited.
async function detectAutomatedBrowser() {
  // Technique 1: User Agent Analysis
  const userAgent = window.navigator.userAgent;
  const isAutomatedUserAgent = /bot|crawl|spider|headless|puppeteer/i.test(userAgent);

  // Technique 2: JavaScript/Cookie Challenges
  const isAutomatedJavaScriptChallenge = await performJavaScriptChallenge();
  const isAutomatedCookieChallenge = await performCookieChallenge();

  // Technique 3: Interaction Analysis
  const isAutomatedInteraction = await performInteractionAnalysis();

  // Technique 4: IP Address Analysis (skipped if the address could not be retrieved)
  const ipAddress = await getIpAddress();
  const isAutomatedIpAddress = ipAddress ? await detectAutomationByIpAddress(ipAddress) : false;

  // Technique 5: Machine Learning and Anomaly Detection
  const isAutomatedMachineLearning = await detectAutomatedBrowserMachineLearning();
  const anomalies = await detectAnomalies();
  const threshold = 3; // Use your own threshold, depending on your use case
  const isAnomalous = anomalies.length > threshold;

  // Combine the results of the different techniques
  // (a null result from a failed lookup counts as "not detected")
  const isAutomated = Boolean(
    isAutomatedUserAgent ||
    isAutomatedJavaScriptChallenge ||
    isAutomatedCookieChallenge ||
    isAutomatedInteraction ||
    isAutomatedIpAddress ||
    isAutomatedMachineLearning ||
    isAnomalous
  );

  // Log the result
  console.log("Is automated browser:", isAutomated);
  return isAutomated;
}

detectAutomatedBrowser();
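
The user agent test can be complemented with the headless browser signals from the list of methods. The following sketch is a minimal example under the assumption that these three hints are useful for your traffic; the function name collectHeadlessSignals is hypothetical. It takes a navigator-like object as a parameter so that it can be exercised outside a browser as well.

```javascript
// Collects headless/automation hints from a navigator-like object.
// In a browser, pass window.navigator; a plain object works for testing.
function collectHeadlessSignals(nav) {
  const signals = [];
  // navigator.webdriver is set to true by WebDriver-controlled browsers
  if (nav.webdriver === true) signals.push("navigator.webdriver is true");
  // Headless Chrome announces itself in the default user agent string
  if (/HeadlessChrome/i.test(nav.userAgent || "")) signals.push("headless user agent");
  // Regular browsers normally report at least one preferred language
  if (!nav.languages || nav.languages.length === 0) signals.push("no languages reported");
  return signals;
}

// Usage with a mocked navigator object
const mockedNavigator = { webdriver: true, userAgent: "Mozilla/5.0 HeadlessChrome/120.0", languages: [] };
console.log(collectHeadlessSignals(mockedNavigator));
```

All of these values can be overridden by a determined attacker, so an empty result does not prove that a human is present.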

JavaScript Challenge

				
function performJavaScriptChallenge() {
  // Generate a random number between 0 and 9
  const challengeNumber = Math.floor(Math.random() * 10);

  // Display the challenge to the user and wait for their input
  const userInput = prompt(`Please enter the number ${challengeNumber}:`);

  // Compare the user's input with the challenge number
  // (a cancelled prompt returns null, which parses to NaN and fails)
  if (parseInt(userInput, 10) === challengeNumber) {
    console.log("Challenge passed. Not an automated browser.");
    return false;
  } else {
    console.log("Challenge failed. Likely an automated browser.");
    return true;
  }
}

// Call the JavaScript challenge function
const isAutomatedJavaScriptChallenge = performJavaScriptChallenge();

In this example, a random number is generated as a challenge. The challenge number is displayed to the user, who needs to enter the correct number in order to pass the challenge. If the user’s input matches the challenge number, it is assumed that the browser is not automated. Otherwise, it is considered to be an automated browser.

Keep in mind that this is a simple example, and you can customize the challenge to suit your specific needs. You can make the challenge more complex by using various techniques like dynamically generated puzzles, time-based challenges, or other interactive tasks that require user input.
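
A time-based challenge, as mentioned above, could look roughly like the following server-side sketch for Node.js. The server issues a signed timestamp with the page and later rejects submissions that come back implausibly fast or too late. The secret, the timing limits, and the function names issueChallenge and verifyChallenge are assumptions made for this example.

```javascript
const nodeCrypto = require("node:crypto");

const SECRET = "replace-with-a-long-random-secret"; // hypothetical server-side secret
const MIN_SOLVE_MS = 1500;        // answers faster than this are suspicious (assumption)
const MAX_AGE_MS = 5 * 60 * 1000; // tokens older than this are rejected (assumption)

// Signs a payload with HMAC-SHA256 so the client cannot forge timestamps.
function sign(payload) {
  return nodeCrypto.createHmac("sha256", SECRET).update(payload).digest("hex");
}

// Issues a token containing the issue time and its signature.
function issueChallenge(now = Date.now()) {
  const issuedAt = String(now);
  return `${issuedAt}.${sign(issuedAt)}`;
}

// Verifies the signature and checks that the elapsed time is plausible.
function verifyChallenge(token, now = Date.now()) {
  const [issuedAt, signature] = String(token).split(".");
  if (!issuedAt || !signature) return false;
  const given = Buffer.from(signature);
  const expected = Buffer.from(sign(issuedAt));
  if (given.length !== expected.length) return false;
  if (!nodeCrypto.timingSafeEqual(given, expected)) return false;
  const elapsed = now - Number(issuedAt);
  return elapsed >= MIN_SOLVE_MS && elapsed <= MAX_AGE_MS;
}
```

The token would typically be embedded in a hidden form field when the page is rendered and checked with verifyChallenge when the form is submitted.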

Cookie Challenge

				
// Cookie Challenge
async function performCookieChallenge() {
  // Generate a random token
  const challengeToken = generateChallengeToken();

  // Set the challenge token as a cookie
  document.cookie = `challenge=${challengeToken}; path=/`;

  // Wait for a short period (the delay only takes effect when awaited)
  await awaitTimeout(2000);

  // Check if the challenge cookie is still present
  const cookieValue = getCookieValue("challenge");
  if (cookieValue === challengeToken) {
    console.log("Challenge passed. Not an automated browser.");
    return false;
  } else {
    console.log("Challenge failed. Likely an automated browser.");
    return true;
  }
}

// Helper function to generate a random token
function generateChallengeToken() {
  // Implement your logic to generate a unique challenge token
  // This can be a random string, hash, or any other method that suits your needs
  // For simplicity, let's generate a random 8-character token
  const characters = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789";
  let token = "";
  for (let i = 0; i < 8; i++) {
    token += characters.charAt(Math.floor(Math.random() * characters.length));
  }
  return token;
}

// Helper function to retrieve the value of a cookie
function getCookieValue(cookieName) {
  const cookies = document.cookie.split(";").map(cookie => cookie.trim());
  for (const cookie of cookies) {
    // Split only on the first "=", because cookie values may contain "="
    const [name, ...rest] = cookie.split("=");
    if (name === cookieName) {
      return decodeURIComponent(rest.join("="));
    }
  }
  return null;
}

// Helper function to introduce a delay using setTimeout
function awaitTimeout(ms) {
  return new Promise(resolve => setTimeout(resolve, ms));
}

// Call the Cookie challenge function; it is asynchronous, so wait for the result
performCookieChallenge().then(isAutomatedCookieChallenge => {
  console.log("Cookie challenge result:", isAutomatedCookieChallenge);
});

In this example, a random token is generated as a challenge. The challenge token is set as a cookie named “challenge” using document.cookie. After a short delay, the script checks if the challenge cookie is still present and has the same value. If the challenge cookie remains unchanged, it is assumed that the browser is not automated. Otherwise, it is considered to be an automated browser.

You can customize the challenge by modifying the generateChallengeToken() function to generate unique and secure tokens based on your requirements. Additionally, adjust the delay period in the awaitTimeout() function to provide enough time for the browser to handle cookies.
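
Math.random() is not cryptographically secure, so for tokens that must be hard to guess a sketch along the following lines may be preferable. It relies on the Web Crypto getRandomValues function; the function name generateSecureChallengeToken and the default length are assumptions for this example.

```javascript
// Uses the Web Crypto API in browsers and falls back to Node's implementation.
const webCrypto = globalThis.crypto ?? require("node:crypto").webcrypto;

// Generates a hex token from `byteLength` bytes of cryptographic randomness.
function generateSecureChallengeToken(byteLength = 16) {
  const bytes = new Uint8Array(byteLength);
  webCrypto.getRandomValues(bytes);
  // Convert each byte to two hex digits
  return Array.from(bytes, b => b.toString(16).padStart(2, "0")).join("");
}

// Usage: prints a 32-character hex string
console.log(generateSecureChallengeToken());
```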

Interaction Analysis

				
// Interaction Analysis
async function performInteractionAnalysis() {
  const minMouseMovementDistance = 10; // Minimum distance threshold for mouse movements
  const minMouseClicks = 3; // Minimum number of mouse clicks

  let mouseMovementDistance = 0;
  let mouseClickCount = 0;

  function handleMouseMovement(event) {
    // Accumulate the distance of mouse movement
    if (event.movementX || event.movementY) {
      mouseMovementDistance += Math.abs(event.movementX) + Math.abs(event.movementY);
    }
  }

  function handleMouseClick() {
    mouseClickCount++;
  }

  // Attach event listeners for mouse movement and click events
  document.addEventListener("mousemove", handleMouseMovement);
  document.addEventListener("click", handleMouseClick);

  // Wait for 5 seconds (the delay only takes effect when awaited)
  await awaitTimeout(5000);

  // Remove event listeners
  document.removeEventListener("mousemove", handleMouseMovement);
  document.removeEventListener("click", handleMouseClick);

  // Analyze the interaction data
  const isAutomatedInteraction =
    mouseMovementDistance < minMouseMovementDistance || mouseClickCount < minMouseClicks;

  if (isAutomatedInteraction) {
    console.log("Interaction analysis failed. Likely an automated browser.");
    return true;
  } else {
    console.log("Interaction analysis passed. Not an automated browser.");
    return false;
  }
}

// Helper function to introduce a delay using setTimeout
function awaitTimeout(ms) {
  return new Promise(resolve => setTimeout(resolve, ms));
}

// Call the Interaction analysis function; it is asynchronous, so wait for the result
performInteractionAnalysis().then(isAutomatedInteraction => {
  console.log("Interaction analysis result:", isAutomatedInteraction);
});

In this example, the code tracks mouse movements and mouse clicks using event listeners for the mousemove and click events. It calculates the total distance of mouse movement and keeps a count of mouse clicks. After a specific duration (in this case, 5 seconds), the interaction data is analyzed to determine if it meets certain criteria.

The example sets minimum thresholds for mouse movement distance (minMouseMovementDistance) and mouse clicks (minMouseClicks). If the total mouse movement distance is below the threshold or the number of mouse clicks is less than the required minimum, it is assumed that the browser behavior indicates automation.

You can adjust the thresholds and add additional criteria based on your specific requirements. For example, you can consider factors like scroll events, keyboard input, or other types of user interactions.
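
As one example of an additional criterion, the regularity of event timing can be measured. Scripts often fire events at nearly constant intervals, while human input tends to be irregular. The sketch below computes the coefficient of variation of the gaps between event timestamps; the function names and the 0.1 threshold are assumptions for illustration.

```javascript
// Returns the coefficient of variation (standard deviation / mean) of the
// gaps between consecutive event timestamps given in milliseconds.
function intervalVariation(timestamps) {
  if (timestamps.length < 3) return null; // Not enough data to judge
  const gaps = [];
  for (let i = 1; i < timestamps.length; i++) {
    gaps.push(timestamps[i] - timestamps[i - 1]);
  }
  const mean = gaps.reduce((a, b) => a + b, 0) / gaps.length;
  if (mean === 0) return 0;
  const variance = gaps.reduce((sum, g) => sum + (g - mean) ** 2, 0) / gaps.length;
  return Math.sqrt(variance) / mean;
}

// Flags event streams whose timing is suspiciously uniform.
function looksScripted(timestamps, minVariation = 0.1) {
  const variation = intervalVariation(timestamps);
  return variation !== null && variation < minVariation;
}

// Usage: timestamps could be collected with performance.now() in click or key handlers
console.log(looksScripted([0, 100, 200, 300, 400]));    // true
console.log(looksScripted([0, 120, 900, 1100, 2500]));  // false
```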

IP Address Analysis

In JavaScript, you can obtain the IP address of a user by making a request to an external service or by utilizing WebRTC. Here are two common methods to retrieve the IP address in JavaScript:

Using an External Service (like ipify):

				
async function getIpAddress() {
  try {
    const response = await fetch("https://api.ipify.org?format=json");
    const data = await response.json();
    const ipAddress = data.ip;
    console.log("IP Address:", ipAddress);
    return ipAddress;
  } catch (error) {
    console.error("Failed to retrieve IP address:", error);
    return null;
  }
}

// Call the getIpAddress function
getIpAddress();

In this example, we make an HTTP request to the ipify API to fetch the user’s IP address. The response is then parsed as JSON, and the IP address is extracted from the returned data.

Using WebRTC:

				
function getIpAddress() {
  return new Promise((resolve, reject) => {
    const PeerConnection = window.RTCPeerConnection || window.mozRTCPeerConnection || window.webkitRTCPeerConnection;
    if (!PeerConnection) {
      reject(new Error("WebRTC not supported"));
      return; // Stop here; otherwise the constructor call below would throw
    }

    const rtcPeerConnection = new PeerConnection({ iceServers: [] });
    rtcPeerConnection.createDataChannel("");

    rtcPeerConnection.onicecandidate = event => {
      if (!event.candidate) return; // A null candidate signals the end of gathering
      // `address` is the standard property; `ip` is an older, non-standard name
      const ipAddress = event.candidate.address || event.candidate.ip;
      if (ipAddress) {
        console.log("IP Address:", ipAddress);
        rtcPeerConnection.close();
        resolve(ipAddress);
      }
    };

    rtcPeerConnection.createOffer()
      .then(offer => rtcPeerConnection.setLocalDescription(offer))
      .catch(error => reject(error));
  });
}

// Call the getIpAddress function
getIpAddress();

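In this method, we use the RTCPeerConnection API to gather ICE candidates and extract the IP address from a candidate. Because no STUN servers are configured, only host candidates are gathered, so the result is a local network address rather than the public one, and many current browsers replace it with an mDNS hostname ending in .local. This method therefore may not work in all browsers or under certain network configurations.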

Keep in mind that both methods may be subject to limitations and may not always provide the user’s true IP address due to factors such as proxy servers or NAT. Additionally, the external service approach relies on a third-party API, which may have usage limits or require registration.

It’s important to consider the specific use case and privacy implications when retrieving and handling IP addresses.
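
Since the client-side methods are easy to tamper with, the address can also be read on the server from the incoming connection. The following sketch assumes a Node.js server where request header names are lower-cased; the function name getClientIp and the trustProxy flag are hypothetical.

```javascript
// Returns the client IP address as seen by the server.
// trustProxy should only be true when the application runs behind a proxy
// you control, because otherwise X-Forwarded-For can be set by the client.
function getClientIp(headers, remoteAddress, trustProxy = false) {
  const forwarded = headers["x-forwarded-for"];
  if (trustProxy && typeof forwarded === "string" && forwarded.length > 0) {
    // The header is a comma-separated list; the left-most entry is
    // conventionally the original client (assuming the proxy sanitizes it).
    return forwarded.split(",")[0].trim();
  }
  return remoteAddress;
}

// Usage inside a Node.js http handler (sketch):
//   const ip = getClientIp(req.headers, req.socket.remoteAddress, true);
console.log(getClientIp({ "x-forwarded-for": "203.0.113.7, 10.0.0.1" }, "10.0.0.1", true)); // 203.0.113.7
```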

Next, we need to check whether the IP address comes from a suspicious source.

We can do this using the ipdata API to detect automation by analyzing specific fields related to proxy and VPN detection:

				
async function detectAutomationByIpAddress(ipAddress) {
  const apiKey = "YOUR_API_KEY"; // Replace with your actual API key

  try {
    const response = await fetch(`https://api.ipdata.co/${ipAddress}?api-key=${apiKey}`);
    if (!response.ok) {
      throw new Error(`ipdata request failed with status ${response.status}`);
    }
    const data = await response.json();

    // Current ipdata responses nest these flags under a "threat" object;
    // fall back to the top level in case your response is shaped differently.
    // Check the ipdata documentation for the flags available on your plan.
    const threat = data.threat ?? data;

    console.log("IP Address:", data.ip);
    console.log("Country:", data.country_name);
    console.log("Is Proxy:", threat.is_proxy);
    console.log("Is VPN:", threat.is_vpn);
    console.log("Is Tor Exit Node:", threat.is_tor);

    if (threat.is_proxy || threat.is_vpn || threat.is_tor) {
      console.log("Automation detected. IP address is associated with a proxy, VPN, or Tor exit node.");
      return true;
    } else {
      console.log("No automation detected. IP address is not associated with a proxy, VPN, or Tor exit node.");
      return false;
    }
  } catch (error) {
    console.error("Failed to analyze IP address:", error);
    return null;
  }
}

// Call the detectAutomationByIpAddress function with an IP address
detectAutomationByIpAddress("8.8.8.8"); // Replace with the desired IP address

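In this example, we make a request to the ipdata API with a specific IP address. The API provides flags such as is_proxy, is_vpn, and is_tor (in current responses these are grouped under a threat object), which indicate if the IP address is associated with a proxy, VPN, or Tor exit node, respectively. Check the ipdata documentation for the exact set of flags available to you.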

To use this example, you need to sign up for an API key at ipdata and replace "YOUR_API_KEY" with your actual API key.

By checking these fields in the API response, you can determine if the IP address is likely associated with an automated system or using a proxy or VPN service.

We can also use proxycheck.io:

				
async function detectAutomationByIpAddress(ipAddress) {
  // Replace "YOUR_API_KEY" with your actual API key from proxycheck.io.
  // The v2 API takes the IP address in the path and the key as a query
  // parameter; vpn=1 asks the service to check for VPNs as well.
  const proxyDetectionApiUrl = `https://proxycheck.io/v2/${ipAddress}?key=YOUR_API_KEY&vpn=1`;

  try {
    const response = await fetch(proxyDetectionApiUrl);
    const data = await response.json();

    // The result for an address is nested under a key named after that address
    const result = data[ipAddress];

    if (data.status === "ok" && result && result.proxy === "yes") {
      console.log("Automation detected. IP address is a proxy or VPN.");
      return true;
    } else {
      console.log("No automation detected. IP address is not a proxy or VPN.");
      return false;
    }
  } catch (error) {
    console.error("Failed to analyze IP address:", error);
    return null;
  }
}

// Call the detectAutomationByIpAddress function with an IP address
detectAutomationByIpAddress("8.8.8.8"); // Replace with the desired IP address

In this example, we use the proxycheck.io API to check if an IP address is associated with a proxy or VPN service. You need to sign up for an API key at proxycheck.io and replace "YOUR_API_KEY" with your actual API key.

The API response provides information about the IP address, including whether it is a proxy or VPN. Based on the response, you can determine if the IP address is likely associated with an automated system.

Please note that proxy and VPN detection is not foolproof, and there is always a chance of false positives or false negatives.
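
Alongside reputation lookups, the request rate per IP address mentioned in the list of methods can be tracked directly. The following is a minimal in-memory sliding-window sketch; the class name RateLimiter and the limits used are assumptions, and a production setup with several server processes would need shared storage instead of a local Map.

```javascript
// Sliding-window request counter keyed by IP address (in-memory, single process only).
class RateLimiter {
  constructor(maxRequests, windowMs) {
    this.maxRequests = maxRequests;
    this.windowMs = windowMs;
    this.hits = new Map(); // ip -> array of request timestamps
  }

  // Records a request and returns true if the IP is still within its limit.
  allow(ip, now = Date.now()) {
    // Keep only the timestamps that fall inside the current window
    const recent = (this.hits.get(ip) || []).filter(t => now - t < this.windowMs);
    recent.push(now);
    this.hits.set(ip, recent);
    return recent.length <= this.maxRequests;
  }
}

// Usage: allow at most 3 requests per second from one address
const limiter = new RateLimiter(3, 1000);
console.log(limiter.allow("203.0.113.7"));
```

Requests that exceed the limit do not have to be blocked outright; they can instead raise a score that is combined with the other signals.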

Machine learning model for detection of automation

First, make sure you have the TensorFlow.js library included in your project. You can include it via a <script> tag in your HTML file or install it via npm.

				
					<script src="https://cdn.jsdelivr.net/npm/@tensorflow/tfjs@3.10.0/dist/tf.min.js"></script>

				
// Per-feature mean and standard deviation computed on your training data.
// These values are placeholders; replace them with your own statistics.
const FEATURE_MEAN = [120, 3, 1280, 720];
const FEATURE_STD = [40, 2, 400, 250];

async function loadModel() {
  const model = await tf.loadLayersModel('model/model.json');
  return model;
}

async function detectAutomatedBrowserMachineLearning() {
  const model = await loadModel();

  // Extract browser features and encode them as numbers, because a tensor
  // cannot mix strings, arrays, and numbers. Add more features as needed,
  // in the same order that was used during training.
  const features = [
    navigator.userAgent.length,
    navigator.plugins.length,
    window.innerWidth,
    window.innerHeight,
  ];

  // Preprocess the features and convert them to a tensor
  const tensor = tf.tidy(() => {
    const inputTensor = tf.tensor1d(features);

    // Normalize by subtracting the mean and dividing by the standard deviation
    const normalizedTensor = tf.div(tf.sub(inputTensor, FEATURE_MEAN), FEATURE_STD);

    // Add a batch dimension to match the model's input shape
    return normalizedTensor.expandDims();
  });

  // Make a prediction on the tensor
  const output = model.predict(tensor);
  const predictions = await output.data();

  const isAutomated = predictions[0] > 0.5;
  if (isAutomated) {
    console.log('Automated browser detected.');
  } else {
    console.log('No automated browser detected.');
  }

  // Free the memory held by the tensors
  tensor.dispose();
  output.dispose();
  return isAutomated;
}

The detectAutomatedBrowserMachineLearning function loads the pre-trained model using tf.loadLayersModel and extracts relevant browser features such as user agent, plugins, screen dimensions, etc. You can add or modify features as needed.

The features are then preprocessed and converted into a TensorFlow.js tensor. Preprocessing steps may include normalizing the features by subtracting mean and dividing by standard deviation to ensure consistency with the training data.

The tensor is then passed through the pre-trained model using the model.predict function, and the output predictions are obtained. If the prediction score is above a certain threshold (e.g., 0.5), it considers the browser to be automated.
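
The preprocessing step can be illustrated without TensorFlow.js. The sketch below encodes raw browser properties as a numeric vector and standardizes it; the chosen features and the function names encodeFeatures and standardize are assumptions, and the mean and standard deviation would have to come from your own training data.

```javascript
// Turns raw browser properties into a fixed-length numeric vector.
// The selection of features is an assumption made for this example.
function encodeFeatures(raw) {
  return [
    raw.userAgent.length,
    raw.pluginCount,
    raw.width,
    raw.height,
    raw.webdriver ? 1 : 0,
  ];
}

// Standardizes each feature as (value - mean) / std.
// A standard deviation of 0 is replaced by 1 to avoid division by zero.
function standardize(vector, mean, std) {
  return vector.map((value, i) => (value - mean[i]) / (std[i] || 1));
}

// Usage with made-up statistics
const vector = encodeFeatures({ userAgent: "abcd", pluginCount: 2, width: 800, height: 600, webdriver: false });
console.log(standardize(vector, [4, 2, 800, 600, 0], [1, 1, 100, 100, 1]));
```

The order of the features must match the order used when the model was trained, otherwise the prediction is meaningless.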

If you want to train your own model for automated browser detection, the following steps will help you:

Data Collection: Gather a diverse and representative dataset that includes both automated browser instances and normal browser instances. This dataset should contain a variety of features that can help distinguish between automated and human-operated browsers.
Feature Extraction: Identify relevant features that can differentiate between automated and human-operated browsers. These features may include user agent strings, plugins, screen resolution, browser behavior patterns, network traffic, JavaScript execution patterns, and more.
Data Preprocessing: Prepare the collected data for model training. This step involves cleaning the data, handling missing values, normalizing numerical features, and encoding categorical features if necessary.
Model Selection: Choose an appropriate machine learning algorithm or model architecture for the automated browser detection task. Depending on the complexity of the problem, you may consider using algorithms such as logistic regression, decision trees, random forests, gradient boosting, or deep learning models such as convolutional neural networks (CNNs) or recurrent neural networks (RNNs).
Model Training: Split the dataset into training and validation sets. Use the training set to train the model on the labeled data, and monitor the model’s performance on the validation set. Adjust hyperparameters and experiment with different model architectures to achieve the desired performance.
Model Evaluation: Evaluate the trained model using appropriate evaluation metrics such as accuracy, precision, recall, and F1-score. This step helps assess the model’s ability to distinguish between automated and human-operated browsers.
Model Deployment: Once you are satisfied with the model’s performance, save the trained model and deploy it in a production environment. This may involve integrating the model into your application or system to perform real-time automated browser detection.
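
The metrics named in the Model Evaluation step can be computed from a labeled validation set as in the following sketch. The function name evaluate is hypothetical; labels and predictions are assumed to be arrays of booleans where true means "automated".

```javascript
// Computes accuracy, precision, recall, and F1-score for binary labels.
// labels and predictions are arrays of booleans (true = automated).
function evaluate(labels, predictions) {
  let tp = 0, fp = 0, fn = 0, tn = 0;
  labels.forEach((actual, i) => {
    const predicted = predictions[i];
    if (actual && predicted) tp++;        // Bot correctly flagged
    else if (!actual && predicted) fp++;  // Human wrongly flagged
    else if (actual && !predicted) fn++;  // Bot missed
    else tn++;                            // Human correctly passed
  });
  const precision = tp + fp === 0 ? 0 : tp / (tp + fp);
  const recall = tp + fn === 0 ? 0 : tp / (tp + fn);
  const f1 = precision + recall === 0 ? 0 : (2 * precision * recall) / (precision + recall);
  const accuracy = (tp + tn) / labels.length;
  return { accuracy, precision, recall, f1 };
}

// Usage
console.log(evaluate([true, true, false, false], [true, false, true, false]));
// { accuracy: 0.5, precision: 0.5, recall: 0.5, f1: 0.5 }
```

For bot detection, precision reflects how many flagged visitors were really automated, so a low value means many humans are being inconvenienced.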

It’s important to note that building an accurate and robust automated browser detection model requires a deep understanding of machine learning, web technologies, and browser automation techniques. Consider seeking assistance from experienced data scientists or machine learning experts to ensure the best possible results.

Anomaly detection of automation

First, make sure you have the ml-kmeans library included in your project. You can include it via a <script> tag in your HTML file or install it via npm.

				
					<script src="https://unpkg.com/ml-kmeans@1.0.0/dist/ml-kmeans.js"></script>

				
// Assumes the kmeans function from ml-kmeans is in scope, for example via
// import { kmeans } from 'ml-kmeans' (older releases use a default export).
function detectAnomalies() {
  // Generate some example data
  const data = generateData(1000);

  // Perform k-means clustering
  const k = 3; // Number of clusters
  const { clusters } = kmeans(data, k);

  // Find the cluster with the most data points (assumed to be the normal cluster)
  const normalCluster = findNormalCluster(clusters, k);

  // Detect anomalies: points that fall outside the normal cluster
  const anomalies = [];
  for (let i = 0; i < data.length; i++) {
    if (clusters[i] !== normalCluster) {
      anomalies.push(data[i]);
    }
  }

  console.log('Anomalies:', anomalies);
  return anomalies;
}

function generateData(numPoints) {
  const data = [];
  for (let i = 0; i < numPoints; i++) {
    const x = Math.random() * 100;
    const y = Math.random() * 100;
    data.push([x, y]);
  }
  return data;
}

function findNormalCluster(assignments, k) {
  // Count how many points were assigned to each cluster
  const counts = Array(k).fill(0);
  for (let i = 0; i < assignments.length; i++) {
    counts[assignments[i]]++;
  }
  // Anomalies are rare by definition, so the largest cluster is treated as normal
  const normalCluster = counts.indexOf(Math.max(...counts));
  return normalCluster;
}

The detectAnomalies function generates some example data using the generateData function. You can modify this function to generate data according to your specific use case.

Next, it performs k-means clustering using the kmeans function from the ml-kmeans library. The number of clusters (k) is set to 3 in this example, but you can adjust it as needed. The result includes the centroids (cluster centers) and clusters, an array holding the cluster index assigned to each data point.

After clustering, the code finds the cluster with the most data points, which is assumed to be the normal cluster because anomalies are expected to be rare. The findNormalCluster function takes the assignments and the number of clusters (k) as input and returns the index of the normal cluster.

Finally, it iterates through the data points and checks if each point belongs to the normal cluster or not. Points that belong to other clusters are considered anomalies and are added to the anomalies array.

The anomalies are logged to the console.

Please note that this is a basic example of anomaly detection using k-means clustering, and the effectiveness of the algorithm may vary depending on the nature of your data. For more complex anomaly detection tasks, you might need to explore other algorithms or techniques tailored to your specific use case.
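
For a single numeric metric such as requests per minute, a much simpler technique than clustering may already be enough: flag values whose z-score (distance from the mean in standard deviations) exceeds a threshold. The sketch below is a swapped-in alternative to k-means, not part of the ml-kmeans library; the function name zScoreAnomalies and the threshold are assumptions.

```javascript
// Returns the indices of values whose z-score exceeds the threshold.
function zScoreAnomalies(values, threshold = 2) {
  const mean = values.reduce((a, b) => a + b, 0) / values.length;
  const variance = values.reduce((sum, v) => sum + (v - mean) ** 2, 0) / values.length;
  const std = Math.sqrt(variance);
  if (std === 0) return []; // All values are identical, nothing stands out
  const indices = [];
  values.forEach((v, i) => {
    if (Math.abs(v - mean) / std > threshold) indices.push(i);
  });
  return indices;
}

// Usage: requests per minute for eleven clients; the last one stands out
const requestsPerMinute = [10, 12, 11, 9, 10, 11, 10, 12, 9, 11, 500];
console.log(zScoreAnomalies(requestsPerMinute)); // [ 10 ]
```

Because a single extreme value inflates the standard deviation, very small samples can hide outliers, so this approach works better with a larger number of observations.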

A more complex case would be using an algorithm called Isolation Forest. Isolation Forest is an unsupervised machine learning algorithm that can efficiently identify anomalies in high-dimensional data.

To demonstrate Isolation Forest, we’ll use the isolation-forest library, which provides an implementation of the algorithm in JavaScript. Make sure you have the library included in your project either via a <script> tag or by installing it via npm.

				
					<script src="https://unpkg.com/isolation-forest@1.2.1/dist/isolation-forest.min.js"></script>

				
async function detectAnomalies() {
  // Generate example data
  const data = generateData(1000);

  // Perform anomaly detection using Isolation Forest
  const forest = new IsolationForest();
  await forest.fit(data);
  const anomalies = forest.predict(data);

  console.log('Anomalies:', anomalies);
  return anomalies;
}

function generateData(numPoints) {
  const data = [];
  for (let i = 0; i < numPoints; i++) {
    const x = Math.random() * 100;
    const y = Math.random() * 100;
    data.push([x, y]);
  }
  return data;
}

The detectAnomalies function generates some example data using the generateData function. You can modify this function to generate data according to your specific use case.

Next, it creates an instance of the IsolationForest class from the isolation-forest library. The fit method is called to train the Isolation Forest model on the generated data. The predict method is then used to detect anomalies in the data, which returns an array indicating the anomaly score for each data point.

The anomalies are logged to the console.

Note that automated browser detection is a cat-and-mouse game, and determined adversaries may find ways to evade detection. Regular updates and improvements to the detection mechanisms are essential to stay ahead of emerging threats.