Overcoming Dynamic Screen Challenges with ADB and Appium

When working with consumer-level machines integrated into larger systems, I faced a significant challenge: dynamic screens. These screens, which change frequently based on user interactions, made it impossible to control the machine using just ADB. This is where Appium came in handy.

Setting Up Appium for Development

Initially, I set up Appium Desktop to explore the possibilities. Its graphical interface let me interact with the device and inspect the UI hierarchy in real time, which was crucial for understanding how the app’s screens were structured. Once I had enough insight, I switched to the command-line version of Appium, installed through npm, for more streamlined and automated control.

Crafting the ADB Hierarchy Handler

With the knowledge gained, I wrote a custom class to handle interactions with the device. This class, ADB_Hierarchy_Handler, encapsulates the logic for connecting to the device, parsing the UI hierarchy, and performing actions based on dynamic screen content. While I can’t share the full implementation, here’s an overview of its functionality:

  1. Initialization and Connection: The class starts the ADB server, connects to the device, and initializes Appium. If no device is found, it raises a custom error and exits.
  2. UI Hierarchy Parsing: Using xmltodict and regular expressions, the class parses the XML representation of the UI hierarchy to locate elements dynamically.
  3. Action Execution: Based on the parsed data, it performs actions such as clicking buttons or entering text.
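
Because the screens change on their own schedule, steps 2 and 3 usually have to be retried until the expected element shows up. The following is a minimal sketch of such a polling loop, not part of the original class: `wait_until` and its parameters are hypothetical names, and the `screens` iterator merely stands in for successive hierarchy dumps.

```python
import time


def wait_until(probe, timeout=10.0, interval=0.5):
    """Call probe() repeatedly until it returns a truthy value.

    Returns that value, or None if `timeout` seconds pass first.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = probe()
        if result:
            return result
        if time.monotonic() >= deadline:
            return None
        time.sleep(interval)


# Stand-in for successive hierarchy dumps: the button only exists on the third one.
screens = iter([{}, {}, {"button": {"@bounds": "[0,0][100,50]"}}])
found = wait_until(lambda: next(screens).get("button"), timeout=2.0, interval=0.01)
print(found)
```

With a real device, the probe would re-parse the page source and search it on every call, so each attempt sees the current screen rather than a stale snapshot.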

Here are the critical sections of the class:

import time
import os
import subprocess
import re
from ppadb.client import Client
from appium import webdriver
from selenium.common import exceptions as SE
import xmltodict

class CustomError(Exception):
    pass

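class ADB_Hierarchy_Handler:
    def __init__(self):
        # Do the setup in __init__ so it runs when a handler is created,
        # not as a side effect of defining the class.
        print(os.environ['ANDROID_HOME'])  # raises KeyError if ANDROID_HOME is unset
        os.system('adb start-server')
        self.adb = Client()
        self.devices = self.adb.devices()

        if len(self.devices) == 0:
            raise CustomError("no device attached to ADB server")

        # Launch the Appium server in the background and silence its output.
        self.appium_server = subprocess.Popen('appium', shell=True, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        # Give the server time to start listening. The fixed 5 s delay is an
        # assumption; adjust it for your machine.
        time.sleep(5)

        caps = {
            "platformName": "Android",
            "automationName": "UiAutomator2",
            "skipServerInstallation": True,
            "disableWindowAnimation": True
        }

        self.driver = webdriver.Remote("http://localhost:4723/wd/hub", caps)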

    def parse_ui_hierarchy(self):
        xml_data = self.driver.page_source
        data_dict = xmltodict.parse(xml_data)
        return data_dict

    def find_element(self, search_dict, field):
        elements = get_recursively(search_dict, field)
        return elements

    def click_element(self, element):
        bounds = get_bounds_as_dict(element['@bounds'])
        center = get_center(bounds)
        self.driver.tap([(center['x'], center['y'])])
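
The Appium server needs a moment to start listening before a session can be opened against it. One way to handle that is to poll the port until it accepts connections. This is a minimal sketch using only the standard library; `wait_for_port` is a hypothetical helper, not something Appium provides, and the port number comes from the URL used above.

```python
import socket
import time


def wait_for_port(host, port, timeout=30.0, interval=0.5):
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Connection refused or timed out: wait a little and try again.
            time.sleep(interval)
    return False


# Intended use, right after launching the server process:
#     if not wait_for_port("localhost", 4723):
#         raise RuntimeError("Appium server did not come up in time")
```

An open port only shows that the process is listening, not that the server is fully initialized, so this narrows the race rather than eliminating it.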

Utilizing Helper Functions for Parsing

I leveraged several helper functions to streamline the process of parsing the UI hierarchy and extracting necessary information:

import re

# Matches bounds strings of the form "[left,top][right,bottom]".
bounds_regex = re.compile(r"\[(\d+),(\d+)\]\[(\d+),(\d+)\]")

def get_recursively(search_dict, field):
    fields_found = []
    for key, value in search_dict.items():
        if key == field:
            fields_found.append(value)
        elif isinstance(value, dict):
            results = get_recursively(value, field)
            fields_found.extend(results)
        elif isinstance(value, (list, tuple)):
            for item in value:
                if isinstance(item, dict):
                    more_results = get_recursively(item, field)
                    fields_found.extend(more_results)
    return fields_found

def get_bounds_as_dict(bounds_str):
    matches = bounds_regex.search(bounds_str)
    if matches is None:
        # Fail with a clear message instead of an AttributeError on None.
        raise ValueError(f"unexpected bounds format: {bounds_str!r}")
    left, top, right, bottom = (int(group) for group in matches.groups())
    return {'left': left, 'top': top, 'right': right, 'bottom': bottom}

def get_center(bounds_dict):
    coordinates = {}
    coordinates['x'] = (bounds_dict['right'] - bounds_dict['left']) // 2 + bounds_dict['left']
    coordinates['y'] = (bounds_dict['bottom'] - bounds_dict['top']) // 2 + bounds_dict['top']
    return coordinates
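
To make the bounds arithmetic concrete, here is a compact, self-contained variant that goes from a bounds string straight to the tap point. `center_of` is a hypothetical name used only for this illustration, and the sample strings are made up; the calculation is the same as in the helpers above.

```python
import re

BOUNDS_RE = re.compile(r"\[(\d+),(\d+)\]\[(\d+),(\d+)\]")


def center_of(bounds_str):
    """Return the (x, y) center of a "[left,top][right,bottom]" bounds string."""
    match = BOUNDS_RE.search(bounds_str)
    if match is None:
        raise ValueError(f"unexpected bounds format: {bounds_str!r}")
    left, top, right, bottom = map(int, match.groups())
    # Integer division keeps the result on a whole pixel.
    return (left + (right - left) // 2, top + (bottom - top) // 2)


print(center_of("[0,100][200,300]"))      # (100, 200)
print(center_of("[42,1000][1038,1200]"))  # (540, 1100)
```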

Wrapping Up and Testing

The script ends with a small test entry point to check that everything works as expected:

if __name__ == "__main__":
    try:
        handler = ADB_Hierarchy_Handler()
        ui_data = handler.parse_ui_hierarchy()
        elements = handler.find_element(ui_data, 'desired_element_id')
        if elements:
            handler.click_element(elements[0])
    except CustomError as e:
        print(f"An error occurred: {e}")

This code initializes the handler, parses the UI hierarchy, looks up the desired element (the 'desired_element_id' string is a placeholder for the key to search for), and taps the center of its bounds. This approach let me automate interactions with the device efficiently, overcoming the challenges posed by dynamic screens without relying on OpenCV.
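
One detail worth keeping in mind: xmltodict turns tag names into dictionary keys and prefixes XML attributes with "@", so an element's resource ID ends up as a value under "@resource-id" rather than as a key. A key-based search therefore matches tag names, not IDs. Below is a minimal sketch of an attribute-based lookup; `find_by_attribute` is a hypothetical helper, and `sample` is a hand-written stand-in for parsed output with made-up IDs, not a dump from a real device.

```python
def find_by_attribute(node, attr, value):
    """Yield every dict in a nested dict/list structure whose `attr` equals `value`."""
    if isinstance(node, dict):
        if node.get(attr) == value:
            yield node
        for child in node.values():
            yield from find_by_attribute(child, attr, value)
    elif isinstance(node, (list, tuple)):
        for item in node:
            yield from find_by_attribute(item, attr, value)


# Mimics xmltodict conventions: attributes carry an "@" prefix and
# repeated sibling tags collapse into a list.
sample = {
    "hierarchy": {
        "android.widget.FrameLayout": {
            "@bounds": "[0,0][1080,1920]",
            "android.widget.Button": [
                {"@resource-id": "com.example:id/start", "@bounds": "[100,200][300,400]"},
                {"@resource-id": "com.example:id/stop", "@bounds": "[100,500][300,700]"},
            ],
        }
    }
}

matches = list(find_by_attribute(sample, "@resource-id", "com.example:id/stop"))
print(matches[0]["@bounds"])  # [100,500][300,700]
```

Each match is a dict that still carries its "@bounds" entry, so it can be handed to a bounds-based tap routine directly.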

Conclusion

Through the use of Appium and careful parsing of the UI hierarchy, I successfully automated the control of machines with dynamic screens. This solution proved robust and adaptable, paving the way for more efficient integration of consumer-level machines into larger systems.

I also believe that using Appium with a virtual device running on the PC that controls the robot could have been a better solution for the first machine than the unused cell phone we used to get it working. I should have tried it, but I didn’t have the time to revive the old machine and tinker with it. I’ll definitely consider it for future projects.