Integrating Machines Using Android and Python - Part 3
Overcoming Dynamic Screen Challenges with ADB and Appium
When working with consumer-level machines integrated into larger systems, I faced a significant challenge: dynamic screens. These screens, which change frequently based on user interactions, made it impossible to control the machine using just ADB. This is where Appium came in handy.
Setting Up Appium for Development
Initially, I set up Appium Desktop to explore the possibilities. This graphical interface allowed me to interact with the device and see the UI hierarchy in real-time, which was crucial for understanding how the app’s screens were structured. After gaining enough insight, I transitioned to using the Appium CLI version through NPM for more streamlined and automated control.
Crafting the ADB Hierarchy Handler
With the knowledge gained, I wrote a custom class to handle interactions with the device. This class, ADB_Hierarchy_Handler, encapsulates the logic for connecting to the device, parsing the UI hierarchy, and performing actions based on dynamic screen content. While I can’t share the full implementation, here’s an overview of its functionality:
- Initialization and Connection: The class starts the ADB server, connects to the device, and initializes Appium. If no device is found, it raises a custom error and exits.
- UI Hierarchy Parsing: Using xmltodict and regular expressions, the class parses the XML representation of the UI hierarchy to locate elements dynamically.
- Action Execution: Based on the parsed data, it performs actions such as clicking buttons or entering text.
Here’s a description of the critical sections of the class:
import time
import os
import subprocess
import re
from ppadb.client import Client
from appium import webdriver
from selenium.common import exceptions as SE
import xmltodict
class CustomError(Exception):
    """Raised when no device is attached to the ADB server."""


class ADB_Hierarchy_Handler:
    def __init__(self):
        # ANDROID_HOME must be set; this raises KeyError if it is missing
        print(os.environ['ANDROID_HOME'])
        os.system('adb start-server')
        self.adb = Client()
        self.devices = self.adb.devices()
        if len(self.devices) == 0:
            print('no device attached')
            raise CustomError("no device attached to ADB server")
        # Start the Appium server in the background and discard its output
        self.appium_server = subprocess.Popen('appium', shell=True, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        caps = {
            "platformName": "Android",
            "automationName": "UiAutomator2",
            "skipServerInstallation": True,
            "disableWindowAnimation": True
        }
        self.driver = webdriver.Remote("http://localhost:4723/wd/hub", caps)

    def parse_ui_hierarchy(self):
        # page_source is the XML dump of the current screen
        xml_data = self.driver.page_source
        data_dict = xmltodict.parse(xml_data)
        return data_dict

    def find_element(self, search_dict, field):
        elements = get_recursively(search_dict, field)
        return elements

    def click_element(self, element):
        # Tap the center of the element's bounding box
        bounds = get_bounds_as_dict(element['@bounds'])
        center = get_center(bounds)
        self.driver.tap([(center['x'], center['y'])])
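The listing above starts the Appium server and then connects to it, but it does not show how long to wait in between. A minimal sketch of one way to handle that, assuming the server listens on localhost:4723 as in the listing, is to poll the TCP port until it accepts a connection. The name wait_for_port and its parameters are hypothetical and not part of the original class.

```python
import socket
import time


def wait_for_port(host, port, timeout=30.0, interval=0.5):
    """Return True once a TCP connection to host:port succeeds, False on timeout.

    Hypothetical helper, not part of the original class.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # create_connection raises OSError while nothing is listening yet
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            time.sleep(interval)
    return False
```

Calling wait_for_port("localhost", 4723) between the subprocess.Popen call and webdriver.Remote would be one place to use it. An open port only shows that something is listening, so it is a rough readiness check rather than a guarantee that the server can already create sessions.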
Utilizing Helper Functions for Parsing
I leveraged several helper functions to streamline the process of parsing the UI hierarchy and extracting necessary information:
import re

# Matches Android bounds strings such as "[0,0][1080,1920]"
bounds_regex = re.compile(r"\[(\d+),(\d+)\]\[(\d+),(\d+)\]")


def get_recursively(search_dict, field):
    # Collect every value stored under `field`, at any depth
    fields_found = []
    for key, value in search_dict.items():
        if key == field:
            fields_found.append(value)
        elif isinstance(value, dict):
            results = get_recursively(value, field)
            fields_found.extend(results)
        elif isinstance(value, (list, tuple)):
            for item in value:
                if isinstance(item, dict):
                    more_results = get_recursively(item, field)
                    fields_found.extend(more_results)
    return fields_found


def get_bounds_as_dict(bounds_str):
    bounds = {}
    matches = bounds_regex.search(bounds_str)
    bounds['left'] = int(matches.group(1))
    bounds['top'] = int(matches.group(2))
    bounds['right'] = int(matches.group(3))
    bounds['bottom'] = int(matches.group(4))
    return bounds


def get_center(bounds_dict):
    coordinates = {}
    coordinates['x'] = (bounds_dict['right'] - bounds_dict['left']) // 2 + bounds_dict['left']
    coordinates['y'] = (bounds_dict['bottom'] - bounds_dict['top']) // 2 + bounds_dict['top']
    return coordinates
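One detail worth checking when get_recursively runs on xmltodict output: a tag that occurs once under its parent is parsed into a dict, while repeated sibling tags are parsed into a list of dicts, so a single match can be either. The sketch below illustrates this with a hand-written dictionary standing in for real page source (the class names, texts and bounds are made up) and a hypothetical flatten_nodes helper that normalizes the result.

```python
def get_recursively(search_dict, field):
    """Compact copy of the helper above so this sketch runs on its own."""
    found = []
    for key, value in search_dict.items():
        if key == field:
            found.append(value)
        elif isinstance(value, dict):
            found.extend(get_recursively(value, field))
        elif isinstance(value, (list, tuple)):
            for item in value:
                if isinstance(item, dict):
                    found.extend(get_recursively(item, field))
    return found


def flatten_nodes(results):
    """Hypothetical helper: turn a mix of dicts and lists of dicts into a flat list of dicts."""
    nodes = []
    for result in results:
        if isinstance(result, dict):
            nodes.append(result)
        elif isinstance(result, (list, tuple)):
            nodes.extend(item for item in result if isinstance(item, dict))
    return nodes


# Hand-written stand-in for parsed page source (not real device output).
sample = {
    "hierarchy": {
        "android.widget.FrameLayout": {
            "@bounds": "[0,0][1080,1920]",
            "android.widget.Button": [
                {"@text": "Start", "@bounds": "[100,200][300,400]"},
                {"@text": "Stop", "@bounds": "[100,500][300,700]"},
            ],
        }
    }
}

raw = get_recursively(sample, "android.widget.Button")  # one entry: the list of both buttons
buttons = flatten_nodes(raw)                            # two separate button dicts
```

With this normalization, each entry in buttons has its own '@bounds' key and can be passed to get_bounds_as_dict, whereas raw[0] here is a list and cannot.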
Wrapping Up and Testing
The class includes a main test case to ensure everything works as expected:
if __name__ == "__main__":
    try:
        handler = ADB_Hierarchy_Handler()
        ui_data = handler.parse_ui_hierarchy()
        elements = handler.find_element(ui_data, 'desired_element_id')
        if elements:
            handler.click_element(elements[0])
    except CustomError as e:
        print(f"An error occurred: {e}")
This code initializes the handler, parses the UI hierarchy, finds the desired element by its ID, and clicks it. This approach enabled me to automate interactions with the device efficiently, overcoming the challenges posed by dynamic screens without relying on OpenCV.
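Note that get_recursively matches dictionary keys, so the string passed to find_element has to be a key that actually appears in the parsed hierarchy. To my understanding, the UiAutomator2 page source uses class names as tag names and stores the ID in a resource-id attribute. As an alternative sketch, and not the approach used in the class above, the standard library's xml.etree.ElementTree can look a node up by that attribute directly. The function name, the sample XML and the com.example.app ID are all made up for illustration.

```python
import re
import xml.etree.ElementTree as ET

BOUNDS_RE = re.compile(r"\[(\d+),(\d+)\]\[(\d+),(\d+)\]")


def find_center_by_resource_id(page_source, resource_id):
    """Return (x, y) of the first node whose resource-id matches, or None.

    Hypothetical helper, swapped in for the xmltodict-based lookup.
    """
    root = ET.fromstring(page_source)
    for node in root.iter():
        if node.get("resource-id") != resource_id:
            continue
        match = BOUNDS_RE.fullmatch(node.get("bounds", ""))
        if match is None:
            continue  # skip nodes without a usable bounds attribute
        left, top, right, bottom = map(int, match.groups())
        return ((left + right) // 2, (top + bottom) // 2)
    return None


# Made-up page source; real dumps carry many more attributes per node.
SAMPLE = """
<hierarchy rotation="0">
  <android.widget.FrameLayout bounds="[0,0][1080,1920]" resource-id="">
    <android.widget.Button text="Start" resource-id="com.example.app:id/start" bounds="[100,200][300,400]" />
  </android.widget.FrameLayout>
</hierarchy>
"""

center = find_center_by_resource_id(SAMPLE, "com.example.app:id/start")
```

The returned tuple has the same shape that click_element passes to driver.tap, so it could be used as driver.tap([center]) when the lookup succeeds.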
Conclusion
Through the use of Appium and careful parsing of the UI hierarchy, I successfully automated the control of machines with dynamic screens. This solution proved robust and adaptable, paving the way for more efficient integration of consumer-level machines into larger systems.
I also believe that using Appium with a virtual device running on the PC that controls the robot could have been a better solution for the first machine than the unused cell phone we used to get it working. I should have tried to get it going, but I didn’t have the time to revive the old machine and tinker with it. I’ll definitely consider it for future projects.
This post has been sitting in my drafts for a while now. I’ve been busy with other projects, but I’m glad I finally got around to finishing it. I hope you find it useful! It is also a continuation of the previous posts, so if you haven’t read them yet, I suggest you do so before continuing: 1st post, 2nd post.