Keeping Google Colab Computing Unattended Without Pro+ by Using PyAutoGUI

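For Colab users who are not on the premium Pro+ plan, I think the most annoying limitations are that notebooks cannot run in the background, and that an idle session first triggers a dialog asking whether you are still there and is then disconnected if you leave it alone even longer.
To work around this, I used PyAutoGUI's automated GUI control to watch the screen continuously and tick the "I'm not a robot" checkbox whenever it appears. This article introduces that implementation.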

As its name suggests, PyAutoGUI is a library for operating a GUI automatically from a Python script.
I explain PyAutoGUI itself in a separate article.

Another issue with Colab is that the available storage depends on Google Drive. For datasets hosted on Kaggle or Google Cloud Storage, I propose a workaround in the following article.

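Background: computing unattended even without Pro+
Verified environment: Windows 10 / Python 3.10.4
Implementation: clicking "I'm not a robot" automatically
Usage: open the Colab page and run py running.py -u {url}
[Supplement] Walking through excerpts of the code
Recommended book on automation with Python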

Background: computing unattended even without Pro+

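Google Colab is a wonderful service: with nothing more than a Google account, you can write and run programs on Google's cloud servers, and even perform large-scale parallel computation on GPUs / TPUs.
As shown in the image below, three plans are offered: the free Colab, Colab Pro at ¥1,072 per month, and Colab Pro+ at ¥5,243 per month.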

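The free plan is usable, but if you use a GPU / TPU continuously, access gets throttled, much like running out of stamina in a mobile game, and you are nudged toward Colab Pro / Pro+.
For casual Colab users who do not own a PC with a decent GPU, these characteristics and the pricing probably make Colab Pro the usual choice, so I expect Colab Pro to be the largest user segment.
(GPUs also draw a lot of power, so in terms of electricity costs and the environment, it feels more efficient for a large company to consolidate them in a service like this. I sometimes daydream that Colab and other large compute servers would be even greener if they sat in outer space.)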

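Even without Pro+, Pro lets you use GPUs / TPUs almost continuously, which is nice, but the inability to run in the background and the session disconnecting when left idle remain major drawbacks.
Incidentally, if you start a computation on Colab / Colab Pro and then leave it without touching anything, a message like the one in the figure below appears (translated from the Japanese UI): "Are you still there? Colab is intended for interactive use. Please confirm that you are still using it."
To keep using the session, you have to tick the reCAPTCHA "I'm not a robot" checkbox.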

To keep the script from being drawn to this image on the blog while it runs, the image is larger than the real dialog, deliberately pixelated, and marked "sample".

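In deep learning it is common for accuracy to take several hours or more to settle, but I cannot babysit a run for that long, and I wanted to start a job when I had time and check its progress from my phone. That is why I wrote code that uses PyAutoGUI to keep the Colab runtime alive.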

Verified environment: Windows 10 / Python 3.10.4

The code was verified in the following environment.

Windows 10 Home 21H2 (OS build 19044.1766)
Python 3.10.4
- PyAutoGUI: 0.9.53
- pytz: 2022.1

Implementation: clicking "I'm not a robot" automatically

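In short, the implementation automatically clicks the "I'm not a robot" checkbox shown on screen, with a few extra features added.
Be warned that it will also be drawn to, and click, fraudulent look-alike "I'm not a robot" windows.
While the script is running, do not open such sites outside Colab, and do not install suspicious software or libraries.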

The features are as follows.

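Whether to click the "I'm not a robot" checkbox automatically
Which browser has Colab open
How many times to check the screen
How often, in seconds, to check the screen
How long each automated GUI action, such as moving the cursor, should take
Whether to shut down after processing finishes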

The code is available in this GitHub repository.

GitHub - KazutoMakino/colabutils: colabutils


First, the whole code (any name works, but I saved it as running.py):

"""Keep running colab runtime.

Usage:
- `py running.py -u {active google colab's url}`
- `py running.py -u {active google colab's url} -s True`

---

KazutoMakino

"""

import os
import random
import sys
import time
import traceback
import webbrowser
from argparse import ArgumentParser
from datetime import datetime, timedelta
from pathlib import Path
from pprint import pprint

import pyautogui as pg
import pytz

#######################################################################################
# main
#######################################################################################


def main():
    AutoColabRunner.run()


#######################################################################################
# class
#######################################################################################


class AutoColabRunner:
    @staticmethod
    def run() -> None:
        # get args
        args = ArgsGetter.get_args()

        # reload web page
        WebBrowser.is_reloaded(
            app_name=args.app,
            cycles=args.cycles,
            sleep_time=args.time,
            url=args.url,
            gui_auto=args.gui,
            check_time_interval=args.interval,
            is_shutdown=args.shutdown,
        )

        # show message box
        pg.alert(text="complete", title="end", timeout=60 * 60)


class ArgsGetter:
    @staticmethod
    def get_args() -> ArgumentParser.parse_args:
        """Get command line arguments.

        Returns:
            ArgumentParser.parse_args: Command line arguments.
        """
        # get parameters from command line
        parser = ArgumentParser(description="Recursively reloading the web page.")
        parser.add_argument(
            "-a",
            "--app",
            type=str,
            default="chrome",
            help="application name (chrome, edge, firefox, safari)",
        )
        parser.add_argument(
            "-c",
            "--cycles",
            type=int,
            default=12 * 2,
            help="number of recursion",
        )
        parser.add_argument(
            "-t",
            "--time",
            type=float,
            default=60 * 30,
            help="sleep time [s] per cycle",
        )
        parser.add_argument(
            "-u",
            "--url",
            type=str,
            default="https://www.google.co.jp/",
            help="URL of web page",
        )
        parser.add_argument(
            "-g",
            "--gui",
            type=lambda s: s.lower() in ("true", "1", "yes"),  # bool("False") is True
            default=True,
            help="GUI Automation",
        )
        parser.add_argument(
            "-i",
            "--interval",
            type=float,
            default=5,
            help="time interval of GUI Automation",
        )
        parser.add_argument(
            "-s",
            "--shutdown",
            type=lambda s: s.lower() in ("true", "1", "yes"),  # bool("False") is True
            default=False,
            help="shutdown or not of the end",
        )

        # get args
        args = parser.parse_args()
        return args


class WebBrowser:
    # get application path (windows: nt, mac/linux: posix)
    app_path = {
        "nt": {
            "chrome": "C:/Program Files (x86)/Google/Chrome/Application/chrome.exe",
            "edge": "",
            "firefox": "",
            "safari": "",
        },
        "posix": {
            "chrome": "",
            "edge": "",
            "firefox": "",
            "safari": "",
        },
    }

    @classmethod
    def is_reloaded(
        cls,
        app_name: str = "chrome",
        cycles: int = 12,
        sleep_time: float = 1,
        url: str = "https://www.google.co.jp/",
        gui_auto: bool = True,
        check_time_interval: float = 5,
        is_shutdown: bool = False,
    ) -> None:
        # show parameters
        pprint(
            {
                "app_name": app_name,
                "cycles": cycles,
                "sleep_time": sleep_time,
                "gui_auto": gui_auto,
                "url": url,
                "check_time_interval": check_time_interval,
                "is_shutdown": is_shutdown,
            }
        )

        # recursive reload web page per sleep_time
        time_now = datetime.now(tz=pytz.timezone("Asia/Tokyo"))
        deadline = time_now + timedelta(seconds=cycles * sleep_time)
        print(f"now: {time_now}")
        print(f"-> end this routine: {deadline}")
        print("--------------------------------------------------")
        for i in range(cycles):
            # get web browser application
            browser_app = cls.app_path[os.name][app_name]
            browse = webbrowser.get(f"{browser_app} %s")

            # reload url
            browse.open(url=url)
            print(
                f"iteration: {i+1}/{cycles}, "
                + f'now: {datetime.now(tz=pytz.timezone("Asia/Tokyo"))}'
            )

            if gui_auto:
                # init
                elapsed_time = 0

                # loop: elapsed_time < sleep_time
                while elapsed_time < sleep_time:
                    # get start time
                    start_time = time.perf_counter()

                    # take a nap
                    time.sleep(check_time_interval)

                    # set image path
                    img_path = (
                        Path(__file__).parent.resolve() / "img/are_you_a_robot.jpg"
                    )

                    # check and GUI operation
                    xy = GUIHandler.get_matched_figure_area(
                        img_path=img_path, try_count=1
                    )
                    if xy:
                        time.sleep(random.random())
                        pg.moveTo(
                            x=xy["left"] + 152,
                            y=xy["top"] + 174,
                            duration=3 * random.random(),
                        )
                        time.sleep(random.random())
                        pg.leftClick()
                        time.sleep(random.random())

                    # add elapsed time to elapsed_time
                    elapsed_time += time.perf_counter() - start_time

            else:
                # sleep
                time.sleep(sleep_time)

        # shutdown
        if is_shutdown:
            print("This system will be shutdown after 60 [s]")
            time.sleep(60)
            os.system("shutdown -s")


class GUIHandler:
    """Handlers of GUI operation."""

    @staticmethod
    def gui_fail_safe(pause_time: float = 1, failsafe: bool = True) -> None:
        """Fail-safe settings of auto GUI operation.

        Args:
            pause_time (float, optional):
                A pause time interval @ pyautogui's mouse/key operations.
                Defaults to 1.
            failsafe (bool, optional):
                A definition of pyautogui fail safe mode setting.
                Defaults to True.
        """
        # set fail safe
        pg.FAILSAFE = failsafe
        # set pause time @ moving the mouse
        if failsafe:
            pg.PAUSE = pause_time

    @staticmethod
    def get_matched_figure_area(
        img_path: Path,
        confidence: float = 0.8,
        ret: bool = True,
        try_count: int = 10,
        interval: float = 1,
    ) -> dict:
        """Get the matched figure area.

        Args:
            img_path (Path): A file path.
            confidence (float, optional):
                A degree of confidence of image recognition.
                Defaults to 0.8.
            ret (bool, optional): Return or not. Defaults to True.
            try_count (int, optional): Counter of trying this task.
                Defaults to 10.
            interval (float, optional): Waiting time of each trial.
                Defaults to 1.

        Returns:
            dict: The matched area parameters at dict type.
        """
        # loop: get figure position
        # print("Searching:", img_path)
        screen_fig = None  # stays None when every trial fails
        for i in range(try_count):
            # print("    count: {0} / {1}".format(i, try_count))
            try:
                # get figure position
                screen_fig = pg.locateOnScreen(str(img_path), confidence=confidence)
            except pg.ImageNotFoundException:
                # some PyAutoGUI versions raise instead of returning None
                screen_fig = None
            if screen_fig:
                break
            # wait before the next trial (skipped after the last one)
            if i < try_count - 1:
                time.sleep(interval)
        if screen_fig is None:
            # AttributeError(f"not found: {img_path.resolve()}")
            # sys.exit()
            return None
        elif not ret:
            return
        # get position / size: locateOnScreen returns (left, top, width, height)
        left, top, width, height = (int(v) for v in screen_fig)
        right, bottom = left + width, top + height
        center_x, center_y = left + width // 2, top + height // 2
        if ret:
            return {
                "left": left,
                "top": top,
                "right": right,
                "bottom": bottom,
                "width": width,
                "height": height,
                "center_x": center_x,
                "center_y": center_y,
            }


#######################################################################################

if __name__ == "__main__":
    try:
        main()
    except Exception:
        traceback.print_exc()
    sys.exit()

Usage: open the Colab page and run py running.py -u {url}

The code has several features, and they can be switched on / off or tuned with the following command-line arguments when running python running.py.

-u or --url (default: "https://www.google.co.jp/"): the URL to reload automatically (here, paste the address of the Colab page that is running your computation)
-s or --shutdown (default: False): whether to shut down the PC after processing finishes
-a or --app (default: "chrome"): the web browser in which the running Colab page is displayed
-c or --cycles (default: 12 * 2): the number of times the Colab page given by -u is reloaded
-t or --time (default: 60 * 30): the interval [s] between reloads of the Colab page given by -u
-g or --gui (default: True): whether to automate GUI operations; normally leave it at the default True
-i or --interval (default: 5): the interval [s] between pattern-matching checks for the "I'm not a robot" window
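With the defaults, the planned run length is cycles × time = 24 × 1800 s = 12 hours, which is how is_reloaded derives the end time it prints at startup. A minimal sketch of that arithmetic follows; the function name planned_deadline and the start time are hypothetical and chosen only for illustration.

```python
from datetime import datetime, timedelta


def planned_deadline(start: datetime, cycles: int = 12 * 2, sleep_time: float = 60 * 30) -> datetime:
    """Return the planned end time: start + cycles * sleep_time seconds."""
    return start + timedelta(seconds=cycles * sleep_time)


# Hypothetical start time, used only to make the example reproducible.
start = datetime(2022, 7, 1, 21, 0, 0)
deadline = planned_deadline(start)
total = deadline - start
print(total)  # 12:00:00
```

If a training run is expected to take longer than that, raise -c or -t so that the product covers the run.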

Notes on usage:

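The image "img/are_you_a_robot.jpg" on line 200 of the source will probably look different depending on your PC environment, so prepare your own screenshot of the "I'm not a robot" window and adjust the path as needed.
Lines 210 and 211 of the source (the x=xy["left"] + 152 and y=xy["top"] + 174 arguments) specify the on-screen position of the "I'm not a robot" checkbox. They are expressed as offsets, in pixels, from the top-left corner of the region where pattern matching found the image set above.
You can read these offsets off in a drawing program, for example Paint on Windows as in the figure below, and then adjust the numbers.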

reCAPTCHA-xy: an example of how to read off the coordinates of the reCAPTCHA checkbox

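The search for the "I'm not a robot" window uses PyAutoGUI's locateOnScreen (internally OpenCV's matchTemplate), which by design works only on the primary display.
Colab stopping because the network dropped for some reason is not handled here (if you want to handle it, add automation for toggling Wi-Fi or airplane mode as well).
If you leave the PC completely unattended, change its screen saver / monitor / sleep settings so that it does not go to sleep.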
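The offset note above boils down to simple addition: the checkbox's pixel coordinates inside your reference screenshot (what a drawing program shows) are added to the top-left corner of the matched region. A minimal sketch, assuming a hypothetical helper click_target and a made-up match position:

```python
from typing import Tuple


def click_target(match_left: int, match_top: int, dx: int, dy: int) -> Tuple[int, int]:
    """Return the screen position to click.

    (match_left, match_top) is the top-left corner of the matched region;
    (dx, dy) is the checkbox position measured inside the reference image.
    """
    return match_left + dx, match_top + dy


# Offsets used in the article's code; replace them with your own measurements.
DX, DY = 152, 174

# Hypothetical match position, for illustration only.
target = click_target(1010, 333, DX, DY)
print(target)  # (1162, 507)
```

Because the offsets are relative to the matched region, they stay valid when the dialog appears elsewhere on the screen, but they need re-measuring if you retake the screenshot with different cropping or display scaling.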

[Supplement] Walking through excerpts of the code

This section explains excerpts of the code shown above.

class AutoColabRunner

AutoColabRunner.run() calls the classes and methods that follow and ties together all the processing in this code.
When processing finishes normally, a pop-up window saying complete is shown.

class ArgsGetter

class ArgsGetter:
    @staticmethod
    def get_args() -> ArgumentParser.parse_args:
        # ... (omitted) ...
        # get args
        args = parser.parse_args()
        return args

get_args() of class ArgsGetter reads the command-line arguments.
The arguments themselves are covered in the usage section above, so they are not repeated here.
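One general argparse caveat applies to boolean options such as -s and -g: a plain type=bool converts any non-empty string to True, so even the string "False" ends up as True. A minimal sketch of an explicit string-to-bool converter follows; the helper name str2bool and the accepted spellings are my own assumptions, not part of the repository.

```python
from argparse import ArgumentParser, ArgumentTypeError


def str2bool(value: str) -> bool:
    """Convert common textual spellings of booleans into a bool."""
    lowered = value.strip().lower()
    if lowered in {"true", "t", "yes", "y", "1"}:
        return True
    if lowered in {"false", "f", "no", "n", "0"}:
        return False
    raise ArgumentTypeError(f"expected a boolean, got {value!r}")


parser = ArgumentParser()
parser.add_argument("-s", "--shutdown", type=str2bool, default=False)

args_on = parser.parse_args(["-s", "True"])
args_off = parser.parse_args(["-s", "False"])

# bool("False") is True because the string is non-empty.
print(bool("False"), args_on.shutdown, args_off.shutdown)  # True True False
```

Raising ArgumentTypeError makes argparse report a readable error for a typo such as -s Ture instead of silently picking a value.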

class WebBrowser

WebBrowser.is_reloaded(**) reloads the web page, clicks the reCAPTCHA automatically, and shuts the PC down after a normal finish.

class WebBrowser:
    # get application path (windows: nt, mac/linux: posix)
    app_path = {
        "nt": {
            "chrome": "C:/Program Files (x86)/Google/Chrome/Application/chrome.exe",
            "edge": "",
            "firefox": "",
            "safari": "",
        },
        "posix": {
            "chrome": "",
            "edge": "",
            "firefox": "",
            "safari": "",
        },
    }

The class variable app_path of class WebBrowser specifies which browser to use, but only the combination of Windows and Chrome is filled in, so edit it as needed for any other combination.
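Since most entries in app_path are empty strings, it may help to fail early with a clear message rather than hand an empty command to webbrowser.get(). The sketch below assumes a hypothetical helper browser_command and a trimmed copy of the table. It also quotes the path, on the assumption that webbrowser.get() splits a "%s" template with shlex.split, so a path containing spaces stays a single token.

```python
import shlex

# Trimmed, hypothetical copy of the table, in the same shape as WebBrowser.app_path.
APP_PATH = {
    "nt": {
        "chrome": "C:/Program Files (x86)/Google/Chrome/Application/chrome.exe",
        "edge": "",
    },
    "posix": {"chrome": "", "edge": ""},
}


def browser_command(os_name: str, app_name: str, table: dict = APP_PATH) -> str:
    """Return a command template for webbrowser.get(), or raise if no path is set."""
    path = table.get(os_name, {}).get(app_name, "")
    if not path:
        raise ValueError(f"no path registered for {app_name!r} on {os_name!r}")
    # Quote the path so that spaces do not split it into several arguments.
    return f'"{path}" %s'


command = browser_command("nt", "chrome")
tokens = shlex.split(command)
print(tokens)
```

The resulting string would be passed as webbrowser.get(command); whether quoting is needed on your system is worth checking, since the article's unquoted form was verified only on the author's Windows 10 setup.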

class GUIHandler

gui_fail_safe(**) sets pyautogui.FAILSAFE and pyautogui.PAUSE. The listing above never calls it, so PyAutoGUI's own defaults apply unless you call it at startup.

    @staticmethod
    def get_matched_figure_area(
        img_path: Path,
        confidence: float = 0.8,
        ret: bool = True,
        try_count: int = 10,
        interval: float = 1,
    ) -> dict:

GUIHandler.get_matched_figure_area(**) returns, as a dict, the on-screen coordinates of the image found by pattern matching.
Its arguments are:

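img_path: path of the image file to pattern-match against
confidence: how closely the screen must match the image, as a ratio (values near 1.0 tend to fail, so about 0.8 works well in my experience)
ret: whether to return a value
try_count: number of pattern-matching attempts
interval: interval [s] between attempts when pattern matching is tried more than once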

If rendering is slow and the target image takes a while to appear in the window, increase try_count / interval.
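The role of try_count and interval can be sketched as a small retry loop. To keep the example runnable without a screen, the locator is injected as a function; locate_with_retry and the fake locator below are hypothetical names for illustration, not the repository's code.

```python
import time
from typing import Callable, Optional, Tuple

Box = Tuple[int, int, int, int]  # (left, top, width, height)


def locate_with_retry(
    locate: Callable[[], Optional[Box]],
    try_count: int = 10,
    interval: float = 1.0,
    not_found_errors: tuple = (Exception,),
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[Box]:
    """Call locate() up to try_count times, waiting interval seconds between tries."""
    for attempt in range(try_count):
        try:
            box = locate()
        except not_found_errors:
            # Treat a "not found" exception the same as a None result.
            box = None
        if box is not None:
            return box
        if attempt < try_count - 1:
            sleep(interval)
    return None


# Fake locator: misses twice, then reports a match (values are made up).
results = iter([None, None, (100, 200, 304, 78)])
found = locate_with_retry(lambda: next(results), try_count=5, interval=0)
print(found)  # (100, 200, 304, 78)
```

With PyAutoGUI, the injected function would be something like lambda: pyautogui.locateOnScreen(str(img_path), confidence=0.8), and not_found_errors could be narrowed to the library's own not-found exception.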

Recommended book on automation with Python

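As the title "退屈なことは Python にやらせよう" (the Japanese edition of "Automate the Boring Stuff with Python") says, this book is about using Python to automate routine everyday office tasks such as building data sheets, drawing figures, handling e-mail, and operating GUIs.
It covers not only how to implement the automation but also Python basics and each library in detail, so I especially recommend it to anyone who wants to automate some task with Python.
Note: the subtitle mentions "non-programmers", but you will of course be programming.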
