AWS Batch Example Project¶
Keywords: AWS Batch Example Project
Summary¶
This article documents the first experimental project I built after learning the basic concepts and features of AWS Batch; it also serves as a reference for my future AWS Batch projects. The business logic is deliberately minimal, yet still representative of a real workload.
In this project we create a container app: given a source S3 folder and a target S3 folder as arguments, it copies every file under the source folder to the target folder.
First, let's plan what we need to do:
Create an ECR repo, then package the app code into a container image.
Create a Compute Environment
Create a Job Queue
Create a Job Definition that uses our container image
Submit a Job to the Job Queue using the Job Definition; the queue then automatically finds an available Compute Environment to run the job.
Reference:
Components of AWS Batch: https://docs.aws.amazon.com/batch/latest/userguide/what-is-batch.html#batch_components
Prepare Container Image¶
First, prepare the business code and the container image.
App Code
The app is very simple and is implemented in Python. requirements.txt
defines its dependencies:
fire==0.4.0
pathlib_mate>=1.2.1,<3.0.0
s3pathlib>=2.0.1,<3.0.0
boto_session_manager>=1.5.3,<2.0.0
The app's source code, main.py:
# -*- coding: utf-8 -*-

from boto_session_manager import BotoSesManager
from s3pathlib import S3Path, context


def copy_s3_folder(
    bsm: BotoSesManager,
    s3dir_source: S3Path,
    s3dir_target: S3Path,
):
    """
    Core logic.
    """
    context.attach_boto_session(bsm.boto_ses)
    print(f"copy files from {s3dir_source.uri} to {s3dir_target.uri}")
    for s3path_source in s3dir_source.iter_objects():
        relpath = s3path_source.relative_to(s3dir_source)
        s3path_target = s3dir_target.joinpath(relpath)
        print(f"copy: {relpath.key}")
        s3path_source.copy_to(s3path_target, overwrite=True)


def main(
    region: str,
    s3uri_source: str,
    s3uri_target: str,
):
    """
    Wrapper around the core logic; exposes the parameters to the CLI.
    """
    print(f"received: region = {region!r}, s3uri_source = {s3uri_source!r}, s3uri_target = {s3uri_target!r}")
    copy_s3_folder(
        bsm=BotoSesManager(region_name=region),
        s3dir_source=S3Path(s3uri_source).to_dir(),
        s3dir_target=S3Path(s3uri_target).to_dir(),
    )


# convert the app to a CLI app.
if __name__ == "__main__":
    import fire

    fire.Fire(main)
The Dockerfile content; the base image we use is the official Python image:
# this is public and open source
FROM public.ecr.aws/docker/library/python:3.9-alpine
# set working directory
WORKDIR /usr/src/app
# package application
COPY requirements.txt ./
RUN pip install --no-cache-dir -r requirements.txt
COPY main.py ./
ENTRYPOINT ["python", "./main.py"]
If you want to run the app locally before building the container image, you can:
# CD to where the main.py is
# create a virtualenv at .venv folder
virtualenv -p python3.9 .venv
# activate virtualenv
source .venv/bin/activate
# install dependencies
pip install -r requirements.txt
# try to run the CLI
python main.py --region ${aws_region} --s3uri_source s3://${aws_account_id}-${aws_region}-data/projects/aws_batch_example/source/ --s3uri_target s3://${aws_account_id}-${aws_region}-data/projects/aws_batch_example/target/
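The local run above needs some files under the source folder to copy. Below is a minimal sketch that seeds a few small test files, assuming the same `{account_id}-{region}-data` bucket naming convention used throughout this post; the profile name and file names are placeholders, not part of the original project:

```python
# Sketch: seed the source S3 folder with test files before a local run.
# The pure helper returns the planned layout so it can be inspected
# without touching AWS.

def build_test_files(bucket: str, n_files: int = 3) -> dict:
    """Return a mapping of S3 URI -> file content for the test files."""
    prefix = f"s3://{bucket}/projects/aws_batch_example/source"
    return {f"{prefix}/file-{i}.txt": f"hello {i}" for i in range(n_files)}


if __name__ == "__main__":
    # third-party deps from requirements.txt
    from boto_session_manager import BotoSesManager
    from s3pathlib import S3Path, context

    bsm = BotoSesManager(profile_name="your_aws_profile")  # placeholder profile
    context.attach_boto_session(bsm.boto_ses)
    bucket = f"{bsm.aws_account_id}-{bsm.aws_region}-data"
    for uri, body in build_test_files(bucket).items():
        S3Path(uri).write_text(body)
        print(f"wrote {uri}")
```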
Create ECR Repository
Click Create Repository:
- Visibility settings: Private
- Repository name: aws-batch-example (with hyphens, matching repo_name in the cli script and the image URI used in the Job Definition)
- Tag immutability: disabled, so we can keep overwriting a given tag
- Leave everything else as default
Build and Publish Container Image
CD to the directory containing the Dockerfile.
Run the ./ecr_login script to log Docker in to AWS ECR. Remember to make it executable first with chmod +x ecr_login. The script content is:
#!/usr/bin/env python
# -*- coding: utf-8 -*-

"""
This script automates docker login to AWS ECR.

Requirements:

- Python3.7+
- `fire>=0.1.3,<1.0.0 <https://pypi.org/project/fire/>`_
- make sure you have run ``chmod +x ecr_login`` to make this script executable

Usage:

.. code-block:: bash

    # show help info
    $ ./ecr_login -h

    # on local laptop use AWS cli profile
    $ ./ecr_login --aws-profile ${your_aws_profile}

    # on EC2, Cloud9, CloudShell
    $ ./ecr_login --aws-region ${your_aws_region}

    # if your boto session doesn't have sts:GetCallerIdentity permission
    # you have to explicitly provide AWS account ID
    $ ./ecr_login --aws-region ${your_aws_region} --aws-account-id ${your_aws_account_id}
"""

import typing as T
import base64
import subprocess

import boto3
import fire


def get_ecr_auth_token_v1(
    ecr_client,
    aws_account_id: str,
) -> str:
    """
    Get ECR auth token using boto3 SDK.
    """
    res = ecr_client.get_authorization_token(
        registryIds=[
            aws_account_id,
        ],
    )
    b64_token = res["authorizationData"][0]["authorizationToken"]
    user_pass = base64.b64decode(b64_token.encode("utf-8")).decode("utf-8")
    auth_token = user_pass.split(":")[1]
    return auth_token


def get_ecr_auth_token_v2(
    aws_region: str,
    aws_profile: T.Optional[str] = None,
) -> str:
    """
    Get ECR auth token using the AWS CLI.

    .. note::

        ``aws ecr get-login`` only exists in AWS CLI v1; AWS CLI v2
        replaced it with ``aws ecr get-login-password``.
    """
    args = ["aws", "ecr", "get-login", "--region", aws_region, "--no-include-email"]
    if aws_profile is not None:
        args.extend(["--profile", aws_profile])
    response = subprocess.run(args, check=True, capture_output=True)
    text = response.stdout.decode("utf-8")
    auth_token = text.split(" ")[5]
    return auth_token


def docker_login(
    auth_token: str,
    registry_url: str,
) -> bool:
    """
    Login docker cli to AWS ECR.

    :return: a boolean flag to indicate if the login is successful.
    """
    # feed the token via stdin so it never shows up in the process list
    response = subprocess.run(
        ["docker", "login", "-u", "AWS", registry_url, "--password-stdin"],
        input=auth_token.encode("utf-8"),
        capture_output=True,
    )
    text = response.stdout.decode("utf-8")
    return "Login Succeeded" in text


def main(
    aws_profile: T.Optional[str] = None,
    aws_account_id: T.Optional[str] = None,
    aws_region: T.Optional[str] = None,
):
    """
    Login docker cli to AWS ECR using boto3 SDK and AWS CLI.

    :param aws_profile: specify the AWS profile you want to use to login.
        usually this parameter is used on a local laptop that has awscli
        installed and configured.
    :param aws_account_id: explicitly specify the AWS account id. if it is not
        given, it will use sts.get_caller_identity() to get the account id.
        you can use this to get the auth token for cross account access.
    :param aws_region: explicitly specify the AWS region for the boto3 session
        and ecr repo. usually you need to set this on EC2, ECS, Cloud9,
        CloudShell, Lambda, etc ...
    """
    boto_ses = boto3.session.Session(
        region_name=aws_region,
        profile_name=aws_profile,
    )
    ecr_client = boto_ses.client("ecr")
    if aws_account_id is None:
        sts_client = boto_ses.client("sts")
        res = sts_client.get_caller_identity()
        aws_account_id = res["Account"]

    print("get ecr auth token ...")
    auth_token = get_ecr_auth_token_v1(
        ecr_client=ecr_client,
        aws_account_id=aws_account_id,
    )
    if aws_region is None:
        aws_region = boto_ses.region_name
    print("docker login ...")
    flag = docker_login(
        auth_token=auth_token,
        registry_url=f"https://{aws_account_id}.dkr.ecr.{aws_region}.amazonaws.com",
    )
    if flag:
        print("login succeeded!")
    else:
        print("login failed!")


def run():
    fire.Fire(main)


if __name__ == "__main__":
    run()
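If you are on AWS CLI v2, note that the `aws ecr get-login` command used in the script's v2 helper no longer exists; it was replaced by `aws ecr get-login-password`, which you pipe straight into `docker login`. A minimal sketch of a helper that builds that pipeline; the account id and region in the usage are placeholders:

```python
# Sketch: build the AWS CLI v2 equivalent of ecr_login as one shell
# pipeline string. ``get-login-password`` prints only the password, which
# docker reads from stdin.
import typing as T


def build_login_pipeline(
    aws_account_id: str,
    aws_region: str,
    aws_profile: T.Optional[str] = None,
) -> str:
    """Return a shell pipeline that logs docker in to the ECR registry."""
    registry = f"{aws_account_id}.dkr.ecr.{aws_region}.amazonaws.com"
    get_password = f"aws ecr get-login-password --region {aws_region}"
    if aws_profile:
        get_password += f" --profile {aws_profile}"
    return f"{get_password} | docker login --username AWS --password-stdin {registry}"


if __name__ == "__main__":
    import subprocess

    cmd = build_login_pipeline("111122223333", "us-east-1")  # placeholder account id
    subprocess.run(cmd, shell=True, check=True)
```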
Then run ./cli build-image, ./cli test-image, and ./cli push-image in order, to build, test, and publish the image respectively. Remember to make the script executable first with chmod +x cli. The script content is:
#!/usr/bin/env python
# -*- coding: utf-8 -*-

"""
This script can:

- build container image for AWS Batch
- push container image to AWS ECR
- test image locally

Requirements:

- update the "Your project configuration here" part at the beginning of this script
- Python3.7+
- `fire>=0.1.3,<1.0.0 <https://pypi.org/project/fire/>`_
- `s3pathlib>=2.0.1,<3.0.0 <https://pypi.org/project/s3pathlib/>`_
- `boto_session_manager>=1.5.3,<2.0.0 <https://pypi.org/project/boto-session-manager/>`_
- make sure you have run ``chmod +x cli`` to make this script executable

Usage:

.. code-block:: bash

    # show help info
    $ ./cli -h

    # build image
    $ ./cli build-image

    # push image
    $ ./cli push-image

    # test image
    $ ./cli test-image
"""

import typing as T
import os
import subprocess
import contextlib
import dataclasses
from pathlib import Path

from s3pathlib import S3Path, context
from boto_session_manager import BotoSesManager

# ------------------------------------------------------------------------------
# Your project configuration here
aws_profile = "bmt_app_dev_us_east_1"
aws_region = "us-east-1"
repo_name = "aws-batch-example"
repo_tag = "latest"


# ------------------------------------------------------------------------------


@contextlib.contextmanager
def temp_cwd(path: T.Union[str, Path]):
    """
    Temporarily set the current working directory (CWD) and automatically
    switch back when it's done.

    Example:

    .. code-block:: python

        with temp_cwd(Path("/path/to/target/working/directory")):
            # do something
    """
    path = Path(path).absolute()
    if not path.is_dir():
        raise NotADirectoryError(f"{path} is not a dir!")
    cwd = os.getcwd()
    os.chdir(str(path))
    try:
        yield path
    finally:
        os.chdir(cwd)


@dataclasses.dataclass
class EcrContext:
    aws_account_id: str
    aws_region: str
    repo_name: str
    repo_tag: str
    path_dockerfile: Path

    @property
    def dir_dockerfile(self) -> Path:
        return self.path_dockerfile.parent

    @property
    def image_uri(self) -> str:
        return f"{self.aws_account_id}.dkr.ecr.{self.aws_region}.amazonaws.com/{self.repo_name}:{self.repo_tag}"

    def build_image(self):
        with temp_cwd(self.dir_dockerfile):
            args = ["docker", "build", "-t", self.image_uri, "."]
            subprocess.run(args, check=True)

    def push_image(self):
        with temp_cwd(self.dir_dockerfile):
            args = [
                "docker",
                "push",
                self.image_uri,
            ]
            subprocess.run(args, check=True)

    def test_image(self):
        with temp_cwd(dir_here):
            s3bucket = f"{bsm.aws_account_id}-{bsm.aws_region}-data"
            s3dir_source = S3Path(f"s3://{s3bucket}/projects/aws_batch_example/source/")
            s3dir_target = S3Path(f"s3://{s3bucket}/projects/aws_batch_example/target/")
            s3dir_source.delete()
            s3dir_target.delete()
            s3dir_source.joinpath("test.txt").write_text("hello-world")
            print(f"preview source: {s3dir_source.console_url}")
            print(f"preview target: {s3dir_target.console_url}")

            args = [
                "docker",
                "run",
                "--rm",
                self.image_uri,
                "--region",
                "us-east-1",
                "--s3uri_source",
                s3dir_source.uri,
                "--s3uri_target",
                s3dir_target.uri,
            ]
            subprocess.run(args, check=True)


dir_here = Path(__file__).absolute().parent
path_dockerfile = dir_here.joinpath("Dockerfile")

IS_LOCAL = False
IS_CI = False
IS_C9 = False
if "CI" in os.environ or "CODEBUILD_CI" in os.environ:
    IS_CI = True
elif "C9_USER" in os.environ:
    IS_C9 = True
else:
    IS_LOCAL = True

if IS_LOCAL:
    bsm = BotoSesManager(profile_name=aws_profile)
elif IS_CI:
    bsm = BotoSesManager(region_name=aws_region)
elif IS_C9:
    bsm = BotoSesManager(region_name=aws_region)
else:
    raise RuntimeError

context.attach_boto_session(bsm.boto_ses)

ecr_context = EcrContext(
    aws_account_id=bsm.aws_account_id,
    aws_region=aws_region,
    repo_name=repo_name,
    repo_tag=repo_tag,
    path_dockerfile=path_dockerfile,
)


class Main:
    def build_image(self):
        """
        Build the docker image.
        """
        ecr_context.build_image()

    def push_image(self):
        """
        Push the docker image to ECR.
        """
        ecr_context.push_image()

    def test_image(self):
        """
        Test the docker image.
        """
        ecr_context.test_image()


if __name__ == "__main__":
    import fire

    fire.Fire(Main)
Now that our container image is ready, we can start configuring the Batch job.
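Before moving on, you can confirm the pushed image actually landed in ECR. A sketch using boto3 `describe_images`; the tag filter is a pure helper so it can be checked without calling AWS, and the region here is a placeholder:

```python
# Sketch: check that the ECR repo contains an image carrying a given tag.

def has_tag(image_details: list, tag: str) -> bool:
    """Return True if any image in a describe_images response carries ``tag``."""
    return any(tag in detail.get("imageTags", []) for detail in image_details)


if __name__ == "__main__":
    import boto3

    ecr = boto3.client("ecr", region_name="us-east-1")
    res = ecr.describe_images(repositoryName="aws-batch-example")
    print("latest pushed:", has_tag(res["imageDetails"], "latest"))
```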
Configuration¶
In this section we configure the Compute Environment, Job Queue, and Job Definition.
Compute Environment¶
First, configure the compute environment.
- Step 1: Compute environment configuration
  - Platform: Fargate
  - Name: aws_batch_example
  - Service role: use the default AWSServiceRoleForBatch
- Step 2: Instance configuration
  - Use Fargate Spot capacity: turn it on (to save cost)
  - Maximum vCPUs: 4 (keep it small to save cost)
- Step 3: Network configuration
  - VPC, Subnets, and Security Group: use your default VPC, a public subnet, and the default security group
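The compute environment above can also be expressed as a boto3 call, which is closer to how you would automate this in production. A sketch; the subnet and security group ids are placeholders, and the service-linked role is assumed to be picked up by default:

```python
# Sketch: the Fargate Spot compute environment from the console steps,
# as a boto3 batch.create_compute_environment request.

def build_compute_environment_request(
    subnet_ids: list,
    security_group_ids: list,
) -> dict:
    """Build the create_compute_environment request matching the console steps."""
    return {
        "computeEnvironmentName": "aws_batch_example",
        "type": "MANAGED",
        "state": "ENABLED",
        "computeResources": {
            "type": "FARGATE_SPOT",  # Fargate Spot capacity, to save cost
            "maxvCpus": 4,           # keep it small to save cost
            "subnets": subnet_ids,
            "securityGroupIds": security_group_ids,
        },
    }


if __name__ == "__main__":
    import boto3

    batch = boto3.client("batch", region_name="us-east-1")
    req = build_compute_environment_request(
        subnet_ids=["subnet-11111111"],      # placeholder: a public subnet
        security_group_ids=["sg-11111111"],  # placeholder: default security group
    )
    print(batch.create_compute_environment(**req))
```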
Job Queue¶
Next, configure the Job Queue:
- Orchestration type: Fargate
- Name: aws_batch_example
- Scheduling policy Amazon Resource Name (optional): leave it empty
- Connected compute environments: use the aws_batch_example compute environment you just created
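The same job queue as a boto3 call, for reference. A sketch; the priority value of 1 is an assumption (the console fills a default), and the compute environment is referenced by the name created above:

```python
# Sketch: the job queue from the console steps, as a boto3
# batch.create_job_queue request.

def build_job_queue_request() -> dict:
    """Build the create_job_queue request matching the console steps."""
    return {
        "jobQueueName": "aws_batch_example",
        "state": "ENABLED",
        "priority": 1,  # assumed default; higher numbers run first
        "computeEnvironmentOrder": [
            {"order": 1, "computeEnvironment": "aws_batch_example"},
        ],
    }


if __name__ == "__main__":
    import boto3

    batch = boto3.client("batch", region_name="us-east-1")
    print(batch.create_job_queue(**build_job_queue_request()))
```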
Job Definition¶
Finally, configure the Job Definition:
- Step 1: Job definition configuration
  - Orchestration type: Fargate
  - General configuration:
    - Name: aws_batch_example
    - Execution timeout: 60 (seconds)
    - Scheduling priority: leave it empty; this is for advanced scheduling
  - Fargate platform configuration:
    - Fargate platform version: LATEST (default)
    - (IMPORTANT) Assign public IP: turn it on. If it is on, your task has outbound network access to the internet, so it can reach the ECR service endpoint to pull your image. If it is off, you have to ensure there is a NAT Gateway in your VPC to route traffic to the internet (but that is expensive). If it is off and you don't have a NAT Gateway, you cannot pull the container image from ECR. Alternatively, you can use an ECR VPC Endpoint to create an internal connection between your VPC and the ECR service endpoint. See this discussion: https://repost.aws/knowledge-center/ecs-pull-container-api-error-ecr
    - Ephemeral storage:
    - Execution role:
  - Job attempts: 1
  - Retry strategy conditions: leave it empty
- Step 2: Container configuration
  - Image: ${aws_account_id}.dkr.ecr.${aws_region}.amazonaws.com/aws-batch-example:latest
  - Command syntax (JSON): ["--region","us-east-1","--s3uri_source","Ref::s3uri_source","--s3uri_target","Ref::s3uri_target"]
  - Parameters: add two parameter names, s3uri_source and s3uri_target
  - Environment configuration:
    - Job role configuration:
    - vCPUs: 0.25
    - Memory: 0.5 (GB)
- Step 3 (optional): Linux and logging settings
  - leave everything empty
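The whole job definition above maps onto a single boto3 `register_job_definition` call. A sketch, not the exact console output: the execution role ARN is a placeholder you must supply, and the parameter defaults are hypothetical values that get overridden at submit time:

```python
# Sketch: the Fargate job definition from the console steps, as a boto3
# batch.register_job_definition request.

def build_job_definition_request(
    aws_account_id: str,
    aws_region: str,
    execution_role_arn: str,
) -> dict:
    """Build the register_job_definition request matching the console steps."""
    image = f"{aws_account_id}.dkr.ecr.{aws_region}.amazonaws.com/aws-batch-example:latest"
    return {
        "jobDefinitionName": "aws_batch_example",
        "type": "container",
        "platformCapabilities": ["FARGATE"],
        "timeout": {"attemptDurationSeconds": 60},  # execution timeout
        "retryStrategy": {"attempts": 1},           # job attempts
        # hypothetical defaults; overridden per job via submit_job parameters
        "parameters": {
            "s3uri_source": "s3://my-bucket/source/",
            "s3uri_target": "s3://my-bucket/target/",
        },
        "containerProperties": {
            "image": image,
            # the image ENTRYPOINT is ["python", "./main.py"], so the
            # command carries only the CLI arguments
            "command": [
                "--region", aws_region,
                "--s3uri_source", "Ref::s3uri_source",
                "--s3uri_target", "Ref::s3uri_target",
            ],
            "executionRoleArn": execution_role_arn,
            "resourceRequirements": [
                {"type": "VCPU", "value": "0.25"},
                {"type": "MEMORY", "value": "512"},  # 0.5 GB, in MiB
            ],
            "networkConfiguration": {"assignPublicIp": "ENABLED"},
            "fargatePlatformConfiguration": {"platformVersion": "LATEST"},
        },
    }


if __name__ == "__main__":
    import boto3

    batch = boto3.client("batch", region_name="us-east-1")
    req = build_job_definition_request(
        "111122223333",  # placeholder account id
        "us-east-1",
        "arn:aws:iam::111122223333:role/my-task-execution-role",  # placeholder
    )
    print(batch.register_job_definition(**req))
```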
Test by Submitting a Job¶
Finally, we can submit a Job.
- Step 1: Job configuration
  - Name: aws_batch_example
  - Job definition: aws_batch_example:1
  - Job queue: aws_batch_example
- Step 2 (optional): Overrides
  - use the defaults for everything except:
  - Additional configuration -> Parameters: since we declared two parameters in the job definition, we have to give them values here:
    - s3uri_source: s3://${aws_account_id}-${aws_region}-data/projects/aws_batch_example/source/
    - s3uri_target: s3://${aws_account_id}-${aws_region}-data/projects/aws_batch_example/target/
After a few seconds, you can watch the Job move from Submitted through Runnable, Starting, and Running to Succeeded. You can then see the output data in the target folder on S3.
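The same submission can be driven with boto3, which is handy once you want to trigger jobs from code rather than the Console. A sketch that submits the job and polls its status; the bucket name in the usage is a placeholder:

```python
# Sketch: submit the job with the same parameters as the console run,
# then poll batch.describe_jobs until it reaches a terminal state.

def build_submit_job_request(bucket: str) -> dict:
    """Build the submit_job request matching the console steps."""
    prefix = f"s3://{bucket}/projects/aws_batch_example"
    return {
        "jobName": "aws_batch_example",
        "jobQueue": "aws_batch_example",
        "jobDefinition": "aws_batch_example:1",
        "parameters": {
            "s3uri_source": f"{prefix}/source/",
            "s3uri_target": f"{prefix}/target/",
        },
    }


if __name__ == "__main__":
    import time
    import boto3

    batch = boto3.client("batch", region_name="us-east-1")
    bucket = "111122223333-us-east-1-data"  # placeholder bucket
    job_id = batch.submit_job(**build_submit_job_request(bucket))["jobId"]
    while True:
        status = batch.describe_jobs(jobs=[job_id])["jobs"][0]["status"]
        print(f"job {job_id}: {status}")
        if status in ("SUCCEEDED", "FAILED"):
            break
        time.sleep(5)
```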
Recap¶
Let's briefly recap. Overall, most of the time in an AWS Batch project goes into writing the business logic, building the container image, and testing the image. That is exactly where the value of Batch lies: it lets you focus on the business logic. The remaining steps are mostly clicks in the Console and don't take much time. In an experimental project we can do this Console configuration by hand; in a production project, we would manage it with a tool like CloudFormation instead of manual clicks.
As a next step, you may want to turn this experimental project into a reusable, enterprise-grade application with automated build, test, and deployment. That is where CI/CD tooling comes in. We will cover that enterprise-grade architecture in the next article.