Building a Python Web Crawler: Setting Up a poetry Virtual Environment (Step 4)

by 앵과장, 2022. 10. 20.

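I had set up a conda virtual environment for Python and wanted to put a FastAPI-based project on GitHub. A Python project needs a skeleton that declares its package layout and versions, and when I tried to push mine I ran into an error saying a setup.py or pyproject.toml was required. While looking up how old conda actually is, I found that poetry, a tool for dependency management, packaging and virtual environments, is the more recent option, so I'm making a sharp turn and trying it instead.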

 

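If I'm starting from scratch anyway, going with the newest tool should mean less work later, right?

I don't have any real depth in Python yet, so I'd at least like my tools to be current. Swaying like a reed in the wind, here we go.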

 

 Poetry

Package Python Projects the Proper Way with Poetry (hackersandslackers.com)
https://hackersandslackers.com/python-poetry-package-manager/

1. Installing poetry on a Mac

brew install poetry

 

If you haven't installed Homebrew yet, see my earlier post, "How to install Homebrew on an M1 MacBook (Mac OS)":

https://angryfullstack.tistory.com/entry/Mac-OS-%EB%A7%A5%EB%B6%81-Homebrew-m1-%EC%84%A4%EC%B9%98-%EB%B0%A9%EB%B2%95

Once brew is installed,

run brew install poetry:

(base) renzo@renzoui-MacBookPro local % brew install poetry
Running `brew update --auto-update`...
==> Downloading https://ghcr.io/v2/homebrew/core/gdbm/manifests/1.23
######################################################################## 100.0%
==> Downloading https://ghcr.io/v2/homebrew/core/gdbm/blobs/sha256:62a2c1994737a2677f318a97ac64a32690f9f958086310a49f37e3fcfd5b6731
==> Downloading from https://pkg-containers.githubusercontent.com/ghcr1/blobs/sha256:62a2c1994737a2677f318a97ac64a32690f9f958086310a49f37e3fc
######################################################################## 100.0%
==> Downloading https://ghcr.io/v2/homebrew/core/mpdecimal/manifests/2.5.1
######################################################################## 100.0%
==> Downloading https://ghcr.io/v2/homebrew/core/mpdecimal/blobs/sha256:726e8ec0713eb452bb744fe9147771bacc2c3713a128aaee03b6ddcc78011d1a
==> Downloading from https://pkg-containers.githubusercontent.com/ghcr1/blobs/sha256:726e8ec0713eb452bb744fe9147771bacc2c3713a128aaee03b6ddcc
######################################################################## 100.0%
==> Downloading https://ghcr.io/v2/homebrew/core/ca-certificates/manifests/2022-10-11
######################################################################## 100.0%
==> Downloading https://ghcr.io/v2/homebrew/core/ca-certificates/blobs/sha256:1b264e579e31b3041a87ff91f09d5f7cc0d51fea1c83e63aee17a1b95509cbe
==> Downloading from https://pkg-containers.githubusercontent.com/ghcr1/blobs/sha256:1b264e579e31b3041a87ff91f09d5f7cc0d51fea1c83e63aee17a1b9
######################################################################## 100.0%
==> Downloading https://ghcr.io/v2/homebrew/core/openssl/1.1/manifests/1.1.1q

... (output omitted)

🍺  /opt/homebrew/Cellar/six/1.16.0_2: 20 files, 122.3KB
==> Installing poetry dependency: jsonschema
==> Pouring jsonschema--4.16.0.arm64_monterey.bottle.tar.gz
🍺  /opt/homebrew/Cellar/jsonschema/4.16.0: 876 files, 11.2MB
==> Installing poetry
==> Pouring poetry--1.2.2.arm64_monterey.bottle.tar.gz
==> Caveats
zsh completions have been installed to:
  /opt/homebrew/share/zsh/site-functions
==> Summary
🍺  /opt/homebrew/Cellar/poetry/1.2.2: 2,206 files, 29.7MB
==> Running `brew cleanup poetry`...
Disable this behaviour by setting HOMEBREW_NO_INSTALL_CLEANUP.
Hide these hints with HOMEBREW_NO_ENV_HINTS (see `man brew`).
==> Caveats
==> poetry
zsh completions have been installed to:
  /opt/homebrew/share/zsh/site-functions
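Before moving on, it can be handy to confirm that the poetry command is really reachable from your shell. Here is a minimal sketch in Python using only the standard library. The helper name tool_version is my own invention for illustration; the only thing it asks of poetry is the `--version` flag.

```python
import shutil
import subprocess
from typing import Optional


def tool_version(executable: str = "poetry") -> Optional[str]:
    """Return the output of `<executable> --version`, or None if it is not on PATH."""
    path = shutil.which(executable)
    if path is None:
        # The command is not installed, or its directory is missing from PATH.
        return None
    result = subprocess.run(
        [path, "--version"], capture_output=True, text=True, check=False
    )
    return result.stdout.strip() or None


if __name__ == "__main__":
    print(tool_version("poetry") or "poetry was not found on PATH")
```

If this prints the "not found" message right after a Homebrew install, opening a new terminal window is usually the first thing to try.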

2. Once the install succeeds, create the project package with poetry init

poetry init

poetry init walks you through its prompts one at a time.

Fill in the required package information; for optional fields you can just press Enter to skip.

(base) renzo@renzoui-MacBookPro webcrawling % poetry init

This command will guide you through creating your pyproject.toml config.

Package name [webcrawling]:  crawling
Version [0.1.0]:  0.0.1
Description []:  web crawling
Author [이순우 <renzo.1980@jobkorea.co.kr>, n to skip]:
License []:
Compatible Python versions [^3.10]:

Would you like to define your main dependencies interactively? (yes/no) [yes] yes
You can specify a package in the following forms:
  - A single name (requests): this will search for matches on PyPI
  - A name and a constraint (requests@^2.23.0)
  - A git url (git+https://github.com/python-poetry/poetry.git)
  - A git url with a revision (git+https://github.com/python-poetry/poetry.git#develop)
  - A file path (../my-package/my-package.whl)
  - A directory (../my-package/)
  - A url (https://example.com/packages/my-package-0.1.0.tar.gz)

Package to add or search for (leave blank to skip):

Would you like to define your development dependencies interactively? (yes/no) [yes] yes
Package to add or search for (leave blank to skip):

Generated file

[tool.poetry]
name = "crawling"
version = "0.0.1"
description = "web crawling"
authors = ["renzo <renzo@gmail.com>"]
readme = "README.md"

[tool.poetry.dependencies]
python = "^3.10"


[build-system]
requires = ["poetry-core"]
build-backend = "poetry.core.masonry.api"


Do you confirm generation? (yes/no) [yes] yes
(base) renzo@renzoui-MacBookPro webcrawling % ls -al
total 32
drwxr-xr-x   7 renzo  staff   224 10 19 16:53 .
drwxr-xr-x  13 renzo  staff   416 10 19 16:51 ..
drwxr-xr-x  13 renzo  staff   416 10 19 16:51 .git
-rw-r--r--   1 renzo  staff  1799 10 19 16:51 .gitignore
-rw-r--r--   1 renzo  staff  1069 10 19 16:51 LICENSE
-rw-r--r--   1 renzo  staff    13 10 19 16:51 README.md
-rw-r--r--   1 renzo  staff   283 10 19 16:53 pyproject.toml
(base) renzo@renzoui-MacBookPro webcrawling % cat pyproject.toml
[tool.poetry]
name = "crawling"
version = "0.0.1"
description = "web crawling"
authors = ["renzo <renzo@gmail.com>"]
readme = "README.md"

[tool.poetry.dependencies]
python = "^3.10"


[build-system]
requires = ["poetry-core"]
build-backend = "poetry.core.masonry.api"

Once you have answered all the prompts, pyproject.toml is generated.
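Since pyproject.toml is plain TOML, you can also read it from Python to double-check what poetry init wrote. The sketch below parses the same text shown in the terminal output above. It assumes Python 3.11+ for the standard-library tomllib module (on 3.10 it falls back to the third-party tomli package, which you would have to install yourself), and summarize_pyproject is just a hypothetical helper name.

```python
try:
    import tomllib  # standard library from Python 3.11
except ModuleNotFoundError:
    import tomli as tomllib  # assumption: tomli is installed on older Pythons

# Same content as the pyproject.toml printed by `cat pyproject.toml` above.
PYPROJECT_TEXT = """
[tool.poetry]
name = "crawling"
version = "0.0.1"
description = "web crawling"
authors = ["renzo <renzo@gmail.com>"]
readme = "README.md"

[tool.poetry.dependencies]
python = "^3.10"

[build-system]
requires = ["poetry-core"]
build-backend = "poetry.core.masonry.api"
"""


def summarize_pyproject(text: str) -> dict:
    """Pull out the fields that `poetry init` asked about."""
    data = tomllib.loads(text)
    poetry = data["tool"]["poetry"]
    return {
        "name": poetry["name"],
        "version": poetry["version"],
        "python": poetry["dependencies"]["python"],
        "build_backend": data["build-system"]["build-backend"],
    }


if __name__ == "__main__":
    for key, value in summarize_pyproject(PYPROJECT_TEXT).items():
        print(f"{key}: {value}")
```

To read the real file instead of the embedded string, open pyproject.toml in binary mode and pass the file object to tomllib.load.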

 

Having gone through it, the flow feels like a terminal version of Spring Boot's https://start.spring.io/ !!

3. Activating the poetry virtual environment

poetry shell
(base) renzo@renzoui-MacBookPro webcrawling % poetry shell
Creating virtualenv crawling--PDR4Hy8-py3.10 in /Users/renzo/Library/Caches/pypoetry/virtualenvs
Spawning shell within /Users/renzo/Library/Caches/pypoetry/virtualenvs/crawling--PDR4Hy8-py3.10
(base) renzo@renzoui-MacBookPro webcrawling % . /Users/renzo/Library/Caches/pypoetry/virtualenvs/crawling--PDR4Hy8-py3.10/bin/activate

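Once the virtual environment is active, its name is added to the prompt in front of (base). That part is similar to conda.

I still can't say which environment setup is actually the best, but let's keep going.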

(crawling-py3.10) (base) renzo@renzoui-MacBookPro webcrawling % ls -al
total 32
drwxr-xr-x   7 renzo  staff   224 10 19 16:53 .
drwxr-xr-x  13 renzo  staff   416 10 19 16:51 ..
drwxr-xr-x  13 renzo  staff   416 10 19 16:51 .git
-rw-r--r--   1 renzo  staff  1799 10 19 16:51 .gitignore
-rw-r--r--   1 renzo  staff  1069 10 19 16:51 LICENSE
-rw-r--r--   1 renzo  staff    13 10 19 16:51 README.md
-rw-r--r--   1 renzo  staff   283 10 19 16:53 pyproject.toml
(crawling-py3.10) (base) renzo@renzoui-MacBookPro webcrawling %

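You can see that the prompt now starts with (crawling-py3.10), which seems to be derived from what I entered during init.

The project name plus the Python version looks like the identifier for the virtual environment. The directory poetry actually created, crawling--PDR4Hy8-py3.10, also carries a short hash between the two.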
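To make the naming a bit more concrete, here is a small sketch that splits the directory name from the output above into its parts. It assumes the layout is `<name>-<hash>-py<version>` with an 8-character hash, which matches what I see here but is my reading of the output rather than something documented in this post. The function name split_env_name is hypothetical.

```python
from typing import Tuple


def split_env_name(env_name: str, hash_len: int = 8) -> Tuple[str, str, str]:
    """Split '<name>-<hash>-py<version>' into (name, hash, python_version).

    Assumption: the hash segment is exactly `hash_len` characters long and may
    itself contain '-' or '_', so we cut by position rather than by separator.
    """
    prefix, sep, python_version = env_name.rpartition("-py")
    if not sep:
        raise ValueError(f"no '-py<version>' suffix in {env_name!r}")
    if len(prefix) < hash_len + 2 or prefix[-hash_len - 1] != "-":
        raise ValueError(f"unexpected layout in {env_name!r}")
    name = prefix[: -hash_len - 1]
    digest = prefix[-hash_len:]
    return name, digest, python_version


if __name__ == "__main__":
    name, digest, version = split_env_name("crawling--PDR4Hy8-py3.10")
    # The prompt label drops the hash and keeps only name and Python version.
    print(f"({name}-py{version})")
```

The double hyphen in crawling--PDR4Hy8 is there because the hash itself happens to start with a hyphen.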

 

4. Installing Python packages into the poetry virtual environment

For web crawling, the packages I need are:

 

fastapi

requests

beautifulsoup4

selenium

 

To install all four, list them after poetry add separated by spaces, as below.

(crawling-py3.10) (base) renzo@renzoui-MacBookPro webcrawling % poetry add fastapi requests beautifulsoup4 selenium
Using version ^0.85.1 for fastapi
Using version ^2.28.1 for requests
Using version ^4.11.1 for beautifulsoup4
Using version ^4.5.0 for selenium

Updating dependencies
Resolving dependencies... Downloading https://files.pythonhosted.org/packages/1d/38/fa96a426e0c0e68aabc68e896584b83ad1eec7
Resolving dependencies... Downloading https://files.pythonhosted.org/packages/fc/34/3030de6f1370931b9dbb4dad48f6ab1015ab1d
Resolving dependencies... Downloading https://files.pythonhosted.org/packages/32/46/9cb0e58b2deb7f82b84065f37f3bffeb12413f
Resolving dependencies... Downloading https://files.pythonhosted.org/packages/62/d5/5f610ebe421e85889f2e55e33b7f9a6795bd98
Resolving dependencies... (7.9s)

Writing lock file

Package operations: 24 installs, 0 updates, 0 removals

  • Installing attrs (22.1.0)
  • Installing async-generator (1.10)
  • Installing exceptiongroup (1.0.0rc9)
  • Installing h11 (0.14.0)
  • Installing idna (3.4)
  • Installing outcome (1.2.0)
  • Installing sniffio (1.3.0)
  • Installing sortedcontainers (2.4.0)
  • Installing anyio (3.6.2)
  • Installing pysocks (1.7.1)
  • Installing trio (0.22.0)
  • Installing typing-extensions (4.4.0)
  • Installing wsproto (1.2.0)
  • Installing certifi (2022.9.24)
  • Installing charset-normalizer (2.1.1)
  • Installing pydantic (1.10.2)
  • Installing soupsieve (2.3.2.post1)
  • Installing starlette (0.20.4)
  • Installing trio-websocket (0.9.2)
  • Installing urllib3 (1.26.12)
  • Installing beautifulsoup4 (4.11.1)
  • Installing fastapi (0.85.1)
  • Installing requests (2.28.1)
  • Installing selenium (4.5.0)
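If you want to confirm from inside the virtual environment that the four packages really landed, the standard-library importlib.metadata module can report installed versions. This is only a sketch; installed_versions is a hypothetical helper name, and a package that is missing simply comes back as None.

```python
from importlib import metadata
from typing import Dict, Iterable, Optional


def installed_versions(names: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if absent."""
    found: Dict[str, Optional[str]] = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    wanted = ["fastapi", "requests", "beautifulsoup4", "selenium"]
    for name, version in installed_versions(wanted).items():
        print(f"{name}: {version or 'not installed'}")
```

Run it with the Python interpreter of the poetry environment (for example inside poetry shell); with a different interpreter the packages will show up as not installed.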

5. pyproject.toml is updated automatically

When you install a package through poetry, pyproject.toml is updated for you.
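The constraints that poetry add reported, such as ^0.85.1 for fastapi, are what end up in pyproject.toml. As I understand poetry's caret rule, it allows updates that do not change the left-most non-zero component of the version. The sketch below turns a caret constraint into an explicit range; caret_bounds is a hypothetical helper I wrote for illustration, not part of poetry, and it only handles plain numeric versions with one to three components.

```python
from typing import Tuple


def caret_bounds(constraint: str) -> Tuple[str, str]:
    """Expand a caret constraint like '^0.85.1' into ('>=0.85.1', '<0.86.0')."""
    if not constraint.startswith("^"):
        raise ValueError(f"not a caret constraint: {constraint!r}")
    given = [int(part) for part in constraint[1:].split(".")]
    if not 1 <= len(given) <= 3:
        raise ValueError("this sketch handles 1 to 3 numeric components only")

    # Bump the left-most non-zero component; if all are zero, bump the last given one.
    bump = next((i for i, part in enumerate(given) if part != 0), len(given) - 1)
    upper = given[:bump] + [given[bump] + 1]

    # Pad both bounds with zeros up to MAJOR.MINOR.PATCH.
    lower = given + [0] * (3 - len(given))
    upper = upper + [0] * (3 - len(upper))
    return (
        ">=" + ".".join(map(str, lower)),
        "<" + ".".join(map(str, upper)),
    )


if __name__ == "__main__":
    for spec in ["^0.85.1", "^2.28.1", "^4.11.1", "^4.5.0", "^3.10"]:
        low, high = caret_bounds(spec)
        print(f"{spec} -> {low}, {high}")
    # ^0.85.1 -> >=0.85.1, <0.86.0
    # ^2.28.1 -> >=2.28.1, <3.0.0
```

Note how fastapi, still on a 0.x version, gets a much narrower range than requests: only the patch number is free to move.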

 

In Java terms, this is like declaring a library as a dependency in Gradle or Maven and then refreshing the build, except that here the command edits the file for you, which feels a bit more convenient.

 

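6. Pushing to GitHub

To upload the work so far, first create a repository on GitHub.

Then copy it to your local machine with git clone. (In my case I had already cloned the repository before running poetry init, which is why .git, LICENSE and README.md show up in the ls output in step 2.)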

 

On my Mac, I created the project at the path below and ran git clone there.

/Users/renzo/workspace/webcrawling

As for how to use git itself: good luck, and search around!!

git clone https://github.com/lswteen/webcrawling.git

(crawling-py3.10) (base) renzo@renzoui-MacBookPro webcrawling % git branch
* main

 

Stage all changed files:

git add . 

(crawling-py3.10) (base) renzo@renzoui-MacBookPro webcrawling % git add .

Check with git status that the changes were staged properly:

(crawling-py3.10) (base) renzo@renzoui-MacBookPro webcrawling % git status
On branch main
Your branch is up to date with 'origin/main'.

Changes to be committed:
  (use "git restore --staged <file>..." to unstage)
	new file:   poetry.lock
	modified:   pyproject.toml

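Then run git commit. With pyproject.toml in place, the commit goes through normally. The commit only records the change locally, so run git push afterwards to send it to GitHub.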

(crawling-py3.10) (base) renzo@renzoui-MacBookPro webcrawling % git commit -m "add : init python toml "
[main 76700fc] add : init python toml
 2 files changed, 523 insertions(+), 7 deletions(-)
 create mode 100644 poetry.lock

https://github.com/lswteen/webcrawling

 


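The package configuration that Python needs and the toml file are now in place.

If several people have to work on a project together remotely, poetry seems a bit more flexible than conda: pyproject.toml and poetry.lock live in the repository, so teammates can rebuild the same environment with poetry install.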

 

Running git commands and doing all the development from the Mac terminal is getting too frustrating, so

https://www.jetbrains.com/ko-kr/pycharm/download/

 


I'll install VS Code and continue from there.