Improve download link handling

The previous method relied on the main "download link" in the list page.
But this link was broken a solid 1/4 of the time, and far more often for
some artists.

Instead, during DB build, fetch and parse each song's own page as well,
and collect every download link found on it. Use a ThreadPoolExecutor to
do this in a reasonable amount of time (default of 10 workers, but user
configurable).
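
A minimal sketch of the concurrent fetch described above, not the code from this commit. The names `build_song_entries`, `fetch_links`, and `fake_fetch_links`, the entry layout, and the `example.invalid` URLs are all hypothetical; only the use of `ThreadPoolExecutor` and the default of 10 workers come from the message.

```python
from concurrent.futures import ThreadPoolExecutor


def build_song_entries(song_urls, fetch_links, workers=10):
    """Collect download links for every song page using a bounded thread pool.

    fetch_links is a caller-supplied (hypothetical) function that takes a
    song page URL and returns a list of download-link dicts.
    """
    with ThreadPoolExecutor(max_workers=workers) as executor:
        # executor.map yields results in input order, so they line up
        # with song_urls even though the fetches run concurrently.
        all_links = list(executor.map(fetch_links, song_urls))
    return [
        {"song_url": url, "dl_links": links}
        for url, links in zip(song_urls, all_links)
    ]


def fake_fetch_links(url):
    # Stand-in for the real HTTP request and HTML parsing (assumption).
    return [{"id": 1, "description": "Download", "link": url + "/dl"}]


entries = build_song_entries(
    ["https://example.invalid/song/a", "https://example.invalid/song/b"],
    fake_fetch_links,
    workers=2,
)
```

Capping `max_workers` keeps the number of simultaneous requests bounded, which matters when scanning tens of thousands of pages on someone else's server.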

Then, when downloading, iterate over all of a song's download links, and
provide user options for filtering these by ID or description.
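
A minimal sketch of how such filtering might work, under stated assumptions: the function `filter_links`, the `id`/`description` keys, exact-match on ID, and case-insensitive regex matching on the description are all hypothetical and may differ from what the tool actually does.

```python
import re


def filter_links(dl_links, download_id=None, download_descr=None):
    """Return only the download links that match the given filters.

    With no filters set, every link is returned unchanged.
    """
    selected = []
    for link in dl_links:
        # Assumed: the ID filter is an exact match on the link's "id" field.
        if download_id is not None and link["id"] != download_id:
            continue
        # Assumed: the description filter is a case-insensitive regex search.
        if download_descr is not None and not re.search(
            download_descr, link["description"], re.IGNORECASE
        ):
            continue
        selected.append(link)
    return selected


links = [
    {"id": 1, "description": "Download (Xbox 360)", "link": "https://example.invalid/a"},
    {"id": 2, "description": "Download (Wii)", "link": "https://example.invalid/b"},
]
only_second = filter_links(links, download_id=2)
only_wii = filter_links(links, download_descr="wii")
```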
Joshua Boniface
2023-04-06 02:06:55 -04:00
parent 3a0ef3dcc6
commit 6ec8923336
2 changed files with 203 additions and 106 deletions


@@ -13,11 +13,6 @@ standardized format.
To use the tool, first use the "database" command to build or modify your local JSON database, then use the
"download" command to download songs.
-To avoid overloading or abusing the C3DB website, this tool operates exclusively in sequential mode by design; at
-most one page is scraped (for "database build") or song downloaded (for "download") at once. Additionally, the tool
-design ensures that the JSON database of songs is stored locally, so it only needs to be built once and then is
-reused to perform actual downloads without putting further load on the website.
## Installation
1. Install the Python3 requirements from `requirements.txt`.
@@ -39,8 +34,9 @@ fetch all avilable songs for all games, and either specify it with the `-u`/`--b
environment variable `C3DBDL_BASE_URL`.
1. Initialize your C3DB JSON database with `c3dbdl [options] database build`. This will take a fair amount
-of time to complete as all pages of the chosen base URL are scanned. Note that if you cancel this process, no
-data will be saved, so let it complete!
+of time to complete as all pages of the chosen base URL, and all song pages (30,000+) are scanned. Note that if
+you cancel this process, no data will be saved, so let it complete! The default concurrency setting should make
+this relatively quick but YMMV.
1. Download any song(s) you want with `c3dbdl [options] download [options]`.
@@ -86,6 +82,9 @@ Downloading song "Rush - Sweet Miracle" by ejthedj...
Downloading from https://dl.c3universe.com/s/ejthedj/sweetMiracle...
```
+In addition to the above filters, within each song may be more than one download link. To filter these links,
+use the "-i"/"--download-id" and "-d"/"--download-descr" (see the help for details).
Feel free to experiment.
## Output Format