[cleanup] Misc fixes

Cherry-picks from: #3498, #3947
Related: #3949, https://github.com/yt-dlp/yt-dlp/issues/1839#issuecomment-1140313836
Authored by: pukkandan, flashdagger, gamer191
pull/3727/head
pukkandan 2 years ago
parent c4910024f3
commit 1890fc6389
No known key found for this signature in database
GPG Key ID: 7EEE9E1E817D0A39

@ -9,7 +9,7 @@ body:
description: | description: |
Carefully read and work through this check list in order to prevent the most common mistakes and misuse of yt-dlp: Carefully read and work through this check list in order to prevent the most common mistakes and misuse of yt-dlp:
options: options:
- label: I'm reporting a site feature request - label: I'm requesting a site-specific feature
required: true required: true
- label: I've verified that I'm running yt-dlp version **2022.05.18** ([update instructions](https://github.com/yt-dlp/yt-dlp#update)) or later (specify commit) - label: I've verified that I'm running yt-dlp version **2022.05.18** ([update instructions](https://github.com/yt-dlp/yt-dlp#update)) or later (specify commit)
required: true required: true

@ -9,7 +9,7 @@ body:
description: | description: |
Carefully read and work through this check list in order to prevent the most common mistakes and misuse of yt-dlp: Carefully read and work through this check list in order to prevent the most common mistakes and misuse of yt-dlp:
options: options:
- label: I'm reporting a feature request - label: I'm requesting a feature unrelated to a specific site
required: true required: true
- label: I've looked through the [README](https://github.com/yt-dlp/yt-dlp#readme) - label: I've looked through the [README](https://github.com/yt-dlp/yt-dlp#readme)
required: true required: true

@ -9,13 +9,13 @@ body:
description: | description: |
Carefully read and work through this check list in order to prevent the most common mistakes and misuse of yt-dlp: Carefully read and work through this check list in order to prevent the most common mistakes and misuse of yt-dlp:
options: options:
- label: I'm asking a question and **not** reporting a bug/feature request - label: I'm asking a question and **not** reporting a bug or requesting a feature
required: true required: true
- label: I've looked through the [README](https://github.com/yt-dlp/yt-dlp#readme) - label: I've looked through the [README](https://github.com/yt-dlp/yt-dlp#readme)
required: true required: true
- label: I've read the [guidelines for opening an issue](https://github.com/yt-dlp/yt-dlp/blob/master/CONTRIBUTING.md#opening-an-issue) - label: I've searched the [bugtracker](https://github.com/yt-dlp/yt-dlp/issues?q=) for similar questions including closed ones. DO NOT post duplicates
required: true required: true
- label: I've searched the [bugtracker](https://github.com/yt-dlp/yt-dlp/issues?q=) for similar questions including closed ones - label: I've read the [guidelines for opening an issue](https://github.com/yt-dlp/yt-dlp/blob/master/CONTRIBUTING.md#opening-an-issue)
required: true required: true
- type: textarea - type: textarea
id: question id: question

@ -9,7 +9,7 @@ body:
description: | description: |
Carefully read and work through this check list in order to prevent the most common mistakes and misuse of yt-dlp: Carefully read and work through this check list in order to prevent the most common mistakes and misuse of yt-dlp:
options: options:
- label: I'm reporting a site feature request - label: I'm requesting a site-specific feature
required: true required: true
- label: I've verified that I'm running yt-dlp version **%(version)s** ([update instructions](https://github.com/yt-dlp/yt-dlp#update)) or later (specify commit) - label: I've verified that I'm running yt-dlp version **%(version)s** ([update instructions](https://github.com/yt-dlp/yt-dlp#update)) or later (specify commit)
required: true required: true

@ -9,7 +9,7 @@ body:
description: | description: |
Carefully read and work through this check list in order to prevent the most common mistakes and misuse of yt-dlp: Carefully read and work through this check list in order to prevent the most common mistakes and misuse of yt-dlp:
options: options:
- label: I'm reporting a feature request - label: I'm requesting a feature unrelated to a specific site
required: true required: true
- label: I've looked through the [README](https://github.com/yt-dlp/yt-dlp#readme) - label: I've looked through the [README](https://github.com/yt-dlp/yt-dlp#readme)
required: true required: true

@ -9,13 +9,13 @@ body:
description: | description: |
Carefully read and work through this check list in order to prevent the most common mistakes and misuse of yt-dlp: Carefully read and work through this check list in order to prevent the most common mistakes and misuse of yt-dlp:
options: options:
- label: I'm asking a question and **not** reporting a bug/feature request - label: I'm asking a question and **not** reporting a bug or requesting a feature
required: true required: true
- label: I've looked through the [README](https://github.com/yt-dlp/yt-dlp#readme) - label: I've looked through the [README](https://github.com/yt-dlp/yt-dlp#readme)
required: true required: true
- label: I've read the [guidelines for opening an issue](https://github.com/yt-dlp/yt-dlp/blob/master/CONTRIBUTING.md#opening-an-issue) - label: I've searched the [bugtracker](https://github.com/yt-dlp/yt-dlp/issues?q=) for similar questions including closed ones. DO NOT post duplicates
required: true required: true
- label: I've searched the [bugtracker](https://github.com/yt-dlp/yt-dlp/issues?q=) for similar questions including closed ones - label: I've read the [guidelines for opening an issue](https://github.com/yt-dlp/yt-dlp/blob/master/CONTRIBUTING.md#opening-an-issue)
required: true required: true
- type: textarea - type: textarea
id: question id: question

@ -1783,7 +1783,7 @@ with YoutubeDL() as ydl:
ydl.download(URLS) ydl.download(URLS)
``` ```
Most likely, you'll want to use various options. For a list of options available, have a look at [`yt_dlp/YoutubeDL.py`](yt_dlp/YoutubeDL.py#L181). Most likely, you'll want to use various options. For a list of options available, have a look at [`yt_dlp/YoutubeDL.py`](yt_dlp/YoutubeDL.py#L180).
**Tip**: If you are porting your code from youtube-dl to yt-dlp, one important point to look out for is that we do not guarantee the return value of `YoutubeDL.extract_info` to be json serializable, or even be a dictionary. It will be dictionary-like, but if you want to ensure it is a serializable dictionary, pass it through `YoutubeDL.sanitize_info` as shown in the [example below](#extracting-information) **Tip**: If you are porting your code from youtube-dl to yt-dlp, one important point to look out for is that we do not guarantee the return value of `YoutubeDL.extract_info` to be json serializable, or even be a dictionary. It will be dictionary-like, but if you want to ensure it is a serializable dictionary, pass it through `YoutubeDL.sanitize_info` as shown in the [example below](#extracting-information)

@ -105,7 +105,7 @@ def pycryptodome_module():
def set_version_info(exe, version): def set_version_info(exe, version):
if OS_NAME == 'Windows': if OS_NAME == 'win32':
windows_set_version(exe, version) windows_set_version(exe, version)

@ -195,13 +195,6 @@ class YoutubeDL:
For compatibility, a single list is also accepted For compatibility, a single list is also accepted
print_to_file: A dict with keys WHEN (same as forceprint) mapped to print_to_file: A dict with keys WHEN (same as forceprint) mapped to
a list of tuples with (template, filename) a list of tuples with (template, filename)
forceurl: Force printing final URL. (Deprecated)
forcetitle: Force printing title. (Deprecated)
forceid: Force printing ID. (Deprecated)
forcethumbnail: Force printing thumbnail URL. (Deprecated)
forcedescription: Force printing description. (Deprecated)
forcefilename: Force printing final filename. (Deprecated)
forceduration: Force printing duration. (Deprecated)
forcejson: Force printing info_dict as JSON. forcejson: Force printing info_dict as JSON.
dump_single_json: Force printing the info_dict of the whole playlist dump_single_json: Force printing the info_dict of the whole playlist
(or video) as a single JSON line. (or video) as a single JSON line.
@ -278,9 +271,6 @@ class YoutubeDL:
writedesktoplink: Write a Linux internet shortcut file (.desktop) writedesktoplink: Write a Linux internet shortcut file (.desktop)
writesubtitles: Write the video subtitles to a file writesubtitles: Write the video subtitles to a file
writeautomaticsub: Write the automatically generated subtitles to a file writeautomaticsub: Write the automatically generated subtitles to a file
allsubtitles: Deprecated - Use subtitleslangs = ['all']
Downloads all the subtitles of the video
(requires writesubtitles or writeautomaticsub)
listsubtitles: Lists all available subtitles for the video listsubtitles: Lists all available subtitles for the video
subtitlesformat: The format code for subtitles subtitlesformat: The format code for subtitles
subtitleslangs: List of languages of the subtitles to download (can be regex). subtitleslangs: List of languages of the subtitles to download (can be regex).
@ -334,7 +324,6 @@ class YoutubeDL:
bidi_workaround: Work around buggy terminals without bidirectional text bidi_workaround: Work around buggy terminals without bidirectional text
support, using fridibi support, using fridibi
debug_printtraffic:Print out sent and received HTTP traffic debug_printtraffic:Print out sent and received HTTP traffic
include_ads: Download ads as well (deprecated)
default_search: Prepend this string if an input url is not valid. default_search: Prepend this string if an input url is not valid.
'auto' for elaborate guessing 'auto' for elaborate guessing
encoding: Use this encoding instead of the system-specified. encoding: Use this encoding instead of the system-specified.
@ -350,10 +339,6 @@ class YoutubeDL:
* when: When to run the postprocessor. Allowed values are * when: When to run the postprocessor. Allowed values are
the entries of utils.POSTPROCESS_WHEN the entries of utils.POSTPROCESS_WHEN
Assumed to be 'post_process' if not given Assumed to be 'post_process' if not given
post_hooks: Deprecated - Register a custom postprocessor instead
A list of functions that get called as the final step
for each video file, after all postprocessors have been
called. The filename will be passed as the only argument.
progress_hooks: A list of functions that get called on download progress_hooks: A list of functions that get called on download
progress, with a dictionary with the entries progress, with a dictionary with the entries
* status: One of "downloading", "error", or "finished". * status: One of "downloading", "error", or "finished".
@ -398,8 +383,6 @@ class YoutubeDL:
- "detect_or_warn": check whether we can do anything - "detect_or_warn": check whether we can do anything
about it, warn otherwise (default) about it, warn otherwise (default)
source_address: Client-side IP address to bind to. source_address: Client-side IP address to bind to.
call_home: Boolean, true iff we are allowed to contact the
yt-dlp servers for debugging. (BROKEN)
sleep_interval_requests: Number of seconds to sleep between requests sleep_interval_requests: Number of seconds to sleep between requests
during extraction during extraction
sleep_interval: Number of seconds to sleep before each download when sleep_interval: Number of seconds to sleep before each download when
@ -440,11 +423,6 @@ class YoutubeDL:
external downloader to use for it. The allowed protocols external downloader to use for it. The allowed protocols
are default|http|ftp|m3u8|dash|rtsp|rtmp|mms. are default|http|ftp|m3u8|dash|rtsp|rtmp|mms.
Set the value to 'native' to use the native downloader Set the value to 'native' to use the native downloader
hls_prefer_native: Deprecated - Use external_downloader = {'m3u8': 'native'}
or {'m3u8': 'ffmpeg'} instead.
Use the native HLS downloader instead of ffmpeg/avconv
if True, otherwise use ffmpeg/avconv if False, otherwise
use downloader suggested by extractor if None.
compat_opts: Compatibility options. See "Differences in default behavior". compat_opts: Compatibility options. See "Differences in default behavior".
The following options do not work when used through the API: The following options do not work when used through the API:
filename, abort-on-error, multistreams, no-live-chat, format-sort filename, abort-on-error, multistreams, no-live-chat, format-sort
@ -466,8 +444,6 @@ class YoutubeDL:
external_downloader_args, concurrent_fragment_downloads. external_downloader_args, concurrent_fragment_downloads.
The following options are used by the post processors: The following options are used by the post processors:
prefer_ffmpeg: If False, use avconv instead of ffmpeg if both are available,
otherwise prefer ffmpeg. (avconv support is deprecated)
ffmpeg_location: Location of the ffmpeg/avconv binary; either the path ffmpeg_location: Location of the ffmpeg/avconv binary; either the path
to the binary or its containing directory. to the binary or its containing directory.
postprocessor_args: A dictionary of postprocessor/executable keys (in lower case) postprocessor_args: A dictionary of postprocessor/executable keys (in lower case)
@ -487,12 +463,48 @@ class YoutubeDL:
See "EXTRACTOR ARGUMENTS" for details. See "EXTRACTOR ARGUMENTS" for details.
Eg: {'youtube': {'skip': ['dash', 'hls']}} Eg: {'youtube': {'skip': ['dash', 'hls']}}
mark_watched: Mark videos watched (even with --simulate). Only for YouTube mark_watched: Mark videos watched (even with --simulate). Only for YouTube
youtube_include_dash_manifest: Deprecated - Use extractor_args instead.
The following options are deprecated and may be removed in the future:
forceurl: - Use forceprint
Force printing final URL.
forcetitle: - Use forceprint
Force printing title.
forceid: - Use forceprint
Force printing ID.
forcethumbnail: - Use forceprint
Force printing thumbnail URL.
forcedescription: - Use forceprint
Force printing description.
forcefilename: - Use forceprint
Force printing final filename.
forceduration: - Use forceprint
Force printing duration.
allsubtitles: - Use subtitleslangs = ['all']
Downloads all the subtitles of the video
(requires writesubtitles or writeautomaticsub)
include_ads: - Doesn't work
Download ads as well
call_home: - Not implemented
Boolean, true iff we are allowed to contact the
yt-dlp servers for debugging.
post_hooks: - Register a custom postprocessor
A list of functions that get called as the final step
for each video file, after all postprocessors have been
called. The filename will be passed as the only argument.
hls_prefer_native: - Use external_downloader = {'m3u8': 'native'} or {'m3u8': 'ffmpeg'}.
Use the native HLS downloader instead of ffmpeg/avconv
if True, otherwise use ffmpeg/avconv if False, otherwise
use downloader suggested by extractor if None.
prefer_ffmpeg: - avconv support is deprecated
If False, use avconv instead of ffmpeg if both are available,
otherwise prefer ffmpeg.
youtube_include_dash_manifest: - Use extractor_args
If True (default), DASH manifests and related If True (default), DASH manifests and related
data will be downloaded and processed by extractor. data will be downloaded and processed by extractor.
You can reduce network I/O by disabling it if you don't You can reduce network I/O by disabling it if you don't
care about DASH. (only for youtube) care about DASH. (only for youtube)
youtube_include_hls_manifest: Deprecated - Use extractor_args instead. youtube_include_hls_manifest: - Use extractor_args
If True (default), HLS manifests and related If True (default), HLS manifests and related
data will be downloaded and processed by extractor. data will be downloaded and processed by extractor.
You can reduce network I/O by disabling it if you don't You can reduce network I/O by disabling it if you don't

@ -496,12 +496,20 @@ class FragmentFD(FileDownloader):
self.report_warning('The download speed shown is only of one thread. This is a known issue and patches are welcome') self.report_warning('The download speed shown is only of one thread. This is a known issue and patches are welcome')
with tpe or concurrent.futures.ThreadPoolExecutor(max_workers) as pool: with tpe or concurrent.futures.ThreadPoolExecutor(max_workers) as pool:
for fragment, frag_index, frag_filename in pool.map(_download_fragment, fragments): try:
ctx['fragment_filename_sanitized'] = frag_filename for fragment, frag_index, frag_filename in pool.map(_download_fragment, fragments):
ctx['fragment_index'] = frag_index ctx.update({
result = append_fragment(decrypt_fragment(fragment, self._read_fragment(ctx)), frag_index, ctx) 'fragment_filename_sanitized': frag_filename,
if not result: 'fragment_index': frag_index,
return False })
if not append_fragment(decrypt_fragment(fragment, self._read_fragment(ctx)), frag_index, ctx):
return False
except KeyboardInterrupt:
self._finish_multiline_status()
self.report_error(
'Interrupted by user. Waiting for all threads to shutdown...', is_error=False, tb=False)
pool.shutdown(wait=False)
raise
else: else:
for fragment in fragments: for fragment in fragments:
if not interrupt_trigger[0]: if not interrupt_trigger[0]:

@ -786,7 +786,8 @@ class InfoExtractor:
self.report_warning(errmsg) self.report_warning(errmsg)
return False return False
def _download_webpage_handle(self, url_or_request, video_id, note=None, errnote=None, fatal=True, encoding=None, data=None, headers={}, query={}, expected_status=None): def _download_webpage_handle(self, url_or_request, video_id, note=None, errnote=None, fatal=True,
encoding=None, data=None, headers={}, query={}, expected_status=None):
""" """
Return a tuple (page content as string, URL handle). Return a tuple (page content as string, URL handle).
@ -943,7 +944,7 @@ class InfoExtractor:
except ValueError: except ValueError:
raise e raise e
except ValueError as ve: except ValueError as ve:
errmsg = '%s: Failed to parse JSON ' % video_id errmsg = f'{video_id}: Failed to parse JSON'
if fatal: if fatal:
raise ExtractorError(errmsg, cause=ve) raise ExtractorError(errmsg, cause=ve)
else: else:

@ -15,7 +15,7 @@ import time
import traceback import traceback
from .common import InfoExtractor, SearchInfoExtractor from .common import InfoExtractor, SearchInfoExtractor
from ..compat import functools from ..compat import functools # isort: split
from ..compat import ( from ..compat import (
compat_chr, compat_chr,
compat_HTTPError, compat_HTTPError,
@ -483,6 +483,11 @@ class YoutubeBaseInfoExtractor(InfoExtractor):
if data: if data:
return self._parse_json(data, item_id, fatal=fatal) return self._parse_json(data, item_id, fatal=fatal)
def _extract_yt_initial_variable(self, webpage, regex, video_id, name):
return self._parse_json(self._search_regex(
(fr'{regex}\s*{self._YT_INITIAL_BOUNDARY_RE}',
regex), webpage, name, default='{}'), video_id, fatal=False, lenient=True)
@staticmethod @staticmethod
def _extract_session_index(*data): def _extract_session_index(*data):
""" """
@ -2733,54 +2738,38 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
chapter_time = lambda chapter: parse_duration(self._get_text(chapter, 'timeDescription')) chapter_time = lambda chapter: parse_duration(self._get_text(chapter, 'timeDescription'))
chapter_title = lambda chapter: self._get_text(chapter, 'title') chapter_title = lambda chapter: self._get_text(chapter, 'title')
return next(( return next(filter(None, (
filter(None, ( self._extract_chapters(traverse_obj(contents, (..., 'macroMarkersListItemRenderer')),
self._extract_chapters( chapter_time, chapter_title, duration)
traverse_obj(contents, (..., 'macroMarkersListItemRenderer')), for contents in content_list)), [])
chapter_time, chapter_title, duration)
for contents in content_list
))), [])
@staticmethod def _extract_chapters_from_description(self, description, duration):
def _extract_chapters_from_description(description, duration): return self._extract_chapters(
chapters = [{'start_time': 0}] re.findall(r'(?m)^((?:\d+:)?\d{1,2}:\d{2})\b\W*\s(.+?)\s*$', description or ''),
for timestamp, title in re.findall( chapter_time=lambda x: parse_duration(x[0]), chapter_title=lambda x: x[1],
r'(?m)^((?:\d+:)?\d{1,2}:\d{2})\b\W*\s(.+?)\s*$', description or ''): duration=duration, strict=False)
start = parse_duration(timestamp)
if start and title and chapters[-1]['start_time'] < start < duration:
chapters[-1]['end_time'] = start
chapters.append({
'start_time': start,
'title': title,
})
chapters[-1]['end_time'] = duration
return chapters[1:]
def _extract_chapters(self, chapter_list, chapter_time, chapter_title, duration):
chapters = []
last_chapter = {'start_time': 0}
for idx, chapter in enumerate(chapter_list or []):
title = chapter_title(chapter)
start_time = chapter_time(chapter)
if start_time is None:
continue
last_chapter['end_time'] = start_time
if start_time < last_chapter['start_time']:
if idx == 1:
chapters.pop()
self.report_warning('Invalid start time for chapter "%s"' % last_chapter['title'])
else:
self.report_warning(f'Invalid start time for chapter "{title}"')
continue
last_chapter = {'start_time': start_time, 'title': title}
chapters.append(last_chapter)
last_chapter['end_time'] = duration
return chapters
def _extract_yt_initial_variable(self, webpage, regex, video_id, name): def _extract_chapters(self, chapter_list, chapter_time, chapter_title, duration, strict=True):
return self._parse_json(self._search_regex( if not duration:
(fr'{regex}\s*{self._YT_INITIAL_BOUNDARY_RE}', return
regex), webpage, name, default='{}'), video_id, fatal=False, lenient=True) chapter_list = [{
'start_time': chapter_time(chapter),
'title': chapter_title(chapter),
} for chapter in chapter_list or []]
if not strict:
chapter_list.sort(key=lambda c: c['start_time'] or 0)
chapters = [{'start_time': 0, 'title': '<Untitled>'}]
for idx, chapter in enumerate(chapter_list):
if chapter['start_time'] is None or not chapter['title']:
self.report_warning(f'Incomplete chapter {idx}')
elif chapters[-1]['start_time'] <= chapter['start_time'] <= duration:
chapters[-1]['end_time'] = chapter['start_time']
chapters.append(chapter)
else:
self.report_warning(f'Invalid start time for chapter "{chapter["title"]}"')
chapters[-1]['end_time'] = duration
return chapters if len(chapters) > 1 and chapters[1]['start_time'] else chapters[1:]
def _extract_comment(self, comment_renderer, parent=None): def _extract_comment(self, comment_renderer, parent=None):
comment_id = comment_renderer.get('commentId') comment_id = comment_renderer.get('commentId')
@ -3663,7 +3652,15 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
# Youtube Music Auto-generated description # Youtube Music Auto-generated description
if video_description: if video_description:
mobj = re.search(r'(?s)(?P<track>[^·\n]+)·(?P<artist>[^\n]+)\n+(?P<album>[^\n]+)(?:.+?℗\s*(?P<release_year>\d{4})(?!\d))?(?:.+?Released on\s*:\s*(?P<release_date>\d{4}-\d{2}-\d{2}))?(.+?\nArtist\s*:\s*(?P<clean_artist>[^\n]+))?.+\nAuto-generated by YouTube\.\s*$', video_description) mobj = re.search(
r'''(?xs)
(?P<track>[^·\n]+)·(?P<artist>[^\n]+)\n+
(?P<album>[^\n]+)
(?:.+?\s*(?P<release_year>\d{4})(?!\d))?
(?:.+?Released on\s*:\s*(?P<release_date>\d{4}-\d{2}-\d{2}))?
(.+?\nArtist\s*:\s*(?P<clean_artist>[^\n]+))?
.+\nAuto-generated\ by\ YouTube\.\s*$
''', video_description)
if mobj: if mobj:
release_year = mobj.group('release_year') release_year = mobj.group('release_year')
release_date = mobj.group('release_date') release_date = mobj.group('release_date')

@ -776,7 +776,7 @@ class FFmpegMetadataPP(FFmpegPostProcessor):
for key, value in info.items(): for key, value in info.items():
mobj = re.fullmatch(meta_regex, key) mobj = re.fullmatch(meta_regex, key)
if value is not None and mobj: if value is not None and mobj:
metadata[mobj.group('i') or 'common'][mobj.group('key')] = value metadata[mobj.group('i') or 'common'][mobj.group('key')] = value.replace('\0', '')
# Write id3v1 metadata also since Windows Explorer can't handle id3v2 tags # Write id3v1 metadata also since Windows Explorer can't handle id3v2 tags
yield ('-write_id3v1', '1') yield ('-write_id3v1', '1')

@ -1936,7 +1936,7 @@ def intlist_to_bytes(xs):
class LockingUnsupportedError(OSError): class LockingUnsupportedError(OSError):
msg = 'File locking is not supported on this platform' msg = 'File locking is not supported'
def __init__(self): def __init__(self):
super().__init__(self.msg) super().__init__(self.msg)
@ -2061,8 +2061,11 @@ class locked_file:
try: try:
self.f.truncate() self.f.truncate()
except OSError as e: except OSError as e:
if e.errno != 29: # Illegal seek, expected when self.f is a FIFO if e.errno not in (
raise e errno.ESPIPE, # Illegal seek - expected for FIFO
errno.EINVAL, # Invalid argument - expected for /dev/null
):
raise
return self return self
def unlock(self): def unlock(self):

Loading…
Cancel
Save