How to download YouTube Video captions in XML using Pytube in Python?

  Learn how to download YouTube captions in XML format using Pytube in Python, and how Pytube can also get a video's author, view count, captions in SRT format, etc.

Prerequisite

Pytube is a lightweight, dependency-free Python library (and command-line utility) for downloading YouTube videos. Pytube can also retrieve the author's name, the view count, the captions of a YouTube video, etc.
Various other Python modules and APIs can also fetch this metadata.
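
As a rough illustration of the metadata mentioned above, the sketch below reads the title, author, and views attributes of a YouTube object. The helper names describe_video and fetch_description are hypothetical (they are not part of Pytube), and the sample values are made up so that the demonstration runs without a network connection.

```python
from types import SimpleNamespace


def describe_video(video):
    """Collect a few metadata fields from a YouTube-like object."""
    return {
        "title": video.title,
        "author": video.author,
        "views": video.views,
    }


def fetch_description(url):
    """Build a Pytube YouTube object for url and describe it (needs network)."""
    from pytube import YouTube  # imported lazily so describe_video works offline
    return describe_video(YouTube(url))


# Offline demonstration with a stand-in object (made-up values, not real data)
sample = SimpleNamespace(title="Sample video", author="Sample channel", views=1234)
info = describe_video(sample)
print(info)
```

With Pytube installed and a network connection available, fetch_description('https://youtu.be/mBJMkFNRVek') would be the equivalent call for a real video.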

Required Module

In this tutorial, we use the Pytube module for Python. You can install it using pip:

pip install pytube

Approach To Use Pytube

  1. Import the pytube module.
  2. Create a YouTube() object and pass the URL of the YouTube video as the argument, e.g., video_src = YouTube('https://youtu.be/mBJMkFNRVek').
  3. To get the caption for a particular language, look it up in video_src.captions by its language code and store the result in a variable, e.g., en_caption_data = video_src.captions['a.en'].
  4. XML is the default output format of the caption.
  5. To get the caption text, use the xml_captions attribute of the caption object, e.g., print(en_caption_data.xml_captions).
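
Step 3 assumes that the requested language code is present for the video. The sketch below shows one way to fall back across several codes; it assumes video_src.captions behaves like a read-only dictionary keyed by language code. The helper name pick_caption and the preferred codes are hypothetical, and a plain dict stands in for the real captions object.

```python
def pick_caption(captions, preferred=("en", "a.en")):
    """Return (code, track) for the first preferred code present, else (None, None)."""
    for code in preferred:
        if code in captions:
            return code, captions[code]
    return None, None


# Offline demonstration: a plain dict stands in for video_src.captions
fake_captions = {"a.en": "<auto-generated English track>", "fr": "<French track>"}
code, track = pick_caption(fake_captions)
print(code, track)
```

With a real video, pick_caption(video_src.captions) would be the equivalent call, and a result of (None, None) signals that none of the preferred codes is available.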

The Pytube program that implements these steps is given below.



Program: Get Caption Using Pytube In Python

# Import the YouTube class from the Pytube module
from pytube import YouTube


url = 'https://youtu.be/mBJMkFNRVek'
# Create a YouTube object and pass it the URL of the YouTube video
video_src = YouTube(url)


# Print the list of all available captions to see their language codes
print("All Available Captions : \n", video_src.captions)


# To get the caption for a particular language, pass its language code, e.g., captions['a.en']
en_caption_data = video_src.captions['a.en']

print("\nCaption Data in XML Format: \n")
# Print the caption in XML format
print(en_caption_data.xml_captions)


Output

Figure: Pytube YouTube caption output.
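
Since SRT is a common target format for captions, the sketch below converts caption XML into SRT text using only the standard library. It assumes the XML consists of text elements carrying start and dur attributes in seconds; the XML returned for a given video may be laid out differently, so treat this as an illustration. The function names and the sample XML are hypothetical and hand-written. Pytube's caption objects also provide a generate_srt_captions() method, which this sketch does not use.

```python
import html
import xml.etree.ElementTree as ET


def srt_timestamp(seconds):
    """Format a number of seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def xml_to_srt(xml_text):
    """Convert <text start=".." dur="..">..</text> entries into SRT blocks."""
    root = ET.fromstring(xml_text)
    blocks = []
    for index, node in enumerate(root.iter("text"), start=1):
        start = float(node.attrib["start"])
        duration = float(node.attrib.get("dur", 0.0))  # assumed optional
        # Caption text may carry HTML entities, so unescape them
        caption = html.unescape((node.text or "").replace("\n", " ")).strip()
        end = start + duration
        blocks.append(
            f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{caption}\n"
        )
    return "\n".join(blocks)


# Hand-written sample in the assumed layout (not real caption data)
sample_xml = (
    "<transcript>"
    '<text start="0.5" dur="1.5">Hello there</text>'
    '<text start="2" dur="2.25">It&amp;#39;s done</text>'
    "</transcript>"
)
result = xml_to_srt(sample_xml)
print(result)
```

With a real video, xml_to_srt(en_caption_data.xml_captions) would be the equivalent call, provided the XML matches the assumed layout.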