You may need to extract content from an HTML document. If you already use the jq command to extract data from JSON documents, htmlq gives you a similar tool for HTML. It is written in the Rust programming language.
The htmlq tool is not limited to Linux: it is also available for other Unix-like systems, so you can use it on FreeBSD, macOS, etc. It uses CSS selectors to extract content snippets from HTML. A selector is how you point to the elements you want from a web page. For example, you can extract the images or the text from the page at a given URL.
The first step is to install htmlq on your Linux system. Taking a DEB-based distro as a reference (on others the procedure is similar, but with the corresponding package manager), we can use:
sudo apt install cargo
cargo install htmlq
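If the shell cannot find htmlq after the installation, the directory where cargo places binaries may be missing from your PATH. The following is a minimal sketch, assuming cargo used its default install location, $HOME/.cargo/bin; the variable name CARGO_BIN is only illustrative.

```shell
# Assumption: cargo installed htmlq into its default location, $HOME/.cargo/bin.
CARGO_BIN="$HOME/.cargo/bin"

# Prepend that directory to PATH only if it is not already listed.
case ":$PATH:" in
  *":$CARGO_BIN:"*) ;;
  *) PATH="$CARGO_BIN:$PATH"; export PATH ;;
esac

# Report whether the shell can now find htmlq.
if command -v htmlq >/dev/null 2>&1; then
  echo "htmlq found at: $(command -v htmlq)"
else
  echo "htmlq not found; check the output of 'cargo install htmlq'"
fi
```

To make the change permanent, the same PATH line can go in your shell startup file.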
Once installed, its use is simple. For example, imagine you want to find content on a page by its ID:
curl -s url | htmlq '#css-selector'
curl -s url2 | htmlq '#css-selector'
curl -s https://www.linuxadictos.com/ | htmlq --pretty '#content' | more
Or, to find all the links on a page, you can use this other command:
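To try selectors without fetching a live page, you can feed htmlq a local file through standard input. This is only a sketch: sample.html and its contents are made up for illustration, and the htmlq calls run only if the tool is installed.

```shell
# Create a small sample page (hypothetical content) to experiment with offline.
cat > sample.html <<'EOF'
<html>
  <body>
    <div id="content">
      <h1 class="title">Hello</h1>
      <img src="/img/logo.png" alt="logo">
      <a href="/about">About</a>
    </div>
  </body>
</html>
EOF

# Run the selectors only if htmlq is available on this machine.
if command -v htmlq >/dev/null 2>&1; then
  # Select the element whose id is "content" and pretty-print it.
  htmlq --pretty '#content' < sample.html
  # Print only the text inside elements with class "title".
  htmlq --text '.title' < sample.html
  # Print the src attribute of every <img> element.
  htmlq --attribute src img < sample.html
fi
```

The same selectors work unchanged when the HTML comes from curl instead of a file.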
curl -s https://www.linuxadictos.com | htmlq --attribute href a
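Pages often repeat the same link several times, so it can help to remove duplicates from the result. A minimal sketch follows; the function name list_links is hypothetical, and it simply pipes the output of htmlq through sort -u.

```shell
# list_links: read HTML on standard input and print each distinct
# href value of the <a> elements once, in sorted order.
# (Hypothetical helper name, not part of htmlq.)
list_links() {
  htmlq --attribute href a | sort -u
}
```

You could then run, for example: curl -s https://www.linuxadictos.com | list_links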
Finally, if you have questions about the options available in htmlq, you can check its help with this command:
htmlq --help
I hope this little tutorial has helped you. As you can see, its use is simple, and you can combine it with tools such as curl, among others.