One of the ways that I frequently fetch files from the internet is with wget. It is a very useful command line utility that is capable of fetching anything from one file to mirroring whole sites.
When looking for new and interesting music I often find myself on a page with more than a few mp3 URLs. Using Firefox to download them all, even with a good download manager, is tedious. The following bash loop will read from stdin and download each URL it sees.
# while read url ; do wget "${url}" ; done
This will start “listening” on stdin for any text. All I have to do now is drag the links from Firefox into my gnome-terminal and hit enter in the terminal after each URL.
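If you find yourself typing the loop often, it can be wrapped in a function. What follows is only a sketch under a few assumptions: the name fetch_urls is made up for illustration, blank lines are skipped so a stray Enter does not start wget with an empty argument, and read -r keeps any backslashes in a URL intact.

```shell
# Hypothetical helper (not part of wget): download every URL read from stdin.
fetch_urls() {
    # "|| [ -n "$url" ]" also handles a last line that has no trailing newline.
    while IFS= read -r url || [ -n "$url" ]; do
        # Skip blank lines instead of calling wget with an empty argument.
        [ -n "$url" ] || continue
        wget "$url"
    done
}
```

Run fetch_urls with no arguments and paste URLs into the terminal exactly as before; Ctrl-C still ends it.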
Dragging each URL gets tiring after the 3rd or 4th. JavaScript comes to the rescue in the form of a bookmarklet. A bookmarklet is some JavaScript, saved as a bookmark, that can perform tasks within the DOM of the current document. Follow the link below for some of my bookmarklets, including the UrlLister.
To use the UrlLister, click the bookmark when viewing a page with a few mp3s. Answer the prompt, or hit enter for the default (.mp3). Once the popup loads, press Ctrl-A (select all) in the popup window, copy, then paste the clipboard into the terminal. All the URLs should now download, one at a time. With Linux this is particularly easy since highlighting the text copies it to the selection buffer, and middle-clicking in the terminal will paste that text, so there is no need to use the keyboard.
When the downloading stops, hit enter to be sure the last URL downloads. (If you click and drag to highlight, you may miss the end of the last line.) When all is done, hit Ctrl-C to kill the while loop and start listening to the mp3s.
I have a number of bookmarklets, including the UrlLister, at http://anton.lr2.com/marklets/.
If you already have a list of mp3s (or other URLs), one per line, then you can pipe the file into the loop.
# cat listofurls.txt | while read url ; do wget "${url}" ; done
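For a file that already holds nothing but URLs, wget can also read the list itself through its --input-file option (short form -i), so the loop is not strictly needed. A minimal sketch, where fetch_list is a made-up name used only for illustration:

```shell
# Hypothetical helper: fetch every URL listed in a file, one URL per line.
fetch_list() {
    # --input-file (short form -i) tells wget to read its URLs from the named file.
    wget --input-file="$1"
}
```

Calling fetch_list listofurls.txt should behave much like the loop above, with wget still fetching the files one at a time. The loop remains handy when the input needs filtering first.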
If the text file contains information other than URLs, we can use sed to remove the junk and produce a clean list of URLs for us to download.
# cat listofurls.txt | sed -n 's/.*http\(.*\)\.mp3.*/http\1.mp3/p' | while read url ; do wget "${url}" ; done
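The regular expression matches <anything>http<anything>.mp3<anything>, that is, a whole line containing an http URL that ends in .mp3. The -n option stops sed from printing lines by default, and the p flag at the end prints only the lines where the substitution happened. The \1 in the replacement stands for whatever \(.*\) captured, which is the text between http and .mp3, so we need to add the http and .mp3 back in. There are some problems with this regexp, but it works in most cases. Because .* is greedy, if there is more than one URL on a line we will only fetch the last one, and if .mp3 appears in the middle of a URL that ends in something else, the URL will get truncated after the .mp3.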
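One way around the one-URL-per-line limit is to let grep print each match on its own line instead of rewriting whole lines with sed. This is a sketch rather than a drop-in replacement: extract_mp3_urls is a made-up name, the -o option is assumed to be available (it is in GNU and BSD grep but is not required by POSIX), and a URL is assumed to end at the first whitespace, double quote, or angle bracket.

```shell
# Hypothetical helper: print every http(s) URL ending in .mp3 found on stdin,
# one per line, even when several URLs share a line.
extract_mp3_urls() {
    # -o prints only the matching text; -E enables extended regular expressions.
    # A URL is taken to run until whitespace, a double quote, or an angle bracket.
    grep -oE 'https?://[^[:space:]"<>]+\.mp3'
}
```

Its output can be piped into the same while loop in place of the sed stage. A URL that carries something after the .mp3, such as a query string, is still cut off at the .mp3.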
7 Comments
Just my 2 cents….
Why not use Option-click (on a Mac; for Windows it’s probably control-click) to automatically download whatever link is clicked?
That way, if you’re sitting on a web page with lots of interesting things you want to download, just hold down that modifier key and click click click. This works on all versions of Firefox (other browsers, too), and requires no setup or configuration.
That’s all fine if there are a few, or even 5 files to download, but what about a page with 100 mp3s?
Now assume the server limits you to just 4 connections. You’ll have to wait for the first to finish before you can even start the 5th.
I’m fully aware of the download option. If you read the second paragraph of the article you will see that I indicate it’s tedious, not impossible.
Sorry to burst your bubble but wget has a recursive option which works even better than that.
Nick,
I don’t have a bubble to burst. I got tired of reinflating it and just let it go.
I’m very familiar with wget and fully understand its capabilities and limitations. The recursive option is great if you want to actually mirror a site or a page, but not so handy if all you want are the MP3s on that one page.
When I’m leeching files I just open a terminal, run a wget loop that reads from stdin, then just paste any urls I want to fetch into the terminal. It’s the poor man’s download manager and it serializes the downloads so it’ll have a minimal impact on the network.
Anton,
Sorry to re-burst your bubble, but if I wanted to download all the MP3s on a site I’d do:
wget -r -N -A.mp3 http://somesite.com/mp3s/
Nick was right, wget does it better.
But bash is still cool….
You might want to have a look at this Firefox Plugin
Please keep in mind that when I wrote this article, Firefox 1.0 was all we had and we were wishing it didn’t suck so bad.
More than three years later, and this is still a good example of doing a loop in bash, but not an ideal solution for mass downloads.