
process substitution in bash

March 5th, 2005 | No Comments | Posted in Bash

The internets are slow tonight, and I'm tired, so I'll just leave you with something quick. Tomorrow I'll pick back up on the podcast script.

Let's look at this short script:


#!/bin/bash
LIST=""
ls | while read FILE ; do
   LIST="${FILE} ${LIST}"
done
echo $LIST

You may be surprised to find that LIST is empty after all that looping. This problem always mystified me until I learned about subshells. The pipe runs the while loop in another shell, a subshell. When the pipeline ends, so does the subshell, along with all of its variables.

Fortunately there is a simple way around this. We need to execute the loop in the current shell, run the command in a subshell instead, and redirect the command's output back into the loop.


#!/bin/bash
LIST=""
while read FILE ; do
   LIST="${FILE} ${LIST}"
done < <(ls)
echo $LIST

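<(ls) runs ls in a subshell and expands to a file name (a /dev/fd entry or a named pipe, depending on the system) that is connected to its output. We can now redirect that file into the loop with another <. Note the space between the two: < <(ls), not <<(ls).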
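
To see the two forms side by side, here is a minimal sketch. It swaps ls for a made-up names function with fixed output, so the result does not depend on the current directory; PIPED and SUBST are just illustrative variable names.

```bash
#!/bin/bash
# Hypothetical stand-in for ls: always prints the same three names.
names() { printf '%s\n' alpha beta gamma; }

PIPED=""
names | while read -r FILE ; do
   PIPED="${FILE} ${PIPED}"      # changes a copy inside the subshell
done

SUBST=""
while read -r FILE ; do
   SUBST="${FILE} ${SUBST}"      # changes the variable in the current shell
done < <(names)

echo "piped: [${PIPED}]"         # prints: piped: []
echo "subst: [${SUBST}]"         # prints: subst: [gamma beta alpha ]

# Show what bash substitutes for <(...): a file name, not the output itself.
echo <(:)
```

The last line usually prints something like a /dev/fd path on Linux, but the exact name varies from system to system.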

If that got too confusing, try reading the Process Substitution chapter in the Advanced Bash-Scripting Guide.

using sed to parse a file.

March 5th, 2005 | No Comments | Posted in Bash


sed -n 's/.*href="\([^"]*\)".*/\1/p'

-n suppresses sed's automatic printing of each line.

's/…/…/p' is a command. s is search and replace, s/pattern/replacement/. The trailing p is a flag that prints the line whenever a substitution was made. Since we used -n to suppress normal printing, this causes sed to print only the lines it changed, which here means only the replacement text. In most cases the replacement text will be static, but you can also use \1 through \9 to insert whatever was matched by the regular expressions within parentheses.

The pattern in this case (shown here without the backslashes) is: .*href="([^"]*)".*
.*href=" matches the beginning of the line, up to and including the href="
([^"]*) matches everything up to the next quote (the url itself) and remembers it as \1.
".* matches the quote and the rest of the line.

Using \1 as the replacement text causes the url, and only the url to be printed. If the line doesn't contain a matching pattern, sed continues on silently to the next line.
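
A quick way to check this without touching the network is to feed sed a few lines by hand. This is only a sketch; the SAMPLE text and its urls are made up for illustration.

```bash
#!/bin/bash
# Hypothetical sample input: two lines with a link, one without.
SAMPLE='<a href="http://example.com/one.mp3">one</a>
<p>no link on this line</p>
<a class="x" href="/two.html">two</a>'

# Only lines where the substitution succeeded are printed.
URLS=$(printf '%s\n' "$SAMPLE" | sed -n 's/.*href="\([^"]*\)".*/\1/p')
echo "$URLS"
# prints:
# http://example.com/one.mp3
# /two.html
```

The middle line has no href, so sed skips it without complaint.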

This method only catches one url per line, ignoring the rest. Because the leading .* is greedy, the one it catches is actually the last url on the line, not the first. I will attempt to address that in a later article.

Here's a simple example:


# wget -q http://lr2.com/ -O - | sed -n 's/.*href="\([^"]*\)".*/\1/p'

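Notice we escaped the parens. That is for sed's benefit, not bash's: the whole expression is inside single quotes, so bash passes it through untouched. sed uses basic regular expressions by default, where \( and \) mark a group and a bare ( or ) just matches a literal parenthesis.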
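
For comparison, here is a small sketch of the same expression written both ways. It assumes a sed that accepts -E for extended regular expressions (GNU and BSD sed both do); LINE is made-up sample input.

```bash
#!/bin/bash
# Hypothetical sample line.
LINE='<a href="http://example.com/feed.xml">feed</a>'

# Basic regular expressions (sed's default): groups are written \( \).
BRE=$(printf '%s\n' "$LINE" | sed -n 's/.*href="\([^"]*\)".*/\1/p')

# Extended regular expressions (-E): groups are written ( ).
ERE=$(printf '%s\n' "$LINE" | sed -E -n 's/.*href="([^"]*)".*/\1/p')

echo "$BRE"   # prints: http://example.com/feed.xml
echo "$ERE"   # prints: http://example.com/feed.xml
```

Either spelling works as long as the whole expression stays inside single quotes.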

Tomorrow I'll start to build this into a smarter parser that can be used to harvest both web pages and xml feeds for mp3 links. The end goal will be a simple script to fetch podcasts, add them to my library, and automatically dump them on my iPod if it's connected. Along the way I expect to learn a few more bash tricks.

Advanced Bash-Scripting Guide

March 5th, 2005 | No Comments | Posted in General

Advanced Bash-Scripting Guide.

This 'book' has an amazing amount of information. I'll be reading it a lot this weekend, and using what I learn to build a fully automated podcast fetcher complete with date-modified checking and duplicate checking.