Cablechip Solutions

web development with Unix, Perl, Javascript, HTML and web services

Cablechip Solution's Blog

Regex : Using expressions

This is the /e modifier: Perl evaluates the replacement side of s/// as code and substitutes the value it returns.
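A minimal sketch of what /e changes, using a made-up string of quantities (the variable names are only for illustration). The same substitution is run twice, once without /e and once with it:

```perl
use strict;
use warnings;

my $text = 'qty 2, qty 10';

# Without /e the right-hand side is an ordinary interpolated string.
(my $as_string = $text) =~ s/(\d+)/$1 * 2/g;

# With /e the right-hand side is Perl code; its value is substituted.
(my $as_code = $text) =~ s/(\d+)/$1 * 2/ge;

print "$as_string\n";   # qty 2 * 2, qty 10 * 2
print "$as_code\n";     # qty 4, qty 20
```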

Example: make HTML tags upper case


# ( ) - capture the text
# .*? - match the least amount of text, i.e. one HTML tag
# uc($1) - make the matched text uppercase
# gxe - g is global replace
# gxe - x allows whitespace and comments in the pattern
# gxe - e means the bit on the right is an expression to be evaluated

$html =~ s# ( < .*? > ) # uc($1) #gxe ;
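Matching the whole tag also uppercases attribute names and values. If only the tag names should change, one possible variation is sketched below. The sample markup and the second pattern are assumptions for illustration, not part of the original example, and a regex like this is not a full HTML parser.

```perl
use strict;
use warnings;

# Hypothetical sample markup.
my $html = '<p class="intro">Hello <b>world</b></p>';

# Uppercase everything between < and >, attributes included.
(my $whole = $html) =~ s{ ( < .*? > ) }{ uc($1) }gxe;

# Uppercase only the tag name: $1 is an optional "/", $2 is the name.
(my $names = $html) =~ s{ < (/?) ([a-zA-Z][a-zA-Z0-9]*) }{ '<' . $1 . uc($2) }gxe;

print "$whole\n";   # <P CLASS="INTRO">Hello <B>world</B></P>
print "$names\n";   # <P class="intro">Hello <B>world</B></P>
```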


Example: find hi-byte characters


# find hi-byte characters in HTML
# - and keep a record of all the hi-byte chars found in %found
my %found ;

sub find {
my ($char) = @_ ;
$found{ $char } ++; ## keep a record
return '[' . ord( $char ) . ']';
}

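## this will find literal € and £ characters, but not HTML entities such as &euro;
## \x80 is hex for character code 128, the first one above the ASCII range
## \x80-\x{ffff} is a range - if the text has been decoded as UTF-8, it matches characters above 255 as well
## ( ) captures the matched character so that it is available as $1
## gxe - expression, whitespace, and global replace
$html =~ s# ( [\x80-\x{ffff}] ) # find( $1 ) #gxe;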
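A self-contained sketch of the same idea, assuming the input arrives as raw UTF-8 bytes and is decoded with the core Encode module first. The sample bytes and the name record_char are made up for illustration.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical input: raw UTF-8 bytes for "£5 or €6, café".
my $bytes = "Price: \xC2\xA3" . "5 or \xE2\x82\xAC" . "6, caf\xC3\xA9";

# Decode the bytes so that each symbol is a single character.
my $html = decode('UTF-8', $bytes);

my %found;

# Count the character and return its code point in square brackets.
sub record_char {
    my ($char) = @_;
    $found{$char}++;
    return '[' . ord($char) . ']';
}

$html =~ s{ ( [\x80-\x{ffff}] ) }{ record_char($1) }gxe;

print "$html\n";   # Price: [163]5 or [8364]6, caf[233]

# Report what was found, as code points, to keep the output ASCII.
for my $char (sort keys %found) {
    printf "U+%04X seen %d time(s)\n", ord($char), $found{$char};
}
```

Without the decode step each multi-byte character would be matched one byte at a time, so € would be reported as three separate values rather than 8364.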
