Ruby Regexp Machine (v0.3.2b)

By Vishnu Menon

 

Download: http://sourceforge.net/projects/rubyregexp/files

Please Donate: http://sourceforge.net/project/project_donations.php?group_id=556621                    

Support: http://sourceforge.net/projects/rubyregexp/support

Project Summary/Join: http://sourceforge.net/projects/rubyregexp/

 

Introduction

 

Ruby Regexp Machine (RRM) is a free Mac application, written in MacRuby, that lets users write sentences in plain English and convert them into valid Ruby regular expressions. It currently supports many basic parts of regexp syntax but is far from complete. RRM is useful to people learning regular expressions, because it shows the connection between regexp syntax and its meaning. A basic knowledge of regular expressions is helpful, though not required, because some outputs may be incorrect. NEVER rely solely on RRM to generate regular expressions; always verify that any regexp generated in RRM works as intended. Rubular is a great place to test regular expressions. Regular expressions generated in RRM are largely compatible with other languages, such as Perl. Ruby Regexp Machine is licensed under the GPL.

 

Use/Instructions

 

Ruby Regexp Machine uses a modified English grammar for input, which is largely similar to Standard English and is extremely easy to understand and use. However, there are some rules that must be followed, and keywords that must be used, to ensure proper output. Using any word that is not a literal string, a keyword, or a number will result in an error message. Please read over these rules carefully before using RRM.

 

Usage Notes:

•      No punctuation except the characters defined as keywords below

•      No double quotes; all literal strings and characters must be surrounded by ' ' (single quotes, also called apostrophes)

•      No keyboard shortcuts (right-click to copy or paste)

•      Capitalization does not matter
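
The usage notes above can be turned into a rough pre-check before pasting a sentence into RRM. The Ruby sketch below is not part of RRM: the helper name rrm_input_problems and its messages are hypothetical, and it only approximates the rules (for example, it assumes a literal never contains an apostrophe).

```ruby
# Hypothetical pre-check for an RRM input sentence; not part of RRM itself.
# Returns a list of human-readable problems (empty when none are found).
def rrm_input_problems(sentence)
  problems = []
  problems << "contains a double quote" if sentence.include?('"')
  problems << "unbalanced single quotes" if sentence.count("'").odd?

  # Drop quoted literals, then look for punctuation other than the hyphen keyword.
  # Double quotes are skipped here because they are already reported above.
  outside = sentence.gsub(/'[^']*'/, " ")
  stray = outside.scan(/[^A-Za-z0-9\s"\-]/).uniq
  problems << "punctuation outside literals: #{stray.join(' ')}" unless stray.empty?
  problems
end

p rrm_input_problems("any character from 'a' - 'z'")   # => []
p rrm_input_problems('the word "foo"')                  # => ["contains a double quote"]
```

An empty list only means the sentence passes these simple checks; RRM may still reject words that are not keywords, literals, or numbers.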

 

Recognized Keyword Combinations (in no particular order):

 

Keyword: any (word, digit, whitespace, string)
Meaning: matches any word, digit, whitespace character, or string
Examples:
    any string => .*
    any digit => \d

Keyword: any character (except, from)
Meaning: creates a character class. Can be followed by a range of characters, by characters to exclude, or by nothing
Examples:
    any character => .
    any character except 'a' and 'c' => [^ac]
    any character from 'a' - 'z' => [a-z]

Keyword: - (hyphen)
Meaning: denotes a range; use it only in character classes (see 'any character' above)
Examples:
    any character from 'A' - 'Z' => [A-Z]

Keyword: a (space, line start, line end, string start, string end)
Meaning: a single space, line start, line end, string start, or string end
Examples:
    a space => ( )
    a line start => ^

Keyword: the (word, character, string)
Meaning: adds the word, character, or string that follows to the regular expression. The keyword itself is optional
Examples:
    the string 'hello world' => hello world

Keyword: the value of
Meaning: groups the keywords that follow, without capturing their contents or creating a backreference
Examples:
    the value of ….. => (?: ….. )

Keyword: followed by
Meaning: looks ahead and requires that the following item comes next, without including it in the match
Examples:
    followed by the word 'foo' => (?=foo)

Keyword: followed by not
Meaning: looks ahead and requires that the following item does not come next, without including anything in the match
Examples:
    followed by not the word 'foo' => (?!foo)

Keyword: preceded by
Meaning: looks behind and requires that the following item comes immediately before, without including it in the match
Examples:
    preceded by the word 'foo' => (?<=foo)

Keyword: preceded by not
Meaning: looks behind and requires that the following item does not come immediately before, without including anything in the match
Examples:
    preceded by not the word 'foo' => (?<!foo)

Keyword: turn (off, on) (case insensitive, whitespace insensitive, any character includes newline)
Meaning: toggles one of the three options for the remainder of the regex, or until it is toggled again
Examples:
    turn off case insensitive => (?-i)
    turn on case insensitive => (?i)

Keyword: capture
Meaning: groups the keywords that follow, captures the result, and creates a backreference
Examples:
    capture …. => (….)

Keyword: either…. or
Meaning: two alternatives, surrounded by parentheses
Examples:
    either foo or bar => (foo|bar)

Keyword: all
Meaning: groups everything between the last grouping keyword (e.g. 'the value of', 'capture') and the current location
Examples:
    the value of ….. all => (…)

Keyword: repeated .. to .. times
Meaning: repeats the preceding item a number of times within the given range. Use infinity when an end of the range has no limit. Use 'to', not - (hyphen), with the 'repeated' keyword
Examples:
    repeated 2 to 4 times => {2,4}
    repeated 2 to infinity times => {2,}

Keyword: then
Meaning: separates two parts of the statement and adds a closing grouper if necessary
Examples:
    any character except 'c' then the character 'd' => [^c]d

Keyword: and
Meaning: same as 'then', but does not close groupers
Examples:
    any character except 'c' and the character 'd' => [^cd]

Keyword: non-greedy
Meaning: makes the preceding item non-greedy. Regular expressions are greedy by default, meaning they try to consume as much text as possible
Examples:
    any character zero or more times => .* => first match in 'ssss' will be 'ssss'
    any character zero or more times non-greedy => .*? => first match in 'ssss' will be the empty string, because matching zero characters already satisfies the pattern

Keyword: optional
Meaning: makes the preceding item optional
Examples:
    the string 'colo' and the character 'u' optional and the character 'r' => colou?r => matches both color and colour

Keyword: backreference (1.. 9)
Meaning: refers to captures 1 through 9
Examples:
    capture any character then the character '=' then backreference 1 => (.)=\1 => matches 1=1, 2=2, but not 4=5

Keyword: zero or more times, zero or one times, one or more times
Meaning: shorthand for the 'repeated' keyword (only these three forms work). They result in more concise regular expressions
Examples:
    any character zero or more times => .*
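
Because generated expressions should always be verified, the Ruby sketch below tries a few of the example outputs listed above against sample strings. The sample strings, the helper name first_match, and the two lookaround patterns that prepend or append 'bar' are assumptions made for illustration; they are not RRM output.

```ruby
# Patterns taken from the examples above; the sample strings are invented.
GREEDY        = /.*/
NON_GREEDY    = /.*?/
OPTIONAL      = /colou?r/
BACKREFERENCE = /(.)=\1/
# Assumed compositions: 'bar' combined with the lookaround examples.
LOOKAHEAD     = /bar(?=foo)/
LOOKBEHIND    = /(?<=foo)bar/

# Return the first match of pattern in text, or nil when nothing matches.
def first_match(pattern, text)
  text[pattern]
end

p first_match(GREEDY, "ssss")          # => "ssss"
p first_match(NON_GREEDY, "ssss")      # => "" (zero characters already satisfy .*?)
p first_match(OPTIONAL, "colour")      # => "colour"
p first_match(BACKREFERENCE, "2=2")    # => "2=2"
p first_match(BACKREFERENCE, "4=5")    # => nil
p first_match(LOOKAHEAD, "barfoo")     # => "bar" (foo is checked but not consumed)
p first_match(LOOKBEHIND, "foobar")    # => "bar"
```

A non-greedy quantifier that allows zero repetitions can match nothing at all, which is worth checking whenever a generated expression ends in .*? with nothing after it.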

 

Example: a regular expression to identify email addresses, and capture the username and domain

 

Input:

capture any character except '@' and any whitespace all one or more times then '@' then capture the value of any character from 'a' - 'z' and '0' - '9' and the character '-' all one or more times then the character '.' all one or more times then any character from 'a' - 'z' all repeated '2' to infinity times

 

Output:

/([^@\s]+)@((?:[a-z0-9-]+)\.)+[a-z]{2,}/
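
An expression of this shape can be checked in plain Ruby before it is used anywhere. In the sketch below, the pattern literal is written out by hand with balanced parentheses; treat it as an assumption about the intended result rather than confirmed RRM output. The helper name email_parts and the sample strings are also hypothetical.

```ruby
# Hand-written, balanced form of the email pattern (an assumption, not confirmed RRM output).
EMAIL = /([^@\s]+)@((?:[a-z0-9-]+)\.)+[a-z]{2,}/

# Return [group 1, group 2] for the first address found in text, or nil if none is found.
def email_parts(text)
  m = EMAIL.match(text)
  m && [m[1], m[2]]
end

p email_parts("write to user@example.com today")   # => ["user", "example."]
p email_parts("no address here")                   # => nil
```

With this form, the second group holds only the last label before the top-level domain, including its trailing dot, because a repeated group keeps its final iteration. Capturing the whole domain would need one more group around the repetition and the final character class.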

 

Changelog

 

v0.3.1 – Embedded the MacRuby framework in the application

v0.3.2 – Fixed capitalization issues

 

 

© Copyright 2011 - Vishnu Menon. All Rights Reserved