Ruby Regexp Machine (v0.3.2b)

By Vishnu Menon

Download: http://sourceforge.net/projects/rubyregexp/files

Please Donate: http://sourceforge.net/project/project_donations.php?group_id=556621

Support: http://sourceforge.net/projects/rubyregexp/support

Project Summary/Join: http://sourceforge.net/projects/rubyregexp/

Introduction

Ruby Regexp Machine (RRM) is a free Mac application, written in MacRuby, that lets users write sentences in plain English and convert them into valid Ruby regular expressions. It currently supports many basic parts of Regexp syntax, but is far from complete. RRM is useful to people learning regular expressions, because it shows the connection between regexp syntax and its meaning. A basic knowledge of regular expressions is helpful, though not required, because some outputs may be incorrect and you will need to recognize them. NEVER rely solely on RRM to generate regular expressions; always verify that any regexp generated in RRM works as intended. Rubular is a great place to test regular expressions. Regular expressions generated in RRM are largely compatible with many other languages, such as Perl. Ruby Regexp Machine is licensed under the GPL.
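As a complement to Rubular, a generated regexp can also be checked locally in Ruby. The sketch below is a minimal example only: check_regexp is a hypothetical helper name, and the sample strings are illustrations. It uses =~ rather than Regexp#match? so it also runs on older Rubies.

```ruby
# Hypothetical helper: returns a list of problems; an empty list means every check passed.
# Note: the pattern is not anchored, so a string that merely CONTAINS a match counts as a match.
def check_regexp(pattern, should_match, should_not_match)
  problems = []
  should_match.each do |s|
    problems << "expected a match for #{s.inspect}" unless pattern =~ s
  end
  should_not_match.each do |s|
    problems << "unexpected match for #{s.inspect}" if pattern =~ s
  end
  problems
end

# Example: the output of the "optional" row in the keyword table.
problems = check_regexp(/colou?r/, %w[color colour], %w[colr colouur])
puts problems.empty? ? "all checks passed" : problems
```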

Use/Instructions

Ruby Regexp Machine uses a modified English grammar for input that is largely similar to standard English and easy to learn. However, some rules must be followed, and some keywords must be used, to get correct output. Any word that is not a literal string, a keyword, or a number results in an error message. Please read these rules carefully before using RRM.

Usage Notes:

· No punctuation except the symbols defined as keywords below (such as the hyphen)

· No double quotes; all literal strings and characters must be surrounded by single quotes (apostrophes), e.g. 'hello world'

· Keyboard shortcuts are not supported; right-click to copy or paste

· Capitalization does not matter
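To make these rules concrete, here is a HYPOTHETICAL sketch of how an input sentence could be split into quoted literals, numbers, and keywords. It is not RRM's actual parser; KEYWORDS is an illustrative word list drawn from the keyword table, and tokenize is an invented name.

```ruby
# HYPOTHETICAL sketch of the input rules; this is not RRM's actual parser.
# KEYWORDS is an illustrative word list drawn from the keyword table.
KEYWORDS = %w[
  any word digit whitespace string character except from a space line start
  end the value of followed by not preceded turn off on case insensitive
  includes newline capture either or all repeated to times then and
  non-greedy optional backreference zero one more infinity -
].freeze

# Splits a sentence into [type, value] tokens and collects errors.
def tokenize(input)
  tokens = []
  errors = []
  # Each match is either a single-quoted literal ('...') or a bare word.
  input.scan(/'([^']*)'|(\S+)/) do |literal, word|
    if literal
      tokens << [:literal, literal]            # literal text is kept exactly as typed
    elsif word =~ /\A\d+\z/
      tokens << [:number, word.to_i]
    elsif KEYWORDS.include?(word.downcase)     # capitalization does not matter
      tokens << [:keyword, word.downcase]
    else
      errors << "unrecognized word: #{word}"   # unquoted, non-keyword words are errors
    end
  end
  [tokens, errors]
end

tokens, errors = tokenize("Capture the string 'hello world' repeated 2 to 4 times")
p tokens
p errors
```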

Recognized Keyword Combinations (in no particular order):

Keyword	Meaning	Example
any (word, digit, whitespace, string)	Matches any word, digit, whitespace character, or string	any string => .*; any digit => \d; any whitespace => \s
any character (except, from)	Creates a character class. Can specify a range of characters, characters to exclude, or neither	any character => .; any character except 'a' and 'c' => [^ac]; any character from 'a' - 'z' => [a-z]
- (hyphen)	Denotes a range; use it only in character classes (see above)	any character from 'A' - 'Z' => [A-Z]
a (space, line start, line end, string start, string end)	Matches a single space, the start of a line, etc.	a space => ( ); a line start => ^
the (word, character, string)	Adds the literal word, character, or string that follows to the regular expression. The keyword itself is optional	the string 'hello world' => hello world
the value of	Groups the keywords that follow without capturing the contents or creating a backreference	the value of … => (?:…)
followed by	Lookahead: the following must come next; it is neither consumed nor captured	followed by the word 'foo' => (?=foo)
followed by not	Negative lookahead: the following must not come next; nothing is consumed or captured	followed by not the word 'foo' => (?!foo)
preceded by	Lookbehind: the following must come immediately before; it is neither consumed nor captured	preceded by the word 'foo' => (?<=foo)
preceded by not	Negative lookbehind: the following must not come immediately before; nothing is consumed or captured	preceded by not the word 'foo' => (?<!foo)
turn (off, on) (case insensitive, whitespace insensitive, any character includes newline)	Toggles one of the three options for the rest of the regex (or of the enclosing group), or until it is toggled again	turn off case insensitive => (?-i); turn on case insensitive => (?i)
capture	Groups the keywords that follow and captures the matched text, creating a numbered backreference	capture … => (…)
either … or	Two alternatives, surrounded by parentheses	either 'foo' or 'bar' => (foo|bar)
all	Groups everything between the most recent grouping keyword (e.g. 'the value of', 'capture') and the current position	the value of … all => (?:…)
repeated … to … times	The preceding item, repeated a number of times within a range. Use infinity to leave either end of the range unbounded. Use "to" rather than a hyphen between the two numbers	repeated 2 to 4 times => {2,4}; repeated 2 to infinity times => {2,}
then	Separates two parts of the statement and closes an open grouper (such as a character class) if necessary	any character except 'c' then the character 'd' => [^c]d
and	Same as then, but does not close groupers	any character except 'c' and the character 'd' => [^cd]
non-greedy	Makes the preceding item non-greedy (lazy). Regular expressions are greedy by default, meaning they consume as much text as possible	any character one or more times => .+ => first match in 'ssss' is 'ssss'; any character one or more times non-greedy => .+? => first match in 'ssss' is 's' (with .*? the first match would be the empty string)
optional	Makes the preceding item optional	the string 'colo' and the character 'u' optional and the character 'r' => colou?r => matches both color and colour
backreference (1 to 9)	Refers to capture groups 1 through 9	capture any character then the character '=' then backreference 1 => (.)=\1 => matches 1=1 and 2=2, but not 4=5
zero or more times, zero or one times, one or more times	Shorthand for the repeated keyword (only these three phrasings work); produces the more concise quantifiers *, ?, and + respectively	any character zero or more times => .*
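The outputs in the keyword table can be tried directly in Ruby. This sketch pairs several of them with the value each returns (TABLE_CHECKS is just an illustrative name); the comments state the expected results. One compatibility detail worth knowing: Ruby's (?m), used for "any character includes newline", is spelled (?s) in Perl.

```ruby
# Several outputs from the keyword table, paired with what they return in Ruby.
TABLE_CHECKS = {
  # any digit / character classes
  digit:           "abc123"[/\d/],             # => "1"
  except_a_c:      "abcd"[/[^ac]/],            # => "b"
  range_upper:     "hello World"[/[A-Z]/],     # => "W"
  # the value of: groups without capturing, so (c) is group 1
  non_capturing:   "abc"[/(?:ab)(c)/, 1],      # => "c"
  # lookarounds check context without consuming or capturing it
  lookahead:       "barfoo"[/bar(?=foo)/],     # => "bar"
  neg_lookahead:   "barfoo" =~ /bar(?!foo)/,   # => nil
  lookbehind:      "foobar"[/(?<=foo)bar/],    # => "bar"
  neg_lookbehind:  "foobar" =~ /(?<!foo)bar/,  # => nil
  # inline options apply from the point where they appear
  case_on:         "aB" =~ /a(?i)b/,           # => 0
  case_not_before: "AB" =~ /a(?i)b/,           # => nil (the "a" is still case-sensitive)
  extended:        "abc" =~ /(?x) a b c/,      # => 0 (whitespace in the pattern is ignored)
  dot_newline:     "a\nb" =~ /a(?m).b/,        # => 0 (. now matches the newline)
  dot_no_newline:  "a\nb" =~ /a.b/,            # => nil
  # greedy vs non-greedy
  greedy:          "ssss"[/.+/],               # => "ssss"
  lazy_plus:       "ssss"[/.+?/],              # => "s"
  lazy_star:       "ssss"[/.*?/],              # => "" (zero characters is the shortest match)
  # alternation, optional item, backreference, bounded repetition
  either:          "bar"[/(foo|bar)/, 1],      # => "bar"
  optional:        %w[color colour].all? { |w| w =~ /\Acolou?r\z/ }, # => true
  backref_ok:      "1=1" =~ /(.)=\1/,          # => 0
  backref_fail:    "4=5" =~ /(.)=\1/,          # => nil
  bounded:         "aaaaa"[/a{2,4}/]           # => "aaaa"
}.freeze

TABLE_CHECKS.each { |name, value| puts "#{name}: #{value.inspect}" }
```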

Example: a regular expression that identifies email addresses and captures the username and domain

Input:

capture any character except '@' and any whitespace all one or more times then '@' then capture the value of any character from 'a' - 'z' and '0' - '9' and the character '-' all one or more times then the character '.' all one or more times then any character from 'a' - 'z' all repeated '2' to infinity times

Output:

/([^@\s]+)@((?:[a-z0-9-]+)\.)+[a-z]{2,}/
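Before relying on a generated pattern such as the email example, it is worth compiling it and running it against sample input. This sketch assumes the balanced form ([^@\s]+)@((?:[a-z0-9-]+)\.)+[a-z]{2,}; compile_pattern is a hypothetical helper that reports a RegexpError (for example, from unbalanced parentheses) instead of raising it. With this form, group 2 holds only the last repetition (one domain label plus its dot), not the whole domain, and the [a-z] classes are case-sensitive.

```ruby
# Hypothetical helper: returns [regexp, nil] on success, [nil, message] on failure.
def compile_pattern(source)
  [Regexp.new(source), nil]
rescue RegexpError => e
  [nil, e.message]
end

# Assumed balanced form of the email example (single quotes keep \s and \. intact).
EMAIL_SOURCE = '([^@\s]+)@((?:[a-z0-9-]+)\.)+[a-z]{2,}'

email, error = compile_pattern(EMAIL_SOURCE)
abort "pattern does not compile: #{error}" if error

md = email.match("jane.doe@mail.example.com")
p md[1]   # => "jane.doe"  (username)
p md[2]   # => "example."  (only the last repetition of group 2)

# Inputs that should not match
p email.match("user@localhost")   # => nil (at least one "label." is required)
p email.match("no at sign here")  # => nil

# An unbalanced pattern is reported instead of raising
_, problem = compile_pattern("(a))")
puts "rejected: #{problem}"
```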

Changelog

v0.3.1 – Embedded the MacRuby framework in the application

v0.3.2 – Fixed capitalization issues