Ruby Regexp Machine
(RRM) is a free Mac application, written in MacRuby, that lets users
write sentences in plain English and convert them into valid Ruby regular
expressions. It currently supports many basic parts of Regexp syntax, but is
far from complete. RRM is useful to
people learning regular expressions, because it shows the
connection between regexp syntax and its meaning. When using RRM, it is
helpful, though not necessary, to have a basic
knowledge of regular expressions beforehand, as some outputs could be
incorrect. NEVER rely solely on RRM to generate regular expressions; always
verify that any regexp generated in RRM works as intended. Rubular is a great
place to test regular expressions. Regular expressions generated in RRM are
largely compatible with many other languages, such as Perl. Ruby Regexp Machine
is licensed under the GPL.
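A generated pattern can also be checked in plain Ruby by running it against strings that should and should not match. The following is only a minimal sketch: `check_pattern` is a hypothetical helper written for this example (it is not part of RRM), and the `\A` and `\z` anchors are an assumption added here so that the whole string has to match.

```ruby
# Returns a list of problems; an empty list means every sample behaved as expected.
def check_pattern(pattern, should_match, should_not_match)
  problems = []
  should_match.each do |text|
    problems << "expected a match for #{text.inspect}" unless text =~ pattern
  end
  should_not_match.each do |text|
    problems << "unexpected match for #{text.inspect}" if text =~ pattern
  end
  problems
end

# colou?r is a sample pattern; \A and \z are added for this sketch so that
# the whole string must match, not just a part of it.
candidate = /\Acolou?r\z/
problems = check_pattern(candidate, %w[color colour], %w[colr colouur])
puts problems.empty? ? 'all samples behaved as expected' : problems
```

Listing a few strings that must not match is as useful as listing the ones that must, since an overly loose pattern passes every positive sample.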
Ruby Regexp Machine takes
its input in a modified English grammar, which is close to standard
English and easy to read and write. However, there are some
rules that must be followed, and keywords that must be used, to ensure proper
output. Using any word that is not a literal string, a keyword, or a number
will result in an error message. Please read these rules carefully before
using RRM.
• No punctuation except the characters defined as keywords below
• No double quotes/quotation marks; all literal strings and characters must be surrounded by ' ' (single quotes or apostrophes)
• No keyboard shortcuts (right-click to copy or paste)
• Capitalization does not matter
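The rules above can be pictured as a small tokenizer. This is a hypothetical sketch, not RRM's actual parser: the `tokenize` method, the `KEYWORD_WORDS` list (collected by hand from the keyword table in this document), and the error messages are all assumptions made for illustration.

```ruby
# Hypothetical word list, gathered from the keyword table in this document.
KEYWORD_WORDS = %w[
  a all and any backreference by capture case character digit either end
  except followed from includes infinity insensitive line more newline
  non-greedy not of off on one optional or preceded repeated space start
  string the then times to turn value whitespace word zero
].freeze

# Splits a sentence into [type, value] pairs and rejects anything that is
# not a single-quoted literal, a number, or a known keyword.
def tokenize(sentence)
  raise ArgumentError, 'double quotes are not allowed' if sentence.include?('"')

  # A token is either a single-quoted literal or a run of non-space characters.
  sentence.scan(/'[^']*'|\S+/).map do |raw|
    if raw.length >= 2 && raw.start_with?("'") && raw.end_with?("'")
      [:literal, raw[1..-2]]
    elsif raw =~ /\A\d+\z/
      [:number, raw.to_i]
    elsif raw == '-' || KEYWORD_WORDS.include?(raw.downcase)
      [:keyword, raw.downcase] # capitalization does not matter
    else
      raise ArgumentError, "unknown word: #{raw}"
    end
  end
end

p tokenize("Any character from 'a' - 'z'")
```

Lowercasing keywords before the lookup is what makes capitalization irrelevant, while literals inside single quotes are kept exactly as typed.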
Keyword |
Meaning |
Example |
Any (word, digit, whitespace, string) |
Matches any word, digit, etc. |
any string => .*
any digit => \d |
any character (except, from) |
creates character classes. Can use a range of
characters, or characters to be excluded, or nothing |
any character => .
any character except 'a' and 'c' => [^ac]
any character from 'a' - 'z' => [a-z] |
- (hyphen) |
the hyphen denotes a range, and is to be used only
in character classes (see above) |
any character from 'A' - 'Z' => [A-Z] |
a (space, line start, line end, string start,
string end) |
A single space, line start, etc. |
a space => ( )
a line start => ^ |
the (word, character, string) |
adds the word, character, or string that follows to
the regular expression. This keyword is optional |
the string 'hello world' => hello world |
the value of |
groups keywords without capturing the contents or creating
a backreference |
the value of … => (?: … ) |
followed by |
look ahead for the following, but do not capture |
followed by the word 'foo' => (?=foo) |
followed by not |
look ahead for the following to not be present, but
do not capture |
followed by not the word 'foo' => (?!foo) |
preceded by |
look behind for the following, but do not capture |
preceded by the word 'foo' => (?<=foo) |
preceded by not |
look behind for the following to not be present,
but do not capture |
preceded by not the word 'foo' => (?<!foo) |
turn (off, on) (case insensitive, whitespace
insensitive, any character includes newline) |
toggle the three options for the remainder of the
regex, or until otherwise indicated |
turn off case insensitive => (?-i)
turn on case insensitive => (?i) |
capture |
groups keywords, captures the result in the regular
expression, and creates a backreference |
capture … => (…) |
either … or |
two options, surrounded by parentheses |
either foo or bar => (foo|bar) |
all |
group everything between the last grouping keyword
(e.g. 'the value of', 'capture') and the current
location |
the value of … all => (…) |
repeated .. to .. times |
repeats the preceding item a range of times. Use infinity when there is no
limit on the upper or lower end of the range. Use 'to' instead of - (hyphen) with the
'repeated' keyword |
repeated 2 to 4 times => {2,4}
repeated 2 to infinity => {2,} |
then |
separates two parts of the statement, adds a closing
grouper if necessary |
|
and |
same as then, but does not close groupers |
any character except 'c' and the character 'd' => [^cd] |
non-greedy |
makes the preceding item non-greedy. Regular expressions
are greedy by default, meaning they try to consume as much text as
possible |
any character one or more times => .+ => first match in 'ssss' will be 'ssss'
any character one or more times non-greedy => .+? => first match in 'ssss' will be 's' |
optional |
the preceding item is optional |
the string 'colo' and the character 'u' optional and the character 'r' =>
colou?r => matches both color and colour |
backreference (1 .. 9) |
refers to captures 1 through 9 |
capture any character then the character '=' then
backreference 1 => (.)=\1 => matches 1=1, 2=2, but not 4=5 |
zero or more times, zero or one times, one or more times |
shorthand for the repeated keyword (only these three
versions work). Results in more concise regular expressions |
any character zero or more times => .* |
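To see how the regexp constructs in the table behave, they can be tried directly in Ruby. The snippet below is an illustrative sketch: the sample strings are made up for this example, and the patterns are small hand-written ones in the same style as the table's outputs, not output copied from RRM.

```ruby
results = {
  # Character classes: exclusions and ranges.
  excluded:    'abcd'.scan(/[^ac]/),               # => ["b", "d"]
  in_range:    'a1B2'.scan(/[a-z]/),               # => ["a"]

  # Non-capturing group versus capturing group.
  no_captures: 'abab'.match(/(?:ab)+/).captures,   # => []
  one_capture: 'abab'.match(/(ab)+/).captures,     # => ["ab"]

  # Lookahead and lookbehind test nearby text without consuming it.
  ahead:       'barfoo'[/bar(?=foo)/],             # => "bar"
  not_ahead:   'barfoo'[/bar(?!foo)/],             # => nil
  behind:      'foobar'[/(?<=foo)bar/],            # => "bar"
  not_behind:  'foobar'[/(?<!foo)bar/],            # => nil

  # Inline options: (?i) turns case insensitivity on, (?-i) turns it off.
  mixed_ok:    'Ab' =~ /(?i)a(?-i)b/,              # => 0
  mixed_bad:   'AB' =~ /(?i)a(?-i)b/,              # => nil

  # Greedy versus non-greedy quantifiers.
  greedy:      'ssss'[/.+/],                       # => "ssss"
  lazy_plus:   'ssss'[/.+?/],                      # => "s"
  lazy_star:   'ssss'[/.*?/],                      # => "" (zero characters)

  # Bounded repetition and backreferences.
  bounded:     'aaaaa'[/a{2,4}/],                  # => "aaaa"
  same:        '1=1' =~ /(.)=\1/,                  # => 0
  differ:      '4=5' =~ /(.)=\1/                   # => nil
}

results.each { |name, value| puts "#{name}: #{value.inspect}" }
```

Note that a non-greedy `*` is satisfied by zero characters, so `.*?` on its own matches the empty string; a non-greedy `+` needs at least one character.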
Input:
capture any character except '@' and any whitespace all one or more times then '@' then capture the value of any character from 'a' - 'z' and '0' - '9' and the character '-' all one or more times then the character '.' all one or more times then any character from 'a' - 'z' all repeated '2' to infinity times
Output:
/([^@\s]+)@((?:[a-z0-9-]+)\.)+)[a-z]{2,}/
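One quick check for any generated pattern is whether Ruby can compile it at all. The sketch below is an illustration rather than part of RRM: `compile_error` is a hypothetical helper, and `balanced` is a hand-adjusted variant of the pattern, written for this example as an assumption about the intent, not something RRM produced. Applied to the output exactly as printed above, Ruby reports an unmatched closing parenthesis, which is the kind of problem the advice to always verify is meant to catch.

```ruby
# Returns nil if the pattern source compiles, or the error message if it does not.
def compile_error(source)
  Regexp.new(source)
  nil
rescue RegexpError => e
  e.message
end

# The output as printed above, without the surrounding slashes.
printed = '([^@\s]+)@((?:[a-z0-9-]+)\.)+)[a-z]{2,}'

# Hand-adjusted variant with balanced parentheses (an assumption of this sketch).
balanced = '([^@\s]+)@((?:[a-z0-9-]+\.)+)[a-z]{2,}'

puts "printed:  #{compile_error(printed).inspect}"
puts "balanced: #{compile_error(balanced).inspect}"

# With the balanced variant, the two captures hold the local part and the
# dotted domain labels that precede the final label.
match = Regexp.new(balanced).match('user@mail.example.com')
puts match[1] # user
puts match[2] # mail.example.
```

Compiling first and matching second keeps the two failure modes apart: a pattern that does not compile never reaches the sample strings, while one that compiles can still match the wrong text.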
v0.3.1 – Embedded the MacRuby framework in the application
v0.3.2 – Fixed capitalization issues
© Copyright 2011 -
Vishnu Menon. All Rights Reserved