[Discussion] Regex Series

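qibbxxt · posted 2013-6-15 23:25:06

Last edited by qibbxxt on 2013-6-16 00:06

Vowels, Episode 1
The problem:
Count how many times vowels appear in a string.
x='string the MaTLaBiAn'
In this string the vowels are, in order, ieaaiA, for a total of 6.
x='coUnt the vowEl'
In this string the vowels are, in order, oUeoE, for a total of 5.
Write a function that takes a string and returns the number of vowel characters in it.
You can test it with the following cases: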

%% Test 1
x='coUnt the vowEl';
y_correct = 5;
assert(isequal(vowel_counter(x),y_correct))
%% Test 2
x='coUnt the vowEl counter';
y_correct = 8;
assert(isequal(vowel_counter(x),y_correct))
%% Test 3
x='The fox was the jackle';
y_correct = 6;
assert(isequal(vowel_counter(x),y_correct))
%% Test 4
x='Education';
y_correct = 5;
assert(isequal(vowel_counter(x),y_correct))
%% Test 5
x='We are the MaTLaBiAns';
y_correct = 8;
assert(isequal(vowel_counter(x),y_correct))

Source: http://www.mathworks.cn/matlabcentral/cody/problems/1559-count-vowel

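bainhome · posted 2013-6-15 23:30:24

Last edited by bainhome on 2013-6-15 23:46

Let me start with a non-regex solution and knock the thread off course; others can then use regex to set it "straight" again: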

function ans = vowel_counter(x)
lower(x);
sum(ans==97|ans==101|ans==105|ans==111|ans==117);
end

Engineer Qi offered code based on the same idea but more efficient:

nnz(bsxfun(@eq,lower(x),['aeiou']'))
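
As a rough illustration of why this one-liner works (a sketch added for clarity, not part of the original post): comparing the 1-by-N lowercase string against a 5-by-1 column of vowels expands into a 5-by-N logical matrix, and nnz counts its true entries. The variable names below are made up for the example.

```matlab
x = 'Education';
vowels = 'aeiou'.';                     % 5-by-1 column of vowel characters
hits = bsxfun(@eq, lower(x), vowels);   % 5-by-N logical: hits(i,j) is true when x(j) is vowel i
n = nnz(hits);                          % number of true entries, i.e. vowels in x
```

Each column of hits holds at most one true value, so the count of nonzeros equals the count of vowels. In releases with implicit expansion (R2016b and later), lower(x) == vowels should give the same matrix without bsxfun.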

lin2009 · posted 2013-6-16 00:19:30

sum(arrayfun(@(x0) numel(x0),regexp(x,'[aeoiu]')))

nwcwww · posted 2013-6-16 04:56:09

Last edited by nwcwww on 2013-6-16 05:37

function ans = vowel_counter(x)
numel(regexpi(x, '[aeiou]'));
end

To get any smaller you would probably need dynamic expressions.
Also: lin's answer does not seem to account for letter case?
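
The case-sensitivity question is easy to check directly. Here is a small sketch on one of the test strings; the variable names are illustrative and not from the thread.

```matlab
x = 'coUnt the vowEl';
n_sensitive   = numel(regexp(x,  '[aeiou]'));                % lowercase vowels only
n_insensitive = numel(regexpi(x, '[aeiou]'));                % upper and lower case
n_option      = numel(regexp(x,  '[aeiou]', 'ignorecase'));  % same result via the option
```

Here the case-sensitive call misses U and E, so it finds 3 matches where the tests expect 5.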

nwcwww · posted 2013-6-16 05:52:49

Adding a solution that uses a dynamic regular expression:

function y = vowel_counter(x)
regexp('','(?@ y=numel(regexpi(x, ''[aeiou]''));)');
end

qibbxxt · posted 2013-6-16 08:26:55

Here is another non-regex one:

nnz(ismember(lower(x),'aeiou'))

liuyalong008 · posted 2013-6-16 13:52:30

A roundabout approach:

numel(x)-numel(regexprep(lower(x),'[aeiou]',''))
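
Spelled out step by step, the roundabout idea is to delete every vowel and measure how much shorter the string becomes. A minimal sketch with illustrative variable names:

```matlab
x = 'We are the MaTLaBiAns';
stripped = regexprep(lower(x), '[aeiou]', '');   % the string with all vowels removed
n = numel(x) - numel(stripped);                  % characters removed = number of vowels
```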

bainhome · posted 2013-6-16 20:39:47

Last edited by bainhome on 2013-6-17 02:01

Let regex fly a little longer:
Problem 2: All Capitals? For the original Cody exercise see Pro.129: All Capitals?

Are all the letters in the input string capital letters?
Examples:
'MNOP' -> 1
'MN0P' -> 0

Test cases:

%% Test 1
x = 'MNOP';
y_correct = 1;
assert(isequal(your_fcn_name(x),y_correct))
%% Test 2
x = 'MN0P';
y_correct = 0;
assert(isequal(your_fcn_name(x),y_correct))
%% Test 3
x = 'INOUT1NOUT';
y_correct = 0;
assert(isequal(your_fcn_name(x),y_correct))
%% Test 4
x = 'UPANDDOWN';
y_correct = 1;
assert(isequal(your_fcn_name(x),y_correct))
%% Test 5
x = 'RUaMATLABPRO';
y_correct = 0;
assert(isequal(your_fcn_name(x),y_correct))

There is a good suggestion in the comments, though:

Des Mc Manuson 6 Dec 2012:
the description of the problem is misspecified. "Are all the letters in the input string capital letters?" Numbers are not letters. The tests treat numbers as lower case.

Worth a try.
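
One possible reading of that suggestion, sketched under the assumption that only letters are judged and every other character is ignored. Note that this deliberately disagrees with the Cody tests above, which expect 0 for 'MN0P'. The name letters_all_caps is hypothetical.

```matlab
% True when the string contains no lowercase ASCII letter; digits and
% punctuation are ignored rather than treated as lowercase.
letters_all_caps = @(x) isempty(regexp(x, '[a-z]', 'once'));

r1 = letters_all_caps('MN0P');          % the digit is ignored under this reading
r2 = letters_all_caps('RUaMATLABPRO');  % contains a lowercase 'a'
```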

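The hallmark of regex problems, or at least of problems meant to be solved with regex, is flexibility and a variety of solutions, so this series should naturally collect several more. Once it winds down we can summarize and discuss.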

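qibbxxt · posted 2013-6-16 20:49:42

Last edited by qibbxxt on 2013-6-16 20:51

bainhome posted on 2013-6-16 20:39
Let regex fly a little longer:
For the original Cody exercise see Pro.129: All Capitals? Test cases: There is a good suggestion in the comments, though: ...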

function y = your_fcn_name(x)
y = length(regexp(x,'[A-Z]{1}')) == length(x);
end

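A rough explanation: match uppercase letters one at a time, so the number of matches is the number of uppercase letters. If that equals the length of the string, every character is uppercase; otherwise not.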
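
The counting argument can be traced on one of the failing test inputs. This is a small sketch with illustrative variable names; the {1} quantifier from the post is dropped here because a character class already matches exactly one character.

```matlab
x = 'MN0P';
idx = regexp(x, '[A-Z]');                 % start index of each uppercase letter: [1 2 4]
is_all_caps = length(idx) == length(x);   % 3 matches against 4 characters, so false
```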

nwcwww · posted 2013-6-16 20:56:08

function y = your_fcn_name(x)
regexp('','(?@ y = all(x-lower(x));)');
end

This one is not really regex anymore...

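liuyalong008 · posted 2013-6-16 21:12:47

Last edited by liuyalong008 on 2013-6-16 21:22

bainhome posted on 2013-6-16 20:39
Let regex fly a little longer:
For the original Cody exercise see Pro.129: All Capitals? Test cases: There is a good suggestion in the comments, though: ...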

numel(regexp(x,'[^A-Z]'))==0

isempty(regexprep(x,'[A-Z]',''))

liuyalong008 · posted 2013-6-17 11:54:07

Last edited by liuyalong008 on 2013-6-17 11:55

Problem 3: Find the phone area codes
Problem 91. Get the area codes from a list of phone numbers
Problem description:

Given a string of text with phone numbers in it, return a unique'd cell array of strings that are the area codes.
s = '508-647-7000, (508) 647-7001, 617-555-1212';
then
a = {'508','617'}

% Verification:
%%
s = '508-647-7000, (508) 647-7001, 617-555-1212, 1-800-323-1234, 704 555-1212';
a = {'508','617','704','800'};
assert(isequal(refcn(s),a))
%%
s = '212-657-0260; (888) 647-7001; 336 565-1212; +1-800-323-1234';
a = {'212','336','800','888'};
assert(isequal(refcn(s),a))

lin2009 · posted 2013-6-17 17:21:18

areacode = regexp(s,'\(?(\d{3})\)?(?=[ -]\d{3}\-\d{3,4})','tokens');
areacode_unique = unique([areacode{:}])
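
To see what these two lines produce, here is a sketch run on the example string from the problem statement. The intermediate variable names pattern and flat are added for illustration. The lookahead requires a separator followed by a 3-digit group, a hyphen, and more digits, so only the leading 3-digit group of each number is captured.

```matlab
s = '508-647-7000, (508) 647-7001, 617-555-1212';
pattern = '\(?(\d{3})\)?(?=[ -]\d{3}\-\d{3,4})';
areacode = regexp(s, pattern, 'tokens');   % cell array holding one 1-by-1 cell per match
flat = [areacode{:}];                      % flatten into a cell array of char vectors
areacode_unique = unique(flat);            % sorted, with duplicates removed
```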

nwcwww · posted 2013-6-17 18:19:16

function y = refcn(s)
'(\d{3})(?=\D+\d{3}-\d{4})';
regexp('','(?@ y=unique(regexp(s, ans, ''match''));)');
end

qibbxxt · posted 2013-6-17 20:45:07

unique(feval(@(x)x(1:3:end),regexp(s,'\d{3}','match')));

qibbxxt · posted 2013-6-19 18:38:54

Problem 4: Count letters. For the original problem see "Count letters occurence in text, specific to words with a given length."

Build a function with two input arguments: a string and a word length (number of letters), that outputs a vector of counts of the 26 letters of the alphabet, specific to words with a given length.
Case insensitive.
Words contain only letters a-zA-Z, but the string can contain punctuation.
Example
>> txt = 'Hello World, from MATLAB' ;
>> nl = 5 ; % Number of letters.
>> nlWords_getCounts(txt, nl)
ans =
0 0 0 1 1 0 0 1 0 0 0 3 0 0 2 0 0 1 0 0 0 0 1 0 0 0
here, two 5 letters words are found: 'Hello' and 'World'. The output vector is the count of letters (1 to 26) in these two words taken together. For example, letter 12 is 'l/L' and we see that it appears 3 times, hence the count of 3.

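The gist of the problem: find the words of the specified length and count how often each of the 26 letters occurs in them.

The test cases are as follows: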

%% Test 1
txt = 'Hello World, from MATLAB' ;
nl = 5 ;
counts_correct = [0 0 0 1 1 0 0 1 0 0 0 3 0 0 2 0 0 1 0 0 0 0 1 0 0 0];
assert(isequal(nlWords_getCounts(txt, nl),counts_correct))
%% Test 2
txt = 'UPPER converts any lowercase characters in the string str to the corresponding uppercase characters and leaves all other characters unchanged.'
nl = 9 ;
counts_correct = [3 0 3 1 5 0 1 1 0 0 0 1 0 2 1 2 0 2 2 0 2 0 1 0 0 0];
assert(isequal(nlWords_getCounts(txt, nl),counts_correct))
%% Test 3
txt = 'UPPER converts any lowercase characters in the string str to the corresponding uppercase characters and leaves all other characters unchanged.'
nl = 10 ;
counts_correct = [6 0 6 0 3 0 0 3 0 0 0 0 0 0 0 0 0 6 3 3 0 0 0 0 0 0];
assert(isequal(nlWords_getCounts(txt, nl),counts_correct))

nwcwww · posted 2013-6-19 23:03:05

Last edited by nwcwww on 2013-6-20 11:46

Dynamic expressions know no shame.

function y = nlWords_getCounts(txt, nl)
regexp(lower(txt), '\<\w+\>','match');
regexp('', '(?@ y=arrayfun(@(x) length(regexp(strcat(ans{cellfun(@numel, ans)==nl}), char(96+x))), 1:26);)')
end

--------------------

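Engineer Qi's comment in post #18 is quite right. I will roughly describe the approach in a post below.
Also: I now remember why I did not use \w{5} here to do it in one step: nl is variable. I suggest everyone borrow from Engineer Qi's sprintf-based approach.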
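
The sprintf-based code referred to here is not shown in this thread, so the following is only a guess at what such an approach might look like: build the pattern from nl with sprintf, match whole words of exactly that length, then tally the letters. All variable names are illustrative, and the word pattern assumes words consist only of the letters a-z after lowercasing.

```matlab
txt = 'Hello World, from MATLAB';
nl = 5;
pattern = sprintf('\\<[a-z]{%d}\\>', nl);               % for nl = 5 this is '\<[a-z]{5}\>'
words = regexp(lower(txt), pattern, 'match');           % whole words with exactly nl letters
letters = [words{:}];                                   % all matched words joined together
counts = arrayfun(@(c) sum(letters == c), 'a':'z');     % 1-by-26 tally, 'a' first
```

For this input the matched words are 'hello' and 'world', which agrees with the example in the problem statement.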

nwcwww · posted 2013-6-20 01:37:50

Last edited by nwcwww on 2013-6-20 01:39

With one more dynamic substitution, the size drops from 20 in the post above to 17:

function y = nlWords_getCounts(txt, nl)
regexp('','(?@ z=strsplit(lower(txt), {'' '', '','', ''.''}))');
regexp('', '(?@ y=arrayfun(@(x) length(regexp(strcat(z{cellfun(@numel, z)==nl}), char(96+x))), 1:26);)')
end

nwcwww · posted 2013-6-20 04:37:00

After dinner, here is a completely unreadable one. size = 12, but even I can no longer tell what I wrote.

function y = nlWords_getCounts(txt, nl)
regexp('', '(?@ y=arrayfun(@(x) length(subsref(regexp(cellstr(cell2mat(subsref(strsplit(lower(txt), {'' '', '','', ''.''}),struct(''type'',''()'',''subs'',{{cellfun(@length, strsplit(lower(txt), {'' '', '','', ''.''}))==nl}})))), char(96+x)),struct(''type'',''{}'',''subs'',{{1}}))), 1:26);)')
end

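bainhome · posted 2013-6-20 10:03:11

nwcwww's skills are seriously seasoned, kudos!
If you find it hard to read yourself, the rest of us are even more lost, haha. Could you give some hints, to leave more room for discussion?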
