20CN网络安全小组论坛 - 编程破解 - 在一个很长的字符串里对一个字串进行搜索并进行替换/删除

论坛: 编程破解标题: 在一个很长的字符串里对一个字串进行搜索并进行替换/删除

作者: kert_t8 [kert_t8]

论坛用户

CSDN看到的，本来不是个大事情，可悲的是我连他们所说的
Boyer-Moore，KMP Skip Search，Backward Oracle Matching
这些算法，我以前连名字都没听到过。
据说这儿有相关资料 http://www-igm.univ-mlv.fr/~lecroq/string/
找个时间去看一下。先发一贴备忘

欢迎明白人解释给我们听

地主发表时间: 05-01-14 14:15

回复: kert_t8 [kert_t8]

论坛用户

阿弥陀佛，看了这么久，也没看出个所以然。所以决定从头看，看一点写一点，管他妈的懂没懂，先翻译出来先

首先是字符串匹配应用的领域，就我可以理解和想到的有生物工程，比如说DNA匹配之类的，还有数据库查询，搜索引擎之类的。这些应用所要处理的都是海量数据，因此即使在计算机运算速度高度发展，存储容量也比以前有极大的增加，字符串匹配算法的时空复杂度也仍然是非常重要。同时，我个人认为凡是跟算法有关的研究，了解得越多，对设计别的程序（字符串相关或无关）都有好处。

问题描述：
字符串匹配问题就是要在一段文本（text）当中找出一个特定字符串/模板(String / Pattern)的每一次出现。在以后的跟帖当中，将/请采用以下的数据结构。输入文本:X及其长度m+1（最后一位是特殊结束符，下同），输入字符串/模板:Y及其长度n+1，文本与字符串都由来自有限字符集Sig的字符组成，字符集大小为t。

问题分类：（分为两类，为啥？不晓得。看下面）

引用:

Applications require two kinds of solution depending on which string, the pattern or the text, is given first. Algorithms based on the use of automata or combinatorial properties of strings are commonly implemented to preprocess the pattern and solve the first kind of problem. The notion of indexes realized by trees or automata is used in the second kind of solutions. This book will only investigate algorithms of the first kind.

以上一段话没有读懂，其实懂是可以懂，只是无法理解。

基本思路：
最基本的方法是，通过一个长度为m（字符串长度）的窗口进行检索。可以从左往右，也可以从有往左或是别的顺序（还没看出区别，不过书上说有区别）。一开始的时候对齐，然后将透过窗口的字符串与模板做比较。然后移动一位，接着比较。go on and on~~~ :)每次比较我们成为一次尝试（attemp）

好了，序言的东西先说这么多，好多东西我也没看明白，也无从讲起
下面先介绍暴力破解法(Brute Force)

主要特征：
无前续处理
所需额外空间为常量
窗口每次右移一位
比较顺序随意（左到右，右到左什么的都可以）
时间复杂度为 O(mn)
2n expected text characters comparisons. (不理解)

算法描述：
暴力破解法使用的是最基本的方法，依次比较。每一次尝试，从左到右依次比较每一个字符。

源代码（C）：

代码:


void BF(char *x, int m, char *y, int n) {
	int i, j;

	/* Searching */
	for (j = 0; j <= n - m; ++j) {
		for (i = 0; i < m && x[i] == y[i + j]; ++i);
			if (i >= m)
			OUTPUT(j);
	/* 
	 * output为自定义函数，j所保存的是一个索引，但是输出的可能是由j计算所得到的某个值
	 * 或者是由j和原来的文本得到的某一个值. -- 我是这样认为的
	 *
	 */
	}
}

改进以后的源代码（C）:

代码:


#define EOS '\0'
   
void BF(char *x, int m, char *y, int n) { 
	char *yb; 

	/* Searching */ 
	for (yb = y; *y != EOS; ++y) 
		if (memcmp(x, y, m) == 0) 
	OUTPUT(y - yb);
}

PS:这两段代码是把我看服了的

实例（过程分析）：
--------------------------------------------------------------------------------
First attempt
G C A T C G C A G A G A G T A T A C A G T A C G
1 2 3 4
G C A G A G A G
Second attempt
G C A T C G C A G A G A G T A T A C A G T A C G
1
G C A G A G A G
Third attempt
G C A T C G C A G A G A G T A T A C A G T A C G
1
G C A G A G A G
Fourth attempt
G C A T C G C A G A G A G T A T A C A G T A C G
1
G C A G A G A G
Fifth attempt
G C A T C G C A G A G A G T A T A C A G T A C G
1
G C A G A G A G
Sixth attempt
G C A T C G C A G A G A G T A T A C A G T A C G
1 2 3 4 5 6 7 8
G C A G A G A G
Seventh attempt
G C A T C G C A G A G A G T A T A C A G T A C G
1
G C A G A G A G
Eighth attempt
G C A T C G C A G A G A G T A T A C A G T A C G
1
G C A G A G A G
Ninth attempt
G C A T C G C A G A G A G T A T A C A G T A C G
1 2
G C A G A G A G
Tenth attempt
G C A T C G C A G A G A G T A T A C A G T A C G
1
G C A G A G A G
Eleventh attempt
G C A T C G C A G A G A G T A T A C A G T A C G
1 2
G C A G A G A G
Twelfth attempt
G C A T C G C A G A G A G T A T A C A G T A C G
1
G C A G A G A G
Thirteenth attempt
G C A T C G C A G A G A G T A T A C A G T A C G
1 2
G C A G A G A G
Fourteenth attempt
G C A T C G C A G A G A G T A T A C A G T A C G
1
G C A G A G A G
Fifteenth attempt
G C A T C G C A G A G A G T A T A C A G T A C G
1
G C A G A G A G
Sixteenth attempt
G C A T C G C A G A G A G T A T A C A G T A C G
1
G C A G A G A G
Seventeenth attempt
G C A T C G C A G A G A G T A T A C A G T A C G
1
G C A G A G A G

在本例中，一共执行了30次字符比较

不行了不行了，容我喘口气，嗨~嗨~嗨~嗨。
这次介绍的少，而且是基本的，后面的一个有很长，而且好多东西我都还是不很明白，所以等一下罗哦

----------------------------------------------------------------------------------------
To Be Continued... 争取多学几个算法，虽然学不完

[此贴被月之御者(kert_t8) 在 01月18日10时09分编辑过]

B1层发表时间: 05-01-18 10:05

回复: ljsh012 [ljsh012]

论坛用户

看过一遍算法，感觉还是挺不一般的，不过还没有完全理解，呆会儿再细细来看。（后面输出y-yb那里不成相距几个字符了，但是跟目的我还没联系起来）
看来我得加快英语的学习了，不然老看翻译出来的东西永远达不到看原版的效果。
加油，兄弟！

B2层发表时间: 05-01-18 13:15

回复: SysHu0teR [syshunter]

版主

UP

我也要看，下学期我考试可能就要考这个。

B3层发表时间: 05-01-21 22:52

回复: SysHu0teR [syshunter]

版主

能不能再说的详细点。把你下面实例部分的完整代码贴上来给我看看

B4层发表时间: 05-01-22 11:21

回复: SysHu0teR [syshunter]

版主

#include <stdio.h>
#include <string.h>
//在长度为n的x字串中查找长度为m的y串出现的次数
int search(char *x,int n,char *y,int m) {
int k=0;
for(int i=0;i<n;i++) {
for(int j=0;j<m&&x[i+j]==y[j];j++);
if(j==m) k++;
}
return k;
}

int main(void) {
char str1[]="aasdssfaaskdfaa";
char str2[]="as";
printf("%d\n",search(str1,strlen(str1),str2,2));
return 1;
}
是不是这个意思？你那代码我看不大明白。

B5层发表时间: 05-01-22 11:26

回复: SysHu0teR [syshunter]

版主

晕，明白了。不是一回事

B6层发表时间: 05-01-22 11:38

回复: taojuntjpp [taojuntjpp]

论坛用户

我的数据结构书上有这个算法
我回去翻翻看啊
然后回来跟你们说哈
呵呵~~~
有个KMP算法
好象一个是无回溯还有一个是有回溯的
先研究研究哦~~~

B7层发表时间: 05-01-22 12:35

回复: kert_t8 [kert_t8]

论坛用户

SysHu0teR不好意思，这两天忙作业，没有时间研究这个，先忍一忍吧，我一有时间马上弄

B8层发表时间: 05-01-22 15:31

论坛: 编程破解