无营养两则

1. 最近在补 3D 知识,上周花很大的精力试着手动逐像素渲染 3D 图形,挺好玩的:

老大的要求是高光贴图和法线贴图也是要的,我也做出来了,只不过几个小时之前突然意识到一个非常重要的坐标转换没有做,所以那个法线贴图的结果其实很不靠谱,代码先不放了,省得丢人,等把法线贴图做对了再说吧。

这是阉割版的程序 —— 如果你有兴趣围观的话:rending.7z

哦对了,这里用到的数学运算俺也都自己实现了一遍,等有时间了会整理一下的,毕竟这个其实挺好玩。

 

 

2. 觉得 mataball 这东西挺有意思,结果就整出了个奇丑的:

任何试图画出有两个眼睛一个嘴巴的面部的尝试都会得到一个青蛙

程序和代码在这儿:metaball.7z

圈圈们默认是会动的,如果你只是想画青蛙,把 render 函数的第二个参数设为 0 就好了。

BTW Fast inverse square root 真是威武,有多威武你自己试试就知道了。

 

用 hough 变换检测抛物线

告诉你某张图中很可能有某个东西长得很像开口向上的抛物线,想把它检测出来,嘿嘿,看上去很难吧?

其实呢,这种事情很适合用 hough 变换 来做:

假设抛物线方程为 y = a ( x - b )2 + c

先把图像边缘检测出来,对于图像边缘上的每一个点 (x,y), 可以有很多条抛物线穿过它,即有很多个 a, b, c 可以使上面的方程成立,于是, 你很暴力的把这些 a, b, c 都记下来! ,对于每个 x,y 点都记下一次,然后看看哪些 a, b, c 的组合被命中的次数比较多就好了。

当然,上面的那个过程有不少地方是可以优化的,比如说对于图片,b 和 c 肯定是整数值,而且 b 很可能在比较偏中心的位置,而 a 的取值范围可以预先设定,比如说如果你想检测下巴,保守的说 a 肯定就是小于 1 的(不然那下巴也太尖了吧!),如此这般等等——而 OpenCV 中用 hough 变换检查圆的代码中显然还有些其他的不那么容易理解的优化,反正我也用不着把它做得很高效,就没有研究了。

下面上图:

(图片均来源于 pixdaus.com ,如果碰巧它侵犯了您的版权,请与我联系)

 

再上代码——

 

(你确定要看吗??!!)

 

 

(警告!魔数出没!)

 

 

 

(警告!没有注释!)

 

 

 

 

(警告!诡异的变量名出没!)

 

 

 

 

 

(警告!你的大脑可能因为看到这种乱七八糟的代码受到永久性的损伤!!)

 

 

 

 

 

 

(你确定依然要看吗??!!!!)

 

 

 

 

 

 

 

// y = a/150*(x-b)^2+c;
struct Parabola {
    int a;
    int b;
    int c;
    Parabola():a(0),b(0),c(0){}
    Parabola(int a, int b, int c):a(a),b(b),c(c){}
    Parabola(Parabola const& p):a(p.a),b(p.b),c(p.c){}
};
struct Record {
    Parabola para;
    int      count;
    Record():para(),count(0){}
    Record(int a, int b, int c, int cnt):para(a,b,c),count(cnt){}
};
struct RecordCmp {
    inline bool operator()(Record const& a, Record const& b) {
        return a.count>b.count;
    }
};
std::vector<Parabola> detectParabolas( cv::Mat const& img,
                                       int minA=-1,
                                       int maxA=-1,
                                       int minB=-1,
                                       int maxB=-1,
                                       int minC=-1,
                                       int maxC=-1 )
{
    cv::Mat edges;
    // cv::imshow("debug", img);
    // cv::waitKey(3000);
    cv::Canny( img, edges, 50, 150 );
    cv::Mat_<uint8_t> img_(edges);
    // cv::imshow("debug", edges);
    // cv::waitKey(3000);
    int ***acc = 0;

    if(minA < 0)
        minA = 1;
    if(minB < 0)
        minB = img.cols/4;
    if(minC < 0)
        minC = img.rows/4;
    if(maxA < 0)
        maxA = 60;
    if(maxB < 0)
        maxB = img.cols/2+img.cols/4;
    if(maxC < 0)
        maxC = img.rows;

    acc = new int**[maxA+1];
    for(int i=0; i<maxA+1; ++i) {
        acc[i] = new int*[maxB+1];
        for(int j=0; j<maxB+1; ++j) {
            acc[i][j] = new int[maxC+1];
            for(int k=minC; k<maxC+1; ++k) {
                acc[i][j][k]=0;
            }
        }
    }

    printf("calculating ...\n");
    int percent=0;
    for(int y=0; y<img.rows; ++y) {
        for(int x=0; x<img.cols; ++x) {
            if(img_(y,x)) {
                for(int a=minA; a<=maxA; ++a){
                    for(int b=minB; b<maxB; ++b) {
                        int c=y+a*(x-b)*(x-b)/150;
                        if(c>=minC && c<=maxC) {
                            ++acc[a][b][c];
                        }
                    }
                }
            }
        }
        int p=int(round(double(y)/img.rows*100.));
        if(p!=percent) {
            percent = p;
            printf("> %02d%%\n", percent);
        }
    }
    printf("done!\n");

    sizedpq<Record, RecordCmp> pq(3);
    for(int i=minA; i<=maxA; ++i) {
        for(int j=minB; j<=maxB; ++j) {
            for(int k=minC; k<=maxC; ++k) {
                pq.insert(Record(-i,j,k, acc[i][j][k]));
            }
        }
    }

    for(int i=0; i<=maxA; ++i) {
        for(int j=0; j<=maxB; ++j) {
            delete[] acc[i][j];
        }
        delete[] acc[i];
    }
    delete[] acc;

    std::vector<Parabola> res;
    for(int i=0; i<3; ++i) {
        res.push_back(pq[i].para);
    }
    return res;
}

上面那段代码最诡异而需要说明的地方是 a ,我希望检测的是开口向上的抛物线,于是在计算机图像的坐标系统下 a 的取值应该是负数,但作为数组下标时它当然得要是正数,所以这里多绕了几个弯,下次我自己看到肯定也会忘记的,不过还好这算法相当简单,下次需要用到我肯定更愿意重写。

sizedpq 类是俺以前做马赛克拼接图像时折腾出来的一个固定大小的优先队列类,其实设计得并不好,不过接口挺好用,所以.. 就拿来用了。

 

PS. 速度与图像边缘数量关系很密切,不过通常情况下处理一张上面展示的那种大小的图像也就需要 3 秒钟左右吧,其实还可以接受。

PPS. 人果然还是逼出来的啊,今早老宋很客气的问我节日快乐,弄得我很不好意思的在两个小时之内把这个做好了……

PPPS. 上周末和室友去西湖边逛了一圈,在柳浪闻莺景区里,我刚拍完一张照,发现旁边飘过了一个长发白衣飘逸出尘的美女背影,我说哇美女,他说其实不是美女我已经观察很久了...... 唉,男人。

PPPPS. 谁能教我玩 DotA 啊!!!

流水帐+意识流 [2]

1. C++ 很变态

去网易面试了,一个 C++ 上的问题把我难到了:“多重继承的时候子类是不是要保存每个父类的虚函数表?”

我回答 “通常是的,但虚继承的时候好像 %#@$%#^&*” —— 其实我就记得虚继承的时候情况很有点乱,忘了究竟有多乱。

回来之后做了个实验:

#include <iostream>
#include <iomanip>
#ifdef _MSC_VER // 老兄,特立独行的永远是你!!
typedef unsigned int uint32_t;
#else
#include <stdint.h>
#endif

using namespace std;

class A
{
public:
    uint32_t x;
    uint32_t y;
    A():x(42),y(24){}
    virtual void f(){
        cout<<"A::f()    [this="<<this
            <<"]; this->x="<<x<<"; this->y="<<y<<";\n";
    }
};

class B:virtual public A
{
public:
    B():A(){}
};

class C:virtual public A
{
public:
    C():A(){x=21,y=12;}
    virtual void f(){
        cout<<"C::f()    [this="<<this
            <<"]; this->x="<<x<<"; this->y="<<y<<";\n";
    }
};

class D:public B, public C
{ 
public:
    D():A(),B(),C(){}
};

template<class T>
void showDetail(T const& v)
{
    int s=sizeof(T)/4;
    uint32_t const* p=reinterpret_cast<uint32_t const*>(&v);
    cout<<"  +-------------+\n";
    for(int i=0;i<s;++i) {
        cout<<"  | "<<setw(11)<<p[i]<<" | ";
        if(p[i]>10000) { // suppose it's a pointer
            cout<<" -> "<<*reinterpret_cast<uint32_t const*>(p[i]);
        }
        cout<<endl;
    }
    cout<<"  +-------------+\n";
}

int main(){
    A a;
    B b;
    C c;
    D d;
    cout<<"a:\n";
    showDetail(a);
    cout<<"b:\n";
    showDetail(b);
    cout<<"c:\n";
    showDetail(c);
    cout<<"d:\n";
    showDetail(d);
    return 0;
}

为的是看看虚继承的时候到底发生了什么,结果又是大出意外,GCC, M$VC, DigitalMars 编译出来的竟各不相同:

GCC

a:  +-------------+
    |     4666904 | -> 4324460
    |          42 |
    |          24 |
    +-------------+

b:  +-------------+
    |     4666924 | -> 0
    |     4666936 | -> 4324460
    |          42 |
    |          24 |
    +-------------+

c:  +-------------+
    |     4666956 | -> 4324764
    |     4666972 | -> 4631892
    |          21 |
    |          12 |
    +-------------+

d:  +-------------+
    |     4666988 | -> 4
    |     4667000 | -> 4324764
    |     4667016 | -> 4631892
    |          21 |
    |          12 |
    +-------------+

M$VC

a:  +-------------+
    |     4305572 | -> 4198672
    |          42 |
    |          24 |
    +-------------+

b:  +-------------+
    |     4305668 | -> 0
    |     4305664 | -> 4198672
    |          42 |
    |          24 |
    +-------------+

c:  +-------------+
    |     4305632 | -> 0
    |           0 |
    |     4305628 | -> 4217088
    |          21 |
    |          12 |
    +-------------+

d:  +-------------+
    |     4305684 | -> 0
    |     4305632 | -> 0
    |           0 |
    |     4305680 | -> 4217088
    |          21 |
    |          12 |
    +-------------+

DigitalMars

a:  +-------------+
    |     4448636 | -> 4205756
    |          42 |
    |          24 |
    +-------------+

b:  +-------------+
    |     4448640 | -> 0
    |     4581960 | -> 5856153
    |     4448636 | -> 4205756
    |          42 |
    |          24 |
    +-------------+

c:  +-------------+
    |     4448660 | -> 4205896
    |     4448648 | -> 0
    |     4448656 | -> 4203244
    |          21 |
    |          12 |
    +-------------+

d:  +-------------+
    |     4448660 | -> 4205896
    |     4448672 | -> 0
    |     4448664 | -> 0
    |     5838868 | -> 4500440
    |     4448680 | -> 4203252
    |          21 |
    |          12 |
    +-------------+

好吧,看这情景,我怎么回答都算不上错,因为哪有正确答案啊。

可惜把 Borland C++ Builder 和 Open Watcom 编译器都删了,不然如果看到 5 种不同的结果岂不是更欢乐~ 唉,难怪 C++ 木有 ABI。

另外,被问到虚函数表在对象的什么部位时我发烧似的答道“在尾部”,囧啊,谁教我的。(不过另一方面,可以诡辩一下的是, C++ 2003 标准甚至没有规定运行期多态一定要以虚函数表的形式实现,只是目前的编译器不约而同的采用了虚函数表而已,所以,嗯...)

Update:

Open Watcom 编译出来的结果是这样的:

a:
  +-------------+
  |          42 |
  |          24 |
  |     4240484 | -> 4198782
  +-------------+
b:
  +-------------+
  |     4240544 | -> 0
  |           8 |
  |          42 |
  |          24 |
  |     4240556 | -> 4198782
  +-------------+
c:
  +-------------+
  |     4240488 | -> 0
  |     4240500 | -> 4198952
  |          12 |
  |          21 |
  |          12 |
  |     4240508 | -> 4198866
  +-------------+
d:
  +-------------+
  |     4240520 | -> 0
  |     4240532 | -> 4198952
  |     4240512 | -> 8
  |          16 |
  |          21 |
  |          12 |
  |     4240540 | -> 4199045
  +-------------+

果然又不同——不过,惊喜啊,它还真把虚函数表放在了尾部。

2. 脑袋短路

还是网易面试,几个算法题都发挥得不够满意,甚至包括生成随机数的题,回头想想也有改进的空间。

嗯...不知道试题有没有涉及到某种需要保密的东西,所以暂时先不谈好了……

3. 春天来了

img_0059.jpg
ps. 实践表明,GCC 编译器会给类的成员函数添一个隐含的 this 参数,通常会自动赋值,然而用奇淫手段弄到了函数地址时也可以手动传对象地址去,就像在 python 中对待 self 那样;然而其他的 C++ 编译器似乎都没这么干,至少你把 void A::f() 当作 void f(A*) 来玩是不行的,可惜了,不然会挺好玩的。

最长公共子串 [后缀树应用之二]

1. 我干了什么

我写了个小程序,用 1.276 秒找出了《银河系漫游指南》和《宇宙尽头的餐馆》的最长公共子串:

the history of every major galactic civilization tends to pass through three distinct and recognizable phases, those of survival, inquiry and sophistication, otherwise known as the how, why and where phases.

用 1.266 秒找出了《宇宙尽头的餐馆》和《再见,谢谢所有的鱼》的最长公共子串:

in the uncharted backwaters of the unfashionable end of the western spiral arm of the galaxy

用 8.671 秒找出了《魔戒》和《1984》的最长公共子串:

. it was inevitable that they should make

用 6.728 秒找出了《飘》和《基本无害》的最长公共子串:

more than anything else in the world.

...

我发现这是件很好玩的事情~

2. 怎么做的

Again,有了 后缀树 ,这样的工作太简单了:

/*
 * Author  : If
 * Date    : Oct/16th/2010
 * License : Boost License v1

Permission is hereby granted, free of charge, to any person or organization
obtaining a copy of the software and accompanying documentation covered by
this license  (the "Software")  to use,  reproduce,  display,  distribute,
execute, and transmit the Software, and to prepare derivative works of the
Software, and to permit third-parties to whom the Software is furnished to
do so, all subject to the following:

The copyright notices in the Software and this entire statement, including
the above license grant, this restriction and the following disclaimer,
must be included in all copies of the Software, in whole or in part, and
all derivative works of the Software, unless such copies or derivative
works are solely in the form of machine-executable object code generated by
a source language processor.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED,  INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE, TITLE AND NON-INFRINGEMENT. IN NO EVENT
SHALL THE COPYRIGHT HOLDERS OR ANYONE DISTRIBUTING THE SOFTWARE BE LIABLE
FOR ANY DAMAGES OR OTHER LIABILITY, WHETHER IN CONTRACT, TORT OR OTHERWISE,
ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
DEALINGS IN THE SOFTWARE.
*/

#include "suffixtree.h"
#include <algorithm>
#include <iterator>
#include <iostream>
#include <fstream>
#include <cctype>
#include <cstdio>
#include <string>
#include <vector>
#include <ctime>
using namespace std;

static void ticToc_(bool s)
{
    static clock_t start_ = clock();
    if(s) {
        start_ = clock();
    } else {
        clock_t end_=clock();
        std::cerr
            << (end_ - start_) << " ticks"
            << "    (" << (end_ - start_) * (1000.)/CLOCKS_PER_SEC <<"ms)"
            << std::endl;
    }
}

inline void tic()
{
    ticToc_(true);
}

inline void toc()
{
    ticToc_(false);
}

struct Info
{
    vector<int> belongTo;
};

typedef Node<char, Info>       NodeT;
typedef SuffixTree<char, Info> STree;

// 初始化节点的从属属性。
void initInfo(NodeT& leaf)
{
    NodeT* node = leaf.parent;
    int    nth  = leaf.belongTo;
    while ( node->parent!=node && (
            node->info.belongTo.empty() ||
            node->info.belongTo.back()!=nth )) {
        node->info.belongTo.push_back(nth);
        node=node->parent;
    }
}

void printPath(NodeT const* node)
{
    if(node->parent!=node)
        printPath(node->parent);
    copy(node->str, node->str+node->len,
         ostream_iterator<char>(cout));
}

// 将连续的空白字符转换成单个空格;将大写全转换为小写。
void filter(char * text, int & len)
{
    char*p=text,*q=text;
    for(;q<text+len;++p,++q){
        if(*q>='A'&&*q<='Z') {
            *p=*q-'A'+'a';
        } else if(isspace(*q)) {
            *p=' ';
            while(isspace(*q))
                ++q;
            --q;
        } else {
            *p=*q;
        }
    }
    len = p-text;
}

struct LetsRock
{
    NodeT const*& result;
    int   longestNow;
    int   nStrings;
    LetsRock(int n, NodeT const*& root):
        result(root),
        longestNow(0),
        nStrings(n)
    { }
    void operator()(NodeT const& node) {
        if (node.depth + node.len > longestNow &&
            node.info.belongTo.size()==nStrings) {
            longestNow = node.depth + node.len;
            result     = &node;
        }
    }
};

int main(int c,char**v)
{
    STree tree;
    tic();
    for(int i=1;i<c;++i) {
        cerr<<"reading "<<v[i]<<" ...\n";
        FILE*  file = fopen(v[i],"r");
        string content;
        char   buf[8192];
        int    bytesRead=0;
        int    bytesTotal=0;
        while((bytesRead=fread(buf,1,8192,file))==8192) {
            filter(buf,bytesRead);
            content.append(buf,bytesRead);
            bytesTotal += bytesRead;
        }
        if(bytesRead) {
            filter(buf,bytesRead);
            content.append(buf,bytesRead);
            bytesTotal += bytesRead;
        }
        fclose(file);
        cerr<<"done. ( " << bytesTotal <<" bytes )\n";
        cerr<<"adding to suffix tree ...\n";
        tree.addString(content);
        cerr<<"done.\n";
    }
    NodeT const* ans = tree.root();
    tree.eachLeaf(initInfo);
    LetsRock fun(c-1,ans);
    tree.dfs(fun);
    printPath(ans);
    cout<<endl;
    toc();
    return 0;
}

代码压缩工具原型 [后缀树应用之一]

嗯,对我来说是代码压缩工具原型;对大多数地球生物来说这其实是最长不重叠重复子串问题

1. 我干了什么

我写了个程序,运行时内存占用峰值为 323 MB,耗时 6.582s,从 2,827,372 字节的《魔戒》全本中找出了不算空格、大小写敏感的情况下出现两次或更多的最长子串:

All that is gold does not glitter,
Not all those who wander are lost;
The old that is strong does not wither,
Deep roots are not reached by the frost.
 
From the ashes a fire shall be woken,
A light from the shadows shall spring;
Renewed shall be blade that was broken,
The crownless again shall be king.

耗时 7.022 秒,内存占用峰值 351 MB,从 3,872,493 字节的 sqlite 3.6.22 的合并 c 文件中找到了最长重复子串:(忽略空格)

.c****************//**************Beginfileos_common.h********************
****** *************//***2004May22****Theauthordisclaimscopyrighttothissou
rcecode.Inpla ceof**alegalnotice,hereisablessing:****Mayyoudogoodandnotevi
l.**Mayyoufindforgiv enessforyourselfandforgiveothers.**Mayyousharefreely,
nevertakingmorethanyougive.  *********************************************
*********************************** ****Thisfilecontainsmacrosandalittlebi
tofcodethatiscommonto**alloftheplatform-sp ecificfiles(os_*.c)andis#includ
edintothose**files.****Thisfileshouldbe#includedb ytheos_*.cfilesonly.Itis
nota**generalpurposeheaderfile.*/#ifndef_OS_COMMON_H_#de fine_OS_COMMON_H_
/***AtleasttwobugshaveslippedinbecausewechangedtheMEMORY_DEBUG* *macrotoSQ
LITE_DEBUGandsomeoldermakefileshavenotyetmadethe**switch.Thefollowingc ode
shouldcatchthisproblematcompile-time.*/#ifdefMEMORY_DEBUG#error&quot;TheME
MORY_DEB UGmacroisobsolete.UseSQLITE_DEBUGinstead.&quot;#endif#ifdefSQLITE
_DEBUGSQLITE_PRIVATE intsqlite3OSTrace=0;#defineOSTRACE1(X)if(sqlite3OSTra
ce)sqlite3DebugPrintf(X)#de fineOSTRACE2(X,Y)if(sqlite3OSTrace)sqlite3Debu
gPrintf(X,Y)#defineOSTRACE3(X,Y,Z) if(sqlite3OSTrace)sqlite3DebugPrintf(X,
Y,Z)#defineOSTRACE4(X,Y,Z,A)if(sqlite3OST race)sqlite3DebugPrintf(X,Y,Z,A)
#defineOSTRACE5(X,Y,Z,A,B)if(sqlite3OSTrace)sqli te3DebugPrintf(X,Y,Z,A,B)
#defineOSTRACE6(X,Y,Z,A,B,C)\if(sqlite3OSTrace)sqlite3D ebugPrintf(X,Y,Z,A
,B,C)#defineOSTRACE7(X,Y,Z,A,B,C,D)\if(sqlite3OSTrace)sqlite3D ebugPrintf(
X,Y,Z,A,B,C,D)#else#defineOSTRACE1(X)#defineOSTRACE2(X,Y)#defineOSTRA CE3(
X,Y,Z)#defineOSTRACE4(X,Y,Z,A)#defineOSTRACE5(X,Y,Z,A,B)#defineOSTRACE6(X,
Y, Z,A,B,C)#defineOSTRACE7(X,Y,Z,A,B,C,D)#endif/***Macrosforperformancetra
cing.Norm allyturnedoff.Onlyworks**oni486hardware.*/#ifdefSQLITE_PERFORMAN
CE_TRACE/***hwti me.hcontainsinlineassemblercodeforimplementing**high-perf
ormancetimingroutines.* //**************Includehwtime.hinthemiddleofos_com
mon.h****************//******* *******Beginfilehwtime.h*******************
***********************//***2008May27 ****Theauthordisclaimscopyrighttothi
ssourcecode.Inplaceof**alegalnotice,hereisab lessing:****Mayyoudogoodandno
tevil.**Mayyoufindforgivenessforyourselfandforgiveo thers.**Mayyousharefre
ely,nevertakingmorethanyougive.*************************** ***************
******************************************Thisfilecontainsinlinea smcodefo
rretrieving&quot;high-performance&quot;**countersforx86classCPUs.*/#ifndef
_HWTIME_ H_#define_HWTIME_H_/***Thefollowingroutineonlyworksonpentium-clas
s(ornewer)proce ssors.**ItusestheRDTSCopcodetoreadthecyclecountvalueoutoft
he**processorandreturn sthatvalue.Thiscanbeusedforhigh-res**profiling.*/#i
f(defined(__GNUC__)||defined( _MSC_VER))&amp;&amp;\(defined(i386)||defined
(__i386__)||defined(_M_IX86))#ifdefined(__GN UC__)__inline__sqlite_uint64s
qlite3Hwtime(void){unsignedintlo,hi;__asm____volati le__(&quot;rdtsc&quot;
:&quot;=a&quot;(lo),&quot;=d&quot;(hi));return(sqlite_uint64)hi&lt;&lt;32|
lo;}#elifdefined(_MS C_VER)__declspec(naked)__inlinesqlite_uint64__cdeclsq
lite3Hwtime(void){__asm{rdt scret;returnvalueatEDX:EAX}}#endif#elif(define
d(__GNUC__)&amp;&amp;defined(__x86_64__))_ _inline__sqlite_uint64sqlite3Hw
time(void){unsignedlongval;__asm____volatile__(&quot;r dtsc&quot;:&quot;=A
&quot;(val));returnval;}#elif(defined(__GNUC__)&amp;&amp;defined(__ppc__))
__inline__ sqlite_uint64sqlite3Hwtime(void){unsignedlonglongretval;unsigne
dlongjunk;__asm__ __volatile__(&quot;\n\1:mftbu%1\n\mftb%L0\n\mftbu%0\n\cm
pw%0,%1\n\bne1b&quot;:&quot;=r&quot;(retval) ,&quot;=r&quot;(junk));return
retval;}#else#errorNeedimplementationofsqlite3Hwtime()foryour platform./**
*Tocompilewithoutimplementingsqlite3Hwtime()foryourplatform,**youcan remov
etheabove#errorandusethefollowing**stubfunction.Youwilllosetimingsupportfo
r many**ofthedebuggingandtestingutilities,butitshouldat**leastcompileandru
n.*/SQLI TE_PRIVATEsqlite_uint64sqlite3Hwtime(void){return((sqlite_uint64)
0);}#endif#endi f/*!defined(_HWTIME_H_)*//**************Endofhwtime.h*****
********************** *******************//**************Continuingwherew
eleftoffinos_common.h******** **********/staticsqlite_uint64g_start;static
sqlite_uint64g_elapsed;#defineTIMER_ STARTg_start=sqlite3Hwtime()#defineTI
MER_ENDg_elapsed=sqlite3Hwtime()-g_start#de fineTIMER_ELAPSEDg_elapsed#els
e#defineTIMER_START#defineTIMER_END#defineTIMER_EL APSED((sqlite_uint64)0)
#endif/***IfwecompilewiththeSQLITE_TESTmacroset,thenthefo llowingblock**of
codewillgiveustheabilitytosimulateadiskI/Oerror.This**isusedfort estingthe
I/Orecoverylogic.*/#ifdefSQLITE_TESTSQLITE_APIintsqlite3_io_error_hit=0 ;/
*TotalnumberofI/OErrors*/SQLITE_APIintsqlite3_io_error_hardhit=0;/*Numbero
fnon -benignerrors*/SQLITE_APIintsqlite3_io_error_pending=0;/*Countdowntof
irstI/Oerro r*/SQLITE_APIintsqlite3_io_error_persist=0;/*TrueifI/Oerrorspe
rsist*/SQLITE_APIi ntsqlite3_io_error_benign=0;/*Trueiferrorsarebenign*/SQ
LITE_APIintsqlite3_diskfu ll_pending=0;SQLITE_APIintsqlite3_diskfull=0;#de
fineSimulateIOErrorBenign(X)sqli te3_io_error_benign=(X)#defineSimulateIOE
rror(CODE)\if((sqlite3_io_error_persist &amp;&amp;sqlite3_io_error_hit)\||
sqlite3_io_error_pending--==1)\{local_ioerr();CODE;}st aticvoidlocal_ioerr
(){IOTRACE((&quot;IOERR\n&quot;));sqlite3_io_error_hit++;if(!sqlite3_io _e
rror_benign)sqlite3_io_error_hardhit++;}#defineSimulateDiskfullError(CODE)
\if( sqlite3_diskfull_pending){\if(sqlite3_diskfull_pending==1){\local_ioe
rr();\sqlit e3_diskfull=1;\sqlite3_io_error_hit=1;\CODE;\}else{\sqlite3_di
skfull_pending--;\ }\}#else#defineSimulateIOErrorBenign(X)#defineSimulateI
OError(A)#defineSimulateD iskfullError(A)#endif/***Whentesting,keepacounto
fthenumberofopenfiles.*/#ifdefSQ LITE_TESTSQLITE_APIintsqlite3_open_file_c
ount=0;#defineOpenCounter(X)sqlite3_ope n_file_count+=(X)#else#defineOpenC
ounter(X)#endif#endif/*!defined(_OS_COMMON_H_) *//**************Endofos_co
mmon.h*******************************************//** ************Continui
ngwhereweleftoffinos_ 

正如之前在 后缀树的介绍 中说过的那样,俺想要弄出个能压缩 C 程序代码的工具,而上面的就是原型了:找出出现多次且较长的子串, #define 一下,重复多次,差不多了。

2. 怎么做的

有了后缀树,这样的工作可真是简单啊:

/*
 * Author  : If
 * Date    : Oct/14th/2010
 * Version : 0.0.1
 */
/*
Permission is hereby granted, free of charge, to any person or organization
obtaining a copy of the software and accompanying documentation covered by
this license  (the "Software")  to use,  reproduce,  display,  distribute,
execute, and transmit the Software, and to prepare derivative works of the
Software, and to permit third-parties to whom the Software is furnished to
do so, all subject to the following:

The copyright notices in the Software and this entire statement, including
the above license grant, this restriction and the following disclaimer,
must be included in all copies of the Software, in whole or in part, and
all derivative works of the Software, unless such copies or derivative
works are solely in the form of machine-executable object code generated by
a source language processor.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED,  INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE, TITLE AND NON-INFRINGEMENT. IN NO EVENT
SHALL THE COPYRIGHT HOLDERS OR ANYONE DISTRIBUTING THE SOFTWARE BE LIABLE
FOR ANY DAMAGES OR OTHER LIABILITY, WHETHER IN CONTRACT, TORT OR OTHERWISE,
ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
DEALINGS IN THE SOFTWARE.
*/

#include "suffixtree.h"
#include <algorithm>
#include <iterator>
#include <iostream>
#include <string>
#include <cmath>
#include <ctime>
using namespace std;

static void ticToc_(bool s)
{
    static clock_t start_ = clock();
    if(s) {
        start_ = clock();
    } else {
        clock_t end_=clock();
        std::cerr
            << (end_ - start_) << " ticks"
            << "    (" << (end_ - start_) * (1000.)/CLOCKS_PER_SEC <<"ms)"
            << std::endl;
    }
}

inline void tic()
{
    ticToc_(true);
}

inline void toc()
{
    ticToc_(false);
}


struct Info {
    int cv,depth,isLeaf;
    Info():cv(-1),depth(-1),isLeaf(0){}
};

typedef Node<char,Info> NodeT;

void initInfo(NodeT& node) {
    node.info.depth = node.depth + node.len;
    if(node.branches.empty()){
        node.info.isLeaf = 1;
        node.info.cv = node.leafNum;
    } else {
        node.info.isLeaf = 0;
        NodeT* p = node.branches[0];
        if(p->info.cv==-1) {
            initInfo(*p);
        }
        node.info.cv = p->info.cv;
    }
}

void printPath(NodeT const* node)
{
    if(node->parent!=node) {
        printPath(node->parent);
    }
    copy(node->str, node->str+node->len, ostream_iterator<char>(cout));
}

struct LetsHaveFun
{
    NodeT const*& result;
    LetsHaveFun(NodeT const*& root):result(root){}
    void operator()(NodeT& node){
        if(!node.info.isLeaf &&  // more than once
            node.depth != 0 &&   // not root
            node.info.depth > result->info.depth && // better
            abs(node.info.cv - node.branches.back()->info.cv) >=
                node.info.depth  // do not overlap
        ) {
            this->result = &node;
        }
    }
};

int main(int c,char**v) {
    if(c>2)
        return 1;
    SuffixTree<char, Info> stree;
    NodeT const* ans = stree.root();
    LetsHaveFun fun(ans);
    string source;
    if(c==2&&v[1][0]=='-') {
        if(v[1][1]=='S') // do not skip white space
            cin>>noskipws;
        else if(v[1][1]=='s')
            cin>>skipws;
        else
            return 1;
    } else {
        return 1;
    }

    copy(istream_iterator<char>(cin), istream_iterator<char>(),
         back_inserter(source));
    tic();
    stree.addString(source);
    stree.dfs(initInfo);
    stree.dfs(fun);
    printPath(ans);
    cout<<endl;
    toc();
}

3. Question

这次有问题顺便想要请教一下高人们,在上面的代码中有一段让我很不爽:

struct LetsHaveFun
{
    NodeT const*& result;
    ...

我的本意是想用 NodeT const* result ; 然而那么做的话 stree.dfs(fun) 这句话结束之后 fun.result 又回到了 root (LetsHaveFun::operator() 中 result 的确会变的!),百思不得其解,似乎碰到了某个语法的黑暗角落;然而在这个问题上 VC2008 和 GCC 4.5.0 竟然表现得出奇的一致以至于我开始怀疑自己了,谁来救救我?

4. PostScript

suffixtree.h 全部内容在我的 后缀树的介绍 中。

5. Update

Ubuntu 威武,同样的代码,同样的文件系统,同样的计算机配置,同样的编译器,同样的编译选项,在 Ubuntu 上面,此处(Win7)耗时 6.5 秒的程序变得只要 4.5 秒了,Astonishing

Suffix Tree 学习笔记 I

在这篇文章中你将会看到:





* 什么是后缀树

* 后缀树有啥用

* 如何构建后缀树

* 构建后缀树的算法复杂度分析

* 具体实现的源代码

* 以及更多









文章本身很长……还是点进来看吧…… ..more

专家!哥!

好久没看 CSDN,今天一看就把我吓了一跳:

首页推荐了某个号称专家而且脑袋上还盖着“专家”两字的生物写的名字很吓人以至于让我忍不住点进去看了的文章《Boost库对unicode字符集的支持方式探究》,文章说,他发现了原来 boost 和 STL 一样给 unicode 函数/类加了个 w 前缀。

 

 

 

 

 

 

 

囧。

C++ ABI, Screw You!

用 TDM-GCC 4.5 编译了 OpenCV 2.1 和 boost 1.44 

下载了用(某不知道来源的) MinGW GCC 4.4 编译的 Qt 4.6

问题在于,这两个版本的 MinGW 显然不兼容!

今天需要用到这三个库时,链接就总是悲剧 —— 用 TDM-GCC 编译工程,则找不到 _Unwind_Resume 函数(链接 libgcc_s.a 还是找不到、同时还会伴随出现 n 个重复定义);用 Qt 目录下那个 MinGW 编译工程的话,则有无数个找不到实现的 boost 库函数。

 

现在我在想是要重新编译一下 OpenCV 以及 Boost 呢,还是重新编译一下 Qt 呢。

 

update:
花了近四个小时把 Qt 自己编译了一遍,其间顺便(终于)看了《银翼杀手》,问题果然解决,真累。
话说银翼杀手确实是部不可多得的好片,情节引人入胜同时又发人深省,只是我还不敢说看懂了,什么时候还得再看几遍。
另外,从编译的流程来看 Qt 的确是世界上最容易编译的东西之一了,configure,make,done —— 只是,它好啊。 

感情被严重滴欺骗了

在今天以前我一直以为 MinGW 只认得 libXXX.a ,不认得 XXX.lib

 

 

 

 

 

 

 

 

丫的,害我做过多少无用功啊!

 

先就这样吧

对打马赛克有特别性趣的童鞋们,你们可以先试着玩一下了:

=> 二进制文件及源代码打包下载 [2.4 MB]

其中彩色马赛克拼接的效果目前还很不靠谱,请谨慎使用。

用法:

  • 首先把一大堆图像转换成某个统一大小的缩略图 ( 如果你自己没有什么好的办法能做到这一点,可以考虑一下我为你量身定做的 resize.exe,在压缩包里 ) 
  • 然后关机,膜拜一下春哥
  • 开机,你会发现拼接图像已经生成好了

 

好吧,我是开玩笑的。

实际上,mosaic 的用法如下:

mosaic <源文件> -o <输出图像> <选项们>

选项有:

  • -p <缩略图库目录> (支持多个目录,至少得要有一个非空的目录)
  • -S <缩略图大小> (单位为像素,默认48)
  • -s <用于比较的缩略图大小> (将会把缩略图再缩率成这个模样,然后用于比较,单位像素,默认12)
  • -z <放大倍数> (实数,默认为 1)
  • -c (启用彩色,默认灰白)
  • -t <n> (从前n个图像中随机选一个用于填充,默认13)
  • -x (完成后在系统关联程序中打开)
  • -d (完成后显示一下)

一个简单的例子是: mosaic -p pool in.jpg -o out.jpg

 

主要的源代码如下:

其中那个 sizedpqueue.hpp 中是一个名曰 sizedpq 的俺实现的比较丑陋的用于保存前 n 小元素用的数据结构类,由于丑陋,就不贴了,反正那个压缩包里有……

// vi:foldmethod=marker
// Author: if ( www.if-yu.info)
// Version: 0.0.3.100613
// Date: June 13rd, 2010
// Copyright (C): if
// License: BSD

/*
 change log:

 * Version 0.0.3:
     can see color

 * Version 0.0.2:
     got an command-line interface

 * Version 0.0.1:
     born

 */

// misc {{{
#include <exception>
#include <iostream>
#include <iomanip>
#include <string>
#include <vector>
#include <cmath>

#include <stdlib.h>
#include <stdio.h>
#include <time.h>

#include <boost/filesystem.hpp>
#include <boost/program_options.hpp>

#include <opencv/cv.h>
#include <opencv/cv.hpp>
#include <opencv/cxcore.h>
#include <opencv/cxcore.hpp>
#include <opencv/highgui.h>

#include "sizedpqueue.hpp"

using namespace std;
namespace fs = boost::filesystem;
namespace po = boost::program_options;

int g_thumb_size = 12;
int g_image_size = 48;
int g_top_n   = 13;
bool g_color = false;

typedef struct Image {
    cv::Mat thumb;
    cv::Mat orig;
} Image;

typedef struct Record {
    double d;
    int    idx;
} Record;
class RecordCmp {
public:
    bool operator()(Record const& a, Record const& b)const{
        return a.d<b.d;
    }
};
// }}}

// mean idea:
// calculate a `distance' between two images, which shows their similarity {{{
double dist(cv::Mat const& m1, cv::Mat const& m2) {
    assert(m1.type() == m2.type() && m1.size() == m2.size());

    if(CV_MAT_TYPE(m1.type())==CV_8UC1) {
        typedef unsigned char uchar;
        int i,I=m1.rows,j,J=m1.cols,t;
        double d=0;
        for(i=0;i<I;++i)
            for(j=0;j<J;++j) {
                t=int(m1.at<uchar>(i,j))-int(m2.at<uchar>(i,j));
                t*=t;
                d+=/*fabs*/(t);
            }
        return d;
    } else if(CV_MAT_TYPE(m1.type())==CV_8UC3) {
        double dM=0.0, t, d[3]={0.};
        cv::Scalar mean1 = cv::mean(m1);
        cv::Scalar mean2 = cv::mean(m2);
        for(int c=0;c<3;++c) {
            t=mean1.val[c]-mean2.val[c];
            t*=t;
            dM+=t;
        }

        for(int i=0, I=m1.rows; i<I; ++i) {
            for(int j=0, J=m1.cols; j<J; ++j) {
                typedef cv::Vec<uchar,3> BGR_color;
                BGR_color c1=m1.at<BGR_color>(i,j);
                BGR_color c2=m2.at<BGR_color>(i,j);

                for(int c=0;c<3;++c) {
                    t = c1[c]-c2[c];
                    d[c] += t*t;
                }
            }
        }

#if 0
        // compare their histogram
        int hist1[3][64]={{0}}, hist2[3][64]={{0}};
        for(int i=0,I=m1.rows; i<I; ++i) {
            for(int j=0,J=m1.cols; j<J; ++j) {
                typedef cv::Vec<uchar,3> BGR_color;
                BGR_color c1 = m1.at<BGR_color>(i, j);
                BGR_color c2 = m2.at<BGR_color>(i, j);
                for(int c=0;c<3;++c) {
                    ++hist1[c][c1[c]/4];
                    ++hist2[c][c2[c]/4];
                }
            }
        }
        // algorithms:
        // 全不靠谱……
        // correlation
        {
            double a=0.,b=0.,up[3]={0.},down[3]={0.},s1[3]={0.},s2[3]={0.};
            double avg1[3]={0}, avg2[3]={0};
            for(int c=0;c<3;++c) {
                for(int i=0;i<64;++i) {
                    avg1[c] += hist1[c][i]/64.;
                    avg2[c] += hist2[c][i]/64.;
                }
            }
            for(int c=0;c<3;++c){
                for(int i=0;i<64;++i) {
                    a=hist1[c][i]-avg1[c];
                    b=hist2[c][i]-avg2[c];
                    up[c] += a*b;
                    s1[c] += a*a;
                    s2[c] += b*b;
                }
            }
            for(int c=0;c<3;++c){
                down[c] = sqrt(s1[c]*s2[c]);
                d[c] = up[c]/down[c];
            }
        }
        /*
        // so-called chi-square, sucks
        for(int i=0;i<64;++i){
            for(int c=0;c<3;++c) {
                t=(hist1[c][i]-hist2[c][i]);
                t*=t;
                d[c] += t/(hist1[c][i]+hist2[c][i]);
            }
        }
        */
#endif
        t = 0;
        for(int i=0;i<3;++i){
            t+=d[i]*d[i];
        }
        return 0.7*dM + 0.3*t;
    } else {
        // only support CV_8UC1 and CV_8UC3 so far
        assert(CV_MAT_TYPE(m1.type()) == CV_8UC1 || CV_MAT_TYPE(m1.type()) == CV_8UC3); // false;
    }
    return 0;
}
// }}}

// core features: {{{
void addToPool(std::vector<Image>& pool, string const& dir) {
    cv::Size expect(g_image_size, g_image_size);
    cout<<"----- reading "<<(g_color?"color":"gray")<<" images from \""<<dir<<"\" -----"<<endl;
    for(fs::directory_iterator i(dir),I;i!=I;++i){
        Image img;
        img.orig = cv::imread(i->path().string(), g_color?1:0);
        if(img.orig.size()!=expect) {
            cout<<"X";
            continue;
        }
        cv::resize(img.orig, img.thumb, cv::Size(g_thumb_size, g_thumb_size));
        // cv::blur(img.thumb, img.thumb, cv::Size(2,2));
        pool.push_back(img);
        cout<<'+';
    }
    cout<<"\n [done]"<<endl;
}

int chooseRandomFromPool(vector<Image> const& pool, cv::Mat const& source) {
    int a=0,c=pool.size();
    sizedpq<Record,RecordCmp> top(g_top_n);
    cv::Mat thumb;
    cv::resize(source, thumb, cv::Size(g_thumb_size, g_thumb_size));
    for(a=0;a<c;++a) {
        Record r;
        r.d=dist(pool[a].thumb, thumb);
        r.idx=a;
        top.insert(r);
    }
    return top[rand()%g_top_n].idx;
}

void doRandomMosaic(vector<Image> const& pool, cv::Mat const& src, cv::Mat & dst){
    cv::Size size(src.size());
    size.width  /= g_thumb_size;
    size.height /= g_thumb_size;
    cv::Size dstSize( size.width*g_image_size, size.height*g_image_size);
    if(dst.size()!=dstSize) {
        dst = cv::Mat::zeros(dstSize, g_color?CV_8UC3:CV_8UC1);
    }
    cout<<"----- doing what you want me to -----"<<endl;
    if (dstSize.width>800||dstSize.height>600) {
        cout<<"image is too big, i do not want to display it."<<endl;
    }
    for(int i=0,I=size.height; i<I; ++i) {
        for(int j=0,J=size.width; j<J; ++j) {
            cv::Rect r1(j*g_thumb_size,i*g_thumb_size,g_thumb_size,g_thumb_size);
            cv::Rect r2(j*g_image_size,i*g_image_size,g_image_size,g_image_size);
            cv::Mat s(src, r1);
            cv::Mat d(dst, r2);
            int n = chooseRandomFromPool(pool, s);
            pool[n].orig.copyTo(d);
            cout<<'.';
            if(!(dstSize.width>800||dstSize.height>600)) {
                cv::imshow("dst", dst);
                cv::waitKey(10);
            }
        }
        double precence = double(i+1)/(size.height)*100.0;
        cout<<" [ "<<setw(6)<<setprecision(2)<<fixed<<precence<<"% ]\n";
    }
    cout<<"\n [done]"<<endl;
}
// }}}

// interface (if it can be called an interface) {{{
void errorOut(string const& msg) {
    cerr<<msg<<endl;
    exit(1);
}

cv::Size calcDisplaySize(cv::Size sz)
{
    int w = sz.width,
        h = sz.height;
    int w2 = w,
        h2 = h;
    if (w>=h) {
        if(w>700) {
            w2= 700;
            h2= h * w2 / w;
        }
    } else {
        if (h>700) {
            h2= 700;
            w2= w * h2 / h;
        }
    }
    return cv::Size(w2,h2);
}

void display(cv::Mat const& img)
{
    cv::Mat m;
    cv::Size s=calcDisplaySize(img.size());
    if(s!=img.size()) {
        cv::resize(img,m,s);
        cv::imshow("display", m);
    } else {
        cv::imshow("display", img);
    }
    cv::waitKey(0);
}

int main(int argc, char *argv[]) {
    srand(time(0));
    vector<Image> pool;
    vector<string> poolPathes;
    string inPath;
    string outPath;
    double zoom=1.0;

    // boost::program_options is cute!
    po::variables_map vm;
    po::options_description desc("options");
    try{
        desc.add_options()
            ("help,h", "show this message")
            ("color,c", "generate colored mosaic")
            ("zoom,z", po::value<double>(&zoom)->default_value(zoom), "zoom rate")
            ("pool,p", po::value<vector<string> >(&poolPathes), "image pool")
            ("image-size,S", po::value<int>(&g_image_size)->default_value(g_image_size),
                "size of small images")
            ("thumb-size,s", po::value<int>(&g_thumb_size)->default_value(g_thumb_size),
                "size of thumbnails to be used for compairing")
            ("input,i", po::value<string>(&inPath), "input image file")
            ("output,o", po::value<string>(&outPath), "output image file")
            ("top-n,t", po::value<int>(&g_top_n)->default_value(g_top_n), "choose a random piece from top-n like ones")
            ("display,d", "Display the result image")
            ("open,x", "View the result image using associated program in your system")
        ;
        po::positional_options_description p;
        p.add("input", -1);
        po::store(po::command_line_parser(argc,argv)
                     .options(desc).positional(p).run(),
                  vm);
        po::notify(vm);
    } catch(exception const& e) {
        errorOut(e.what());
    }
    if(vm.count("help")) {
        cout<<desc<<endl;
        return 0;
    }
    if(! vm.count("input")) {
        errorOut("No input file");
    }
    if(! vm.count("output")) {
        errorOut("No output file");
    }
    if( vm.count("color")) {
        g_color = true;
    }
    if(! vm.count("pool")) {
        errorOut("Where can i find pieces to make mosaic?");
    } else {
        for(vector<string>::const_iterator citr = poolPathes.begin();
            citr != poolPathes.end();
            ++citr )
        {
            if(!fs::exists(*citr)) {
                errorOut("What is \""+*citr+"\" ??");
            } else if( fs::is_directory(*citr)) {
                addToPool(pool, *citr);
            }
        }
        if(pool.empty()) {
            errorOut("What do you expect when u got no image in pool?");
        }
    }
    if( g_image_size<g_thumb_size ) {
        errorOut("You are doing something foolish, human.");
    }

    cv::Mat src;
    cv::Mat dst;
    try{
        if(g_color) {
            cout<<"Reading source image [colored] "<<endl;
            src = cv::imread(inPath, 1);
        } else {
            cout<<"Reading source image [b&w]"<<endl;
            src = cv::imread(inPath, 0);
        }
    } catch(...) {
        errorOut(string("Can\'t read your input file: \"")+inPath+"\"");
    }
    cv::Size s(src.size());
    if(s==cv::Size(0,0))
        errorOut(string("Can\'t read your input file: \"")+inPath+"\"");
    s.width = static_cast<int>(s.width/g_image_size*g_thumb_size*zoom);
    s.width -= s.width%g_thumb_size;
    s.height = static_cast<int>(s.height/g_image_size*g_thumb_size*zoom);
    s.height -= s.height%g_thumb_size;
    if(s!=src.size()) {
        cv::resize(src, dst, s);
        src=dst.clone();
    }
    doRandomMosaic(pool, src, dst);
    try{
        cv::imwrite(outPath,dst);
    } catch(...) {
        errorOut("Failed to save the image, please check your RP.");
    }

    if(vm.count("display")) {
        display(dst);
    }
    if(vm.count("open")) {
        system(outPath.c_str());
    }

    return 0;
}
// }}}