Tuesday, December 30, 2008

Case sensitive Google Search


This application will google basic terms, scan through the results and filter out results of different case than the search terms.
Support for phrase search (quotes) has been recently added and other advanced search options like negative terms, etc. are available as well by using the second input box to provide filter terms without the advanced options.

The initial intention was to create an application which would run a query for a longer period and the client would poll for new results in every few seconds. As the app engine can not spwan sub process the client has to wait until the query is finished and won't get any results until that. The user can set the maximum number of results that will be returned and the maximum number of Google results that will be scanned through.

Thanx for your comments in advance.

Tuesday, December 9, 2008

Hungarian hyphenation by Henri

Hungarian hyphenation pattern file for Liang's algorithm has long been being prepared by Henri.
It's only 26K! You can check it here and leave a comment below or there. Thanx.
Here's Henri's pattern file.
The javascript is based on the hyphenator project.

Monday, May 26, 2008

Yochai Benkler: love this guy

OK, not utterly this guy, but his ideas as he talks about economics of open source:

Wonder when will one be able to download a piece of software which distributedly runs the next Google on those client machines who are actually using it for searching at a certain moment. Without any central server. Considering the ever growing information monopolies of search engines, I guess that would be a nice scenario for the future.

Update: This talk made me so enthusiastic, I've done a little research on this. I even thought to myself: why not put some effort into it and produce this software using open source components. Well, it turns out that this idea is not unique (of course) and this software already exists, thanks to Le-Shin Wu and Ruj Akavipat at the Computer Science Department of Indiana University.

Friday, May 23, 2008

Treasure Hunt 2

Caveman style java implementation of the 2nd Treasure Hunt no-brainer-teaser.

import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.util.LinkedList;
import java.util.List;

public class TreasureHunt2 {

private List getFileListFromPath(String path, String pattern,
String ext){
List result = new LinkedList();
File file = new File(path);
if (file.isDirectory()){
for(File innerFile : file.listFiles()){
result.addAll(getFileListFromPath(innerFile.getPath(),
pattern, ext));
}
} else {
if ((file.getName().endsWith(ext))
&& (file.getPath().indexOf(pattern) > -1)){
result.add(file);
}
}
return result;
}

private Long getNumFromFile(File file, int linNumber) throws IOException {
Long result = 0l;
BufferedReader in = new BufferedReader(new FileReader(file));
String str;
int i = 0;
while ((str = in.readLine()) != null) {
i++;
if (i == linNumber){
if (str.length()>1){
result = new Long(str);
break;
}
}
}
in.close();
return result;
}

private Long sumFiles(List list, int linNumber) throws IOException{
Long sum = 0l;
for (File file : list){
Long add = getNumFromFile(file, linNumber);
sum += add;
}
return sum;
}

class TreasureHuntPattern{
public String namePattern;
public String extension;
public int lineNumber;
public TreasureHuntPattern(String namePattern, String extension,
int lineNumber){
this.namePattern = namePattern;
this.extension = extension;
this.lineNumber = lineNumber;
}
}

final TreasureHuntPattern[] patterns = {
new TreasureHuntPattern("mno", ".xml", 5),
new TreasureHuntPattern("foo", ".pdf", 5)
};
static final String sumFilesPath =
"F:/GoogleTreasureHunt08_17428357331831084502";

public void hunt() throws IOException{
Long prod = 1l;
for (TreasureHuntPattern pattern : patterns){
prod *= sumFiles(
getFileListFromPath(sumFilesPath, pattern.namePattern,
pattern.extension),
pattern.lineNumber);
}
System.out.println("prod:"+prod);
}

public static void main(String[] args) throws IOException{
TreasureHunt2 hunter = new TreasureHunt2();
hunter.hunt();
}

}

Caveman style bash implementation:
sum.sh
summ=0
find $1 | grep $2".*\."$3$ |
(while read
do
f=`cat "./$REPLY"`
num=`echo $f | cut -f$4 -d" "`
let summ=summ+num
done
echo $summ)

prod.sh
sum1=`./sum.sh "./files/GoogleTreasureHunt08_17428357331831084502" mno xml 5`
sum2=`./sum.sh "./files/GoogleTreasureHunt08_17428357331831084502" foo pdf 5`
echo $sum1
echo $sum2
let prod=sum1*sum2
echo $prod

Monday, May 19, 2008

Terminate Tomcat

This command will terminate a single Tomcat instance on a linux machine
kill -9 `ps aux | grep tomcat | grep -v grep | tr -s " " | cut -f2 -d" "`
Does not work when several instances run parallel.

Binomial robot

Stumbled upon a Google Treasure Hunt 'brainteaser':

A robot is located at the top-left corner of a n x k grid.
The robot can only move either down or right at any point in time. The robot is trying to reach the bottom-right corner of the grid.
How many possible unique paths are there?
(Note: Answer must be an exact, decimal representation of the number. Image not to scale.)

Do not continue if you don't want your brain-teasing to by spoiled.

Let's see, the robot will take N = (n-1)+(k-1) steps. Out of these N steps, n-l will be down and k-1 will be right. Now, one may recall the binomial coeficient: we want to know how many different ways the robot can take these N steps, i.e. how many different ways he can choose to go either down or right. To put it into an other way: we want to find out the number of combinations the robot can choose to go down. This implies that the answer is ((rows-1+cols-1)!)/((rows-1)!*(cols-1)!).

But the correct answer can be found without recalling any binomial or combination stuff. This quick and dirty java code gives the answer as well. The idea here is that for a two-row grid, the number of possible different paths is k: the robot can pick any of the k columns in the grid to take his single down step. Now what happens if we add an additional row, say a third row? Now the robot can also walk straight through the new row and for all cells he can choose to go down. If he goes down into the second row in the i-th column, then the additional path possibilities are given by the result of a similar puzzle where n = n-1 and k = k-1-i;
Hence adding an extra row means we have to sum over k the possibilities of the same problem with n-1 rows.
final int ROWS = 57; //n
final int COLS = 44; //k
BigInteger [][] result = new BigInteger [ROWS+1][COLS+1];
private BigInteger sumPrevLevel(int row, int cols){
BigInteger res = new BigInteger("0");
for (int col = 1; col <= cols; col++){
res = res.add(result[row-1][col]);
}
return res;
}
public void testTreasureHunt(){
//init
for (int row = 0; row <= ROWS; row++){
result [row][0] = new BigInteger("0");
}
for (int col = 1; col <= COLS; col++){
result[0][col] = new BigInteger("0");
result[1][col] = new BigInteger("1");
}
//do the work
for (int row = 2; row <= ROWS; row++){
for (int col = 1; col <= COLS; col++){
result[row][col] = sumPrevLevel(row, col);
}
}
//print the result
System.out.println(result[ROWS][COLS]);
}

Thursday, May 1, 2008

Multilingual search

I've been playin around a bit with the Google AJAX Search API and the Google AJAX Language API. It may come handy when translating something.
But not from or to Hungarian, as it is not included in Google's translation tools, though Romanian, Polish, Czech, Croatian and Bulgarian languages has been recently added to the supported ones. Either
there's no need for Hungarian translation or Hungarians hadn't got lobbists at Google. Must be the former one. It may be either because Hungarians speak all other languages or because they only read original Hungarian web pages.
Here's Google's version. It not only translates the query terms to a second language but also tranlates the search results back to the original language. So it's seemingly targeted to those guys, who speak only one language. As Google search guru Udi Manber puts it:
"if you’re a user in Egypt, for example, and you only speak Arabic, you can write the query in Arabic, ask to translate it into English. When you then click on the results, it will translate the Web pages to Arabic for you—all of it done by Google translation. That opens the whole world to everybody." (source)

Monday, April 14, 2008

Computer won a 9x9 game of Go against professional 5th DAN Catalin Taranu

During the Go Tournament in Paris, staged between 22 and 24 March 2008 by the French Go Federation (FFG), the MoGo artificial intelligence (IA) engine developed by INRIA – the French National Institute for Research in Computer Science and Control – running on a Bull NovaScale supercomputer, won a 9x9 game of Go against professional 5th DAN Catalin Taranu. This was the first ever officially sanctioned ‘non blitz’ victory of a ‘machine’ over a Go Master.
source
ohter sources

Among many others, we have two Hungarian SZTAKI guys in the background, Levente Kocsis and Csaba Szepesvári.

Their contribution can be spotted in the MoGo page, and it was earlier mentioned by many sources.