2002 December 31 Tuesday
Google Glossary and Google Sets on Google Labs

Google has a new experimental feature on Google Labs called Google Glossary. It finds and extracts definitions of words and phrases from web sites. I had success with therapeutic cloning, Java Beans, temperature inversion, and multiple inheritance but not with reproductive cloning. Dictionaries rarely list two- or three-word combinations that have some specific technical or cultural meaning, but it appears that Google Glossary can come back with decent definitions for them at least some of the time.

Google also has something called Google Sets that is very cool. Give it names of things from a set and it tries to predict other words that also belong in the same set. For instance, Buffy, Willow and Tara successfully yield a list of other characters from Buffy The Vampire Slayer. Kirk and McCoy successfully return a list of Star Trek TOS crew members. I mean, how neat is that? Sex, drugs, rock successfully predicts "roll". You get the idea. Some of the results are, well, curious. Chopin, Bach, and Mozart return a list of mostly classical composers. But Elvis is on the list and so is the word "Introduction". If you come up with any interesting word combinations please come back and post them in the comments to this post.

By Randall Parker    2002 December 31 10:24 PM   Entry Permalink | Comments (2)
Java+ Preprocessor Released

Brad Cox (I think the same guy who invented Objective C) has released a Java preprocessor called Java+.

Java+ is an open source Java preprocessor (download) that adds these features to any Java compiler:

  • Multi-line strings with executable inclusions like Perl and Ruby
  • Optionally segregates Java+ strings into ResourceBundle files
  • Eliminates the need for JSP or ASP and their need for Java compilers on deployment servers (a security concern)
  • Optionally processes only inputs whose outputs are out of date
  • Extremely fast. Astonishingly so.
  • Adds no overhead, in either space or time
  • Graphical and command-line interfaces
  • Simple, general, recursive string syntax
  • Free software, BSD open source license
  • Comprehensive built-in documentation

By Randall Parker    2002 December 31 09:50 PM   Entry Permalink | Comments (1)
2002 December 18 Wednesday
Opera 7 Beta 2 Released

Opera Software has released the second beta of the web browser Opera 7 for Windows. Note that it appears to be called b1b instead of b2. After all of about 10 minutes using it with a couple dozen pages open it appears to work okay. If you are a heavy browser user and all you know is MS IE it might be worth your while to give Opera a try. My favorite browser is Mozilla at this point but Opera is coming along quite nicely.

Update: Two things annoy me about v7 as compared to v5: First, it doesn't have a Go button to the right of the URL control. Second, if you right click in the URL control the pop-up does not have the Cut/Copy/Paste options. So one has to use the keyboard more when copying and pasting in and out of the URL control. If you want to grab a URL out of a control to, say, put in a blog post it's more work to move from mousing to keyboarding to do it.

I ought to go to their news server and complain about this.

Update II: The correct file to download is ow32enen700b2j.exe (thanks to DJ SkaM for pointing out the difference). The file I originally downloaded really was for beta 1b. My guess is that I downloaded it so soon after beta 2 was announced that b2 hadn't propagated out to the servers yet. Beta 2 does fix the ability to right click on the URL control to do Copy/Cut/Paste on it, though it still does not have the Go button.

By Randall Parker    2002 December 18 01:54 AM   Entry Permalink | Comments (2)
2002 December 16 Monday
Google Zeitgeist 2002 Released

Google has released their 2002 Zeitgeist of the web. Some of the results undermine some national pretensions. In France Britney Spears is number 8 in the overall most popular query list. This from a country that prides itself on its disdain for American popular culture? France also ranked Vanessa Demouy (who I'd never heard of before) ahead of Alyssa Milano. France also has that quintessential pop culture figure Pamela Anderson on their top celebrity list. Though so did the world as a whole.

The most curious thing about all the lists is that the top 20 gaining query list has no contemporary political topic. Sports, celebrities, cartoon characters, and video games are the most popular entries. Also, World Cup beat out Iraq as top news story. "Las Ketchup" (whatever that is - I want to remain in blissful ignorance) is considered to be a news story but it sounds like a pop culture fad. Also why is Canada in second place on the popular destinations list?

By Randall Parker    2002 December 16 11:28 AM   Entry Permalink | Comments (0)
Email Viruses Growing In Incidence

At least this isn't as big a problem as spam. The most successful virus is Klez.H, followed by Yaha.E.

During 2002, one in every 212 emails passing through the company's filtering system was a virus. This is nearly double the rate of one in every 380 recorded for 2001. And in 2000 the ratio was one in every 790 email messages.

By Randall Parker    2002 December 16 09:46 AM   Entry Permalink | Comments (0)
2002 December 15 Sunday
Mozilla v1.3a is Released

Mozillazine has links to various places to download it. You might be asking how could 1.3a come out so soon after 1.2. Well, 1.2 branched off the main trunk a couple of months ago (August? I forget) and lots of stuff has been getting added to the main trunk since it did. So when 1.3 comes out as an alpha it really has a lot more feature changes to it than you'd expect given the time interval between 1.2 and 1.3a. I haven't downloaded it yet. The release notes don't have anything particularly exciting in them (at least to me - though if you use Moz mail and news you might feel otherwise) and so I'll probably wait till 1.3b before trying out 1.3 builds.

By Randall Parker    2002 December 15 03:50 PM   Entry Permalink | Comments (0)
2002 December 13 Friday
Google Labs With Neat Scrolling Preview Mode

I saw this Google Labs URL in my referral logs. If it is still working then try this.

If you don't want it to scroll as quickly then try this.

By Randall Parker    2002 December 13 10:40 AM   Entry Permalink | Comments (0)
2002 December 12 Thursday
Google's Frugal Froogle Shopping Searcher

Those Google people are at it again. Want to search for deals on stuff you want to buy? Try Froogle.

About Froogle.

Found the link posted by Razib on Gene Expression blog.

By Randall Parker    2002 December 12 10:05 PM   Entry Permalink | Comments (0)
Will Microsoft Buy Borland?

I like C++ Builder. I like the fact that it is now available on Linux (with a different name) even though I haven't had occasion to use it on Linux as of yet. I use JBuilder and like it as well. Does Microsoft like any of these things? Of course not. MS isn't keen to see good cross-platform development tools improve and grow in popularity. MS wants platform lock-in. Well, IBM is trying to buy Rational for its high-end application modelling tool. For that reason MS (which uses Rational's tool) might buy Borland in order to get the competing product that Borland's TogetherSoft subsidiary makes.

Borland Software shares climbed Thursday as analysts considered whether the software company could be a Microsoft acquisition target.

So the fighting of the gorillas over app modelling tools might lead to Borland falling under Microsoft's control. This could easily make JBuilder, C++ Builder, Delphi, and assorted other Borland tools into roadkill. Bummer dudes.

By Randall Parker    2002 December 12 08:28 PM   Entry Permalink | Comments (2)
2002 December 08 Sunday
Phoenix Browser v0.5 Is Released

The v0.5 Naples rev of the Phoenix browser is released. If you are upgrading pay very careful attention to the release notes:

PLEASE NOTE: You should create a new profile for Phoenix 0.5. To create a new profile, start Phoenix by running phoenix.exe -ProfileManager and click on the "Create Profile" button. If you don't want to delete your old profile and are willing to incur the risk of new bugs, you should at least delete your profile's downloads.rdf file. You must also delete your old Phoenix directory rather than just overwriting the files there. Not doing so WILL result in problems and you should not file any bugs on Phoenix unless you've first done a clean install and tested on a new profile. As Phoenix stabilizes more this will not be necessary but until then these steps are absolutely necessary.

The release notes page above has links to downloadable files for Windows and Linux. Also, Mozillazine points to another location to download them.

What I do right after installing and starting Phoenix for the first time: Choose View | Toolbars | Customize. Then drag the Go button up to the right of the URL control. This gives you the Go button that all browsers normally have. Then click Done.

The next thing to do is to get a better theme for the appearance of Phoenix. You can choose to directly install the theme. Once it has completed you need to go into Tools | Preferences | Themes and Extensions. Your new theme should show up on the list. Choose it and click Okay and Phoenix should change to that theme. Mozilla has to be restarted to make a new theme take effect but with Phoenix it happens instantly. If you download a theme as a Jar and the choice you use does not install it into your browser then you can use this form to automate the install process.

Note that you can import Mozilla bookmarks from the bookmarks.html file in the bookmarks manager. Or you can copy the Mozilla (or Netscape) bookmarks.html file on top of the bookmarks.html for the Phoenix profile. Look for these files in your OS install drive on Windows.

Note that the Phoenix name is going to be changed. It is surprising to me that they released v0.5 without first changing the name.

Update: Anyone know any OS/2 users? I know they exist because I see them occasionally in my Parapundit.com site web logs. Haven't seen an OS/2 entry on TechiePundit for a while though. But get this: There is now an OS/2 Phoenix build downloadable from the release notes page at the top of this post (it wasn't on that page originally). If anyone knows any rare endangered species OS/2 users please tell them to go download the build for Phoenix v0.5. It would be nice if whoever is taking the time to make OS/2 builds would get to have his builds used by real OS/2 users.

By Randall Parker    2002 December 08 02:00 PM   Entry Permalink | Comments (2)
2002 December 06 Friday
Techniques For Blocking Spam Email Reviewed

Karl A. Krueger has written an interesting article reviewing various spam fighting techniques entitled The Spam Battle 2002: A Tactical Update.

Vernon Schryver's DCC: Measuring Bulkiness

DCC, short for Distributed Checksum Clearinghouse, is a client/server system for the detection of bulk mail. (Schryver) A DCC client is usually an SMTP server, though it may also be a mail user agent (MUA -- a mail client). Whenever it receives a message, it calculates several checksums of that message, and transmits them to a server, which returns the number of times it has seen each of those checksums. If a message has been seen many times by DCC clients, these numbers will be high, indicating that the message is likely bulk mail. DCC servers can also exchange checksums with one another, forming a redundant server-network similar in structure to that of IRC.

As the above description should make clear, DCC does not attempt to judge whether a message is spam. Vernon Schryver, the system's creator, believes that it is not feasible for an unintelligent system to accurately discern whether a particular message is spam. What DCC judges is the "bulkiness" of the message -- how many copies of it have been transmitted. As a result, clients which reject mail on this basis must also maintain a whitelist of non-spam bulk mail senders, such as legitimate mailing lists. This imposes some overhead on DCC users, but presumably not as much as maintaining a local blacklist of every spam source.

The checksums that DCC uses are not the same kind of checksums used by cryptographic algorithms. A crypto checksum or message digest is designed to maximize the output change caused by a small input change. Since spammers usually add changing elements such as tracking numbers to spam messages, such a checksum would not work for spam. Instead, the DCC checksums are fuzzy checksums under which such small input changes do not change the output. These work by checksumming not the bits of the message, but the arrangement of meaningful elements such as letters and URLs.
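The counting idea described above can be sketched in a few lines of Python. This is only an illustration of the general approach, not DCC's actual algorithm or protocol: the normalization rules, the names fuzzy_checksum, ChecksumServer and is_bulk, and the threshold of 3 are all assumptions made up for this example.

```python
import hashlib
import re
from collections import Counter


def fuzzy_checksum(body: str) -> str:
    """Hash a normalized form of the message so small edits do not change it.

    The normalization here is an assumption for illustration; real DCC
    checksums are computed differently.
    """
    text = body.lower()
    text = re.sub(r"https?://\S+", "<url>", text)   # keep the presence of a URL, not its value
    text = re.sub(r"\d+", "", text)                 # drop tracking numbers and other digits
    text = re.sub(r"[^a-z<>]+", " ", text).strip()  # collapse punctuation and whitespace
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


class ChecksumServer:
    """Toy stand-in for a server that counts how often each checksum is reported."""

    def __init__(self) -> None:
        self.counts = Counter()

    def report(self, checksum: str) -> int:
        """Record one sighting and return the total number seen so far."""
        self.counts[checksum] += 1
        return self.counts[checksum]


def is_bulk(server: ChecksumServer, body: str, sender: str,
            whitelist: set, threshold: int = 3) -> bool:
    """Flag a message as bulk once its fuzzy checksum has been seen often enough."""
    if sender in whitelist:  # legitimate mailing lists are exempt
        return False
    return server.report(fuzzy_checksum(body)) >= threshold
```

Two copies of a message that differ only in a tracking number or a URL parameter produce the same checksum here, so the server's count keeps climbing even though the raw bytes differ. Note that the function reports bulkiness only; deciding that bulk mail from a given sender is wanted is left to the whitelist.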

The New Scientist reports on a new technique for fighting spam developed by AT&T researcher John Ioannidis. It involves the use of special encrypted email addresses.

The Single Purpose addresses consist of a few dozen characters before the @ sign. The reply conditions are encoded using a secret cryptographic key, so that a spammer cannot create fake addresses. The addresses might look like nonsense but could easily be processed by computers, Ioannidis says. They could be posted to the web or used to subscribe to a mailing list without fear of receiving a barrage of spam in return. A much simpler "unlimited use" address would be kept for personal correspondence, he says.

This article really doesn't explain how this technique works. Does the sender make a public key available for reading the address so that receivers can know who it is from and that it really is a valid originating address? Does each receiver need to know the public decrypting key of each sender he gets email from? Or are the keys shared at the level of POP servers?

Is the purpose to allow only each receiver to be able to reply to a given sender with the customized response address? I don't think so. Or is the purpose to allow receivers to know that the original sender is really who he says he is?
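For what it's worth, here is one way such an address could work, sketched in Python. This is purely a guess and not Ioannidis's actual scheme: it uses a keyed hash (HMAC) rather than encryption, the only "reply condition" is an expiry date, and the key, the address layout, and the names make_address and accept are all hypothetical. In this version only the receiving mail server needs the secret key, so senders need no keys at all.

```python
import base64
import hashlib
import hmac
from datetime import date

SECRET_KEY = b"known-only-to-the-recipient"  # hypothetical key held by the receiving server


def _tag(user: str, stamp: str, key: bytes) -> str:
    """Keyed hash over the user name and expiry stamp, shortened to 16 characters."""
    digest = hmac.new(key, f"{user}|{stamp}".encode("utf-8"), hashlib.sha256).digest()
    return base64.b32encode(digest[:10]).decode("ascii").lower()


def make_address(user: str, expires: date, domain: str, key: bytes = SECRET_KEY) -> str:
    """Build a single-purpose address that stops working after the expiry date."""
    stamp = expires.strftime("%Y%m%d")
    return f"{user}.{stamp}.{_tag(user, stamp, key)}@{domain}"


def accept(address: str, today: date, key: bytes = SECRET_KEY) -> bool:
    """Accept mail only if the tag verifies and the address has not expired."""
    local, _, _domain = address.partition("@")
    try:
        user, stamp, tag = local.rsplit(".", 2)
    except ValueError:  # not in user.stamp.tag form
        return False
    expected = _tag(user, stamp, key)
    if not hmac.compare_digest(tag.encode("utf-8"), expected.encode("utf-8")):
        return False  # forged or altered address
    return today.strftime("%Y%m%d") <= stamp
```

Under these assumptions a spammer who harvests the address can use it until it expires but cannot extend the date or mint new addresses, because any change to the local part invalidates the tag. Whether the real proposal works this way is exactly what the article leaves unanswered.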

By Randall Parker    2002 December 06 09:36 AM   Entry Permalink | Comments (0)
2002 December 05 Thursday
Gartner Says PC Upgrade Cycle Now 4 Years

Machines are now fast enough for most uses.

Today's powerful PCs can run Microsoft's latest versions of its Windows operating system or other office software. Users tend to demand faster Internet connections, not faster microchips. Add to this corporate cost cutting in a weak and uncertain global economy and Gartner has changed its assumptions on replacement cycles to four years.

By Randall Parker    2002 December 05 09:54 AM   Entry Permalink | Comments (0)
2002 December 01 Sunday
Google Zeitgeist, The Borg Mind, AI Blog Assistant

The Google Zeitgeist page shows what queries are moving up and down in popularity generally and in assorted categories. Periodically tune in to this page to watch one aspect of the changing thinking of the world's collection of minds. This ability of humans throughout the world to search on and find many of the same articles about any given topic ought to contribute to developing more commonality of outlook in heretofore fairly isolated sub-groups. Though just as significant differences of opinion remain in closely connected societies due to divergent personal interests, experiences, and innate personality characteristics so there will remain differences between groups around the world.

The New York Times reports, not surprisingly, that sex is a recurringly popular topic for searches. But Google also detects important events just after they happen:

On Feb. 28, 2001, for example, an earthquake began near Seattle at 10:54 a.m. local time. Within two minutes, earthquake-related searches jumped to 250 a minute from almost none, with a concentration in the Pacific Northwest. On Sept. 11, searches for the World Trade Center, Pentagon and CNN shot up immediately after the attacks. Over the next few days, Nostradamus became the top search query, fueled by a rumor that Nostradamus had predicted the trade center's destruction.

This ability to detect unfolding events might be useful in detecting a bioterrorism attack. There are plans afoot to automate the collection of data about symptom reports for doctors' office visits and pharmacy drug sales in order to detect a bioweapons attack before any of the victims are properly diagnosed. Well, if there are patterns of Google searches that people make for health information when family members come down with various categories of symptoms then the combination of originating IP addresses (since IP addresses usually can be assigned to geographic areas - though perhaps that isn't true for all ISPs) and disease information searches could be tracked as another way to detect the early stages of symptoms from a bioweapons attack.

I recently read the assertion (by John Derbyshire who also once again pointed to the important role played by Google) that cultural changes happen later in Canada than in the US. Well, Britney Spears is at the top of the Canadian search list at the moment and yet Spears has peaked in popularity on Google as a whole. It would be interesting to see the popularity of Spears and other major celebrities tracked by nation to see which nations jump on new celebrity icons the fastest and slowest. It would also be interesting to know whether local favoritism makes someone like Avril Lavigne a bigger search topic in Canada than in other developed English language countries and ditto for other artists that come from lower population countries who make it big.

Writing on Slate, Michael Kinsley sees Google starting to do some of the functions historically done by editors:

Google concedes that its choices of stories and news sources are "occasionally unusual and contradictory" but insists with uncharacteristic pomposity, "it is exactly this variety that makes Google News a valuable source of information on the important issues of the day."

Which is humbug. People still do it better. But not by much. The day is clearly approaching when editors can be replaced by computers. This requires some urgent rethinking.

He's writing somewhat tongue-in-cheek here in terms of his fears that editors and other mental workers will be replaced by computers in an increasing number of categories. But it's actually true. In some cases the computers will automate just part of a mental worker's job. Take blogging for example. I bet a neural net combined with some other types of algorithms could do a decent job of doing some of the job of article selection that a web logger performs. The history of what a popular web logger posts (e.g. Glenn Reynolds of Instapundit) could be used to help make search queries to identify articles to post about. Google News could be searched for patterns that match the posting history of a successful blogger (said posting history would be analyzed by software perhaps using Bayesian algorithms of some sort). Also, other blogs could be watched for breaking interest stories by use of Daypop.com and MIT Blogdex. Daypop and Blogdex are already serving the function of meta-weblogs.

Of course bloggers also provide commentary and select portions of articles to excerpt. Until full artificial intelligence is achieved the earlier versions of the Blog Assistant AI software I envision could provide a list of proposed articles to blog about and a real human blogger could select from this list. The Blog Assistant could even select a proposed excerpt to use for the blog post. The blogger could then accept or overrule the Blog Assistant's choice. The Blog Assistant could be a learning system that gradually refines its algorithms based on choices that the blogger makes while using the Blog Assistant.

Of course, a Blog Assistant would be a lot smarter if it could somehow know what readers are thinking about. A really popular blogger (not me) gets a lot of e-mail from readers. A Blog Assistant could read the e-mail and look for patterns of reader interest. That Blog Assistant could even look at articles when the readers send links to articles and then propose to the human Blogger that particular articles submitted by readers match the blogger's interests. Also, Google search engine patterns for people who come to the blog site could be tracked and the Blog Assistant could make suggestions for popular topics to write about. Similarly, the Blog Assistant could track which posted articles get the most views as links to just those posts and then again adjust its preferences for which new articles the blogger should post about.
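To make the learning-system idea a little more concrete, here is a minimal Python sketch of the article selection part, assuming a plain naive Bayes classifier over headline words. Everything in it (the BlogAssistant class, the learn, score and suggest methods, the choice to look only at headlines) is hypothetical; a real assistant would need far richer signals such as reader e-mail, referral logs and page views.

```python
import math
import re
from collections import Counter


def tokens(text: str) -> list:
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class BlogAssistant:
    """Toy naive Bayes model of which headlines a blogger chooses to post about."""

    def __init__(self) -> None:
        self.word_counts = {True: Counter(), False: Counter()}
        self.doc_counts = {True: 0, False: 0}

    def learn(self, headline: str, posted: bool) -> None:
        """Update the model each time the blogger accepts or rejects a suggestion."""
        self.doc_counts[posted] += 1
        self.word_counts[posted].update(tokens(headline))

    def score(self, headline: str) -> float:
        """Return log-odds that the blogger would post; positive means likely."""
        vocab = set(self.word_counts[True]) | set(self.word_counts[False])
        log_odds = math.log((self.doc_counts[True] + 1) / (self.doc_counts[False] + 1))
        for word in tokens(headline):
            for label, sign in ((True, 1), (False, -1)):
                total = sum(self.word_counts[label].values())
                # Add-one smoothing so unseen words do not zero out the estimate.
                p = (self.word_counts[label][word] + 1) / (total + len(vocab) + 1)
                log_odds += sign * math.log(p)
        return log_odds

    def suggest(self, headlines: list, top: int = 3) -> list:
        """Rank candidate headlines, best match to the posting history first."""
        return sorted(headlines, key=self.score, reverse=True)[:top]
```

The blogger's accept or overrule decisions feed straight back in through learn, which is the "gradually refines" part: each choice shifts the word counts, and the next batch of suggestions is ranked against the updated history.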

By Randall Parker    2002 December 01 12:08 PM   Entry Permalink | Comments (0)