User talk:Alvonruff/Archive01

From ISFDB

Tk?

Al,

Is it a safe bet that everything at http://www.isfdb.org/cgi-bin/ea.cgi?Tk is a "graphic novel" of some sort? Some of them state so outright while others are not so clear. Ahasuerus 15:36, 1 May 2006 (CDT)

Yeah. Tk is bubbling up to the top of my list of "authors" to address (from my list of popular authors without a Wikipedia article). Tk is really the publisher.


Backups

Sheesh, these marathon design discussions make me feel like we are trapped in Ken Grimwood's _Replay_ and it's 1995 all over again! Which isn't necessarily a bad thing :-) And, um, looking at the date of the backup files posted here, I am afraid to ask when the last real backup was performed?..Ahasuerus 22:00, 1 May 2006 (CDT)

Backups happen nightly. However, that isn't the version that's put online for download. I bring the backup over to the home unit, and remove the wiki tables (need to remove passwords and email addresses). Then a new backup is made from that version and uploaded. Real backups are still around at TAMU and my house. There are those people who build things other than an ISFDB from the database backup, so I'm holding back on putting up a new version to reduce the amount of work they need to do as I redesign tables. Alvonruff 04:39, 2 May 2006 (CDT)
Password/email data removal is certainly very important, but I hope it could be automated in the foreseeable future to eliminate the human factor? As for the redesign issue, do you think we could have two copies of the backups made publicly available, "old but stable" as well as "current but unstable"? My real concern is data preservation in case of (a) catastrophic failure and/or (b) backup software/human error. I have seen businesses where the operations department thought that their backups were fine, but they either never tried restoring and testing them, or they did it so rarely that they didn't catch the point where the backups stopped doing their job -- with predictably disastrous results. Ahasuerus 09:53, 2 May 2006 (CDT)

Okay, here's the backup situation in detail:

  1. TAMU does a MySQL backup every night. This backup goes into a standard location. TAMU sysadmins and myself have access to this backup. Access is not available via web browser, and requires a WebDav account and password. Someday I'll share that with other trusted users.
  2. Every night, an automated cron job retrieves that backup and transfers it to my home Linux system.
  3. The script loads the backup into my home MySQL system, deletes sensitive tables, and then makes a second sanitized backup. I now have two backups on my system: the original pristine backup from TAMU, and the sanitized version.
  4. I then reload the original pristine version into my system, so I have a working copy of the pristine backup.
  5. I take automated nightly backups of my database as well, and keep 7 days worth online.
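
Steps 3 to 5 above can be sketched roughly as follows. This is only an illustration, not the script actually in use: the table list, the database name, the file names and both function names are assumptions, and the commands are returned as strings rather than executed.

```python
# Hypothetical list of wiki tables holding passwords and email addresses.
SENSITIVE_TABLES = ["user"]


def sanitize_plan(pristine_dump, sanitized_dump, database,
                  sensitive_tables=SENSITIVE_TABLES):
    """Return the shell commands for steps 3 and 4, in order, without running them."""
    drops = "; ".join(f"DROP TABLE IF EXISTS {t}" for t in sensitive_tables)
    return [
        f"mysql {database} < {pristine_dump}",       # load the pristine backup
        f"mysql {database} -e '{drops}'",            # delete the sensitive tables
        f"mysqldump {database} > {sanitized_dump}",  # write the sanitized backup
        f"mysql {database} < {pristine_dump}",       # reload the pristine copy
    ]


def backups_to_prune(filenames, keep=7):
    """Step 5: given date-stamped names, return all but the newest `keep`."""
    ordered = sorted(filenames)  # ISO dates sort chronologically as plain text
    return ordered[:-keep] if len(ordered) > keep else []
```

Keeping the plan as data makes it easy to review before a cron job runs it; the pruning helper assumes file names that embed an ISO date, such as isfdb-2006-05-02.sql.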

The missing elements are:

  1. I don't automatically upload the sanitized version to make it available for download yet. Doing this nightly will be... nice, but it won't prevent a large amount of data loss in case of catastrophic error.
  2. The pristine version is not available to anyone other than TAMU sysadmins and myself.
  3. I haven't made physical backups of the data and stored them offsite (well, not in a while).

Currently a catastrophic failure would require three events in tandem:

  1. Loss of physical backups from TAMU (a possibility as you state, although the ISFDB backup file available online is derived from this TAMU backup), and
  2. Loss of the disk drive at TAMU where ISFDB data and backup are stored, and
  3. Loss of the home system
I have seen systems with triple redundancy go down even though it was "impossible" for them to fail, so yes, I am paranoid and would like to see at least 3+ copies stored in various remote locations -- eventually. For now, an automated nightly backup that would copy the zipped database file to a remote location should be fairly easy to do (one would hope) since we are talking about <100Mb. Once the file is there, it can be backed up to CD/DVD/flash/etc. Ahasuerus 14:09, 2 May 2006 (CDT)

I think the real risk is that trusted elements of the ISFDB community don't have access to the complete set of data. Without all of the data, some aspects of the ISFDB will be nonsense (like change history). To rectify that, we would need to create a third host, available to specified parties where a complete copy of the database would be stored nightly. Alvonruff 13:00, 2 May 2006 (CDT)

I assume that these trusted parties need change history data to be able to do incremental uploads-cum-data mining of ISFDB data into their systems? If the master database file(s) are released to these trusted parties, will the password data be strongly encrypted? After all, trusted parties may have families, friends, unsecured networks, etc etc. Ahasuerus 14:09, 2 May 2006 (CDT)

Lessons learned so far

I have attempted high level cleanup of a few authors' data just to see what kinds of problems a typical editor may run into. Here are my thoughts so far:

  1. ISFDB's submission/editing tools are limited but reasonably stable for a beta system. They let you recover from snafus fairly gracefully. I have yet to experience major data loss or degradation after a modest amount of system abuse.
  2. On the other hand, the editing process is quite time consuming. A drag and drop interface might speed things up substantially, but I doubt we can do it with an all volunteer crew. For now, making Moderators' submissions auto-approved might help things along. A form that would allow you to delete a Work and its associated Publication (only one for now, auto-deleting multiple Publications with one click may be more trouble than it's worth) would be handy.
  3. Much of the publication and work level data that we have is quite dirty, in part because much of it comes from Amazon and related places and their data is, well, dirty. It will take a long long time to get all of the data verified, probably man-years.
  4. Some form of data verification matrix as described elsewhere will be a must. Ahasuerus 20:25, 2 May 2006 (CDT)

First occurrence of data degradation

Well, it had to happen at some point. Please see the last reported bug. Ahasuerus 00:15, 3 May 2006 (CDT)

P.S. Would it be safe to continue with editing while this record is broken? Or could it cause further database degradation? Ahasuerus 10:16, 3 May 2006 (CDT)
The only thing broken is the reference to that particular title, so you're pretty safe. If you're worried, just avoid that particular author.
Ah, thanks, good to know!
If you need to delete an author from a record, for now make sure that there are no empty lines before other authors (for instance cut the last author and paste over the deleted author).
Yes, that's pretty much what I have been doing and haven't run into any problems so far.
I may not be able to repair it from my current location.
Not a problem, there are a few other things in the database that I can work on aside from this record :) Ahasuerus 20:03, 3 May 2006 (CDT)

Sysop discussion page?

Is there a hidden Wiki page where SysOps can discuss policies and operational issues, like what kinds of user names should be allowed (is, e.g., "Mutherfooker" acceptable?) and what accounts need to be blocked?

A most peculiar case

http://isfdb.tamu.edu/wiki/index.php/Author:Errick_Rios -- hm...

Yeah. I didn't think we'd have many neutral POV wars here, since we've removed most of the forums for that type of discussion. In this case, not exactly a "bibliographic note" either. On IMDB, there are forums for these kinds of discussions (which are de rigueur elsewhere as well), but my impression is that they are extremely high maintenance. Ultimately, this kind of note shouldn't go on that particular page - the decision to make is whether or not discussion pages should exist. Alvonruff 06:11, 9 May 2006 (CDT)
I was thinking the same thing. Substantive (as opposed to bibliographical) comments are high maintenance, flame-inducive and likely to hinder the kind of methodical work we are trying to do here. I rather like the WP approach, i.e. zap any Talk comments that are not related to editing the parent article. Perhaps we could have something in the Help pages that would point those who want to discuss SF to news:rec.art.sf.written? Or even a URL or two along the lines of Discuss this book/author on the "rec.arts.sf.written" discussion group via Usenet or Google Groups on the Work's/Author's page? Ahasuerus 09:17, 9 May 2006 (CDT)
Author:Rob Gerrand has been created by User:Roge56. Ahasuerus 21:03, 9 May 2006 (CDT)
Saw it. Should we play Police State or Social Experiment? Alvonruff 21:33, 9 May 2006 (CDT)
Well, as long as all of these, er, unorthodox submissions are covered by the same copyright-free license, I suppose we can "let a hundred flowers bloom" (TM by Mao Inc.) and then move them to Wikipedia if/when it becomes necessary/advisable. Ahasuerus 21:38, 9 May 2006 (CDT)
P.S. We probably want a standard greeting that we could cut-and-paste on new users' Talk Pages explaining what the ISFDB is (and is not) and what its Wiki is for. I'll see if I can whip something up shortly. Ahasuerus 08:18, 10 May 2006 (CDT)
Even more visitors at Author:Lori C. Schneider. Ahasuerus 23:20, 18 May 2006 (CDT)

Data verification

A proposed approach to capturing verification data within the database (and some thoughts on Series-related complications) have been posted in Bibliographic Rules. Do you think it will be doable/in the ballpark? Or am I barking up the wrong tree?

This looks like the right tree. Appears pretty easy to implement.

Also, what is the limiting factor when it comes to implementing software fixes/adding new features? Is it that you are the only developer on the project and you have to eat/sleep/work sometimes? If so, I guess I was pretty good with the abacus back in the day, and I am sure these mysterious "MySQL" and "Python" can't possibly be that much more difficult to learn. Admittedly I don't like snakes much... Ahasuerus 11:16, 9 May 2006 (CDT)

I think a Pareto chart will show that the eating and sleeping profiles are inconsequential when compared to the working profile. Being on the road five out of the last six working days was a bit unusual. It'll probably be quicker for me to do bug fixes and complete the current variants / pseudonyms / content stuff, but you could certainly start in on newer features like verification (it will require one new SQL table, which is no biggie). Alvonruff 20:32, 9 May 2006 (CDT)
First I would have to download Python and MySQL and see how long it would take me to get up to speed. I gather things have changed a tad since 1994-1995 when I was maintaining the HTML version of the rasfw (and related) FAQs with a text editor. With luck, I should be able to take a closer look at the reptile in the next few days. Ahasuerus 21:34, 9 May 2006 (CDT)

Proposed blocking policy

ISFDB:Policy updated with a proposed "blocking policy". It is a little tougher on vandals than the current WP policy. Ahasuerus 12:38, 10 May 2006 (CDT)

I have no problem with those. I've been implementing swift justice on spammers for a while now. I've enabled whitelisting, so banning should be done on a user name basis, instead of an IP basis. There are also some regular expression filters that prevent the submission of URL lists that are structured in a specific manner. User:Alvonruff
Well, the filter didn't catch the latest spammer, but Recent changes did and I zapped him manually. Ahasuerus
The filter prevents a spammer from saving articles that follow a common spammer pattern (not described here for obvious reasons). Once confronted with a difficulty, they revert to more primitive listings that are immediately obvious. I think that there are some MediaWiki rules that prevent more than n URLs in a post as well, but I haven't seen the specifics yet. Alvonruff 19:43, 15 May 2006 (CDT)
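
The "no more than n URLs in a post" idea mentioned above could look something like the sketch below. It is not the filter in use here and not MediaWiki's own rule; the pattern, the limit of 5 and the function name are all assumptions chosen for illustration.

```python
import re

# Deliberately simple: anything that starts with http:// or https://.
URL_PATTERN = re.compile(r"https?://\S+", re.IGNORECASE)


def too_many_urls(text, max_urls=5):
    """Return True when a submission contains more than `max_urls` links."""
    return len(URL_PATTERN.findall(text)) > max_urls
```

A check like this only counts links; it says nothing about how they are arranged, so it would complement rather than replace a pattern-based filter.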

Multiple Authors bug -- IE-specific

In unrelated news, is there any chance that the submission of Works/Publications with multiple Authors might be fixed in the foreseeable future? The problem makes it impossible to enter complete bibliographies for most Authors and I am hesitant to enter partial Work entries that omit co-authors. Ahasuerus 10:19, 12 May 2006 (CDT)

Okay; took me a while to reproduce. It's a bug only with Internet Explorer, which is why I wasn't seeing it. Obviously related to Javascript handling in IE. I'll look at that this evening. User:Alvonruff
Well, IE is not exactly the cuddliest browser out there, but it's the one that most users use, so that's what I test with. A perfectly designed and implemented user interface may be esthetically pleasing, but it doesn't help the users if 80%+ of them can't use it :) BTW, do you have a breakdown of incoming requests by browser type, by any chance? Ahasuerus 12:56, 12 May 2006 (CDT)
  • 69% - Internet Explorer
  • 23% - Firefox
  • 4% - Safari
  • 2% - Opera
  • Loose change - Netscape, Mozilla, Konqueror, Camino, Mozilla Compatible, Galeon.
  • Alvonruff 13:50, 12 May 2006 (CDT)
Sheesh! Either our users are very advanced (fans are slans, after all) or a lot of them live in Finland :) Ahasuerus 14:09, 12 May 2006 (CDT)

Response time

isfdb.tamu.edu has been intermittently slow lately :( Do we happen to know where the problem might lie? The Web server, their internet connection or the ISFDB innards? Ahasuerus 21:10, 13 May 2006 (CDT)

Too many variables; too hard to say. The innards didn't change recently, but could be the server, DNS server, or Google Analytics. Alvonruff 17:59, 15 May 2006 (CDT)

Grendelkhan

Sure, I remember him from his comments on the forum a little while back. The more the merrier :) Ahasuerus 19:56, 15 May 2006 (CDT)

Thanks! grendel|khan 16:01, 16 May 2006 (CDT)

Project Gutenberg links

I'm sure after the database redesign, this is the very last thing you'd want to think about, but a number of SF works have been cleared recently over at Project Gutenberg. See, for instance, the works of Andre Norton. (Okay, it's just one, but there are like three more books on the way.) Would it be worth it to include a field to link a title to one or more PG etexts (sometimes a text is also published from a different edition, or in a different format)? The author records have a freeform URL field, which can be used to link to a PG author record; it might not be worth the trouble to specifically add a field for Project Gutenberg link, especially as a comparatively tiny proportion of works in the ISFDB have fallen into the public domain. I wanted to run this by you before I started adding PG links under "Web Page 2" or whatnot. grendel|khan 12:50, 16 May 2006 (CDT)

We should be able to support a generalized set of web publication links per title. This would include not just Project Gutenberg etexts, but other *freely* available etexts (such as when Hugo nominees are made available online, or when Cory Doctorow makes his stuff available under a Creative Commons license). Rather than making these author links, I'd prefer to make them title links. I'll move this over to the feature roadmap section later. Alvonruff 14:54, 16 May 2006 (CDT)
Sounds reasonable. BTW, there is a portal that collects links to complete works of SF that are available on line. Ahasuerus 15:09, 16 May 2006 (CDT)

Wikipedia discussion of the ISFDB entry -- notability concerns

See ISFDB:Community Portal and the ISFDB's Talk page on WP. BTW, now that we have 3+ active contributors, we may want to start moving general interest discussions from userpages to more generic forums. Ahasuerus 15:06, 16 May 2006 (CDT)

Something wrong with this title.

Titles 112651 and 112652 are two parts of a novella "To the Stars" by L. Ron Hubbard, title 92605, published in serial form. However, they're also showing up in every other work titled "To the Stars", which I see in works by Robert Silverberg (34767), Harry Harrison (39531), J. T. McIntosh (59342), James Spencer (106566), Arthur C. Clarke (119015), Lee Owens (130669), Robert Heinlein (174800) and, of course, L. Ron Hubbard himself. I don't see any way of removing those two serial titles from being listed in these other titles as serialized versions. I can't tell if this is a bug or just data being entered wrong, so I'm entering it only here for now.

To quote the same ISFDB_Feature_List:
10/29/2006 - Review Serial support. Linkage to titles is still performed lexically.
In other words, all Works with "To the Stars" as their title are currently linked to this serial :( Ahasuerus 19:21, 16 May 2006 (CDT)
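
A small sketch of why lexical linkage produces this effect, compared with linkage by record number. The title numbers are the ones quoted above; the parent_id field, the simplified serial titles and both function names are assumptions, not the actual ISFDB schema.

```python
# Serial records from the report above. "parent_id" is a hypothetical field.
SERIALS = [
    {"id": 112651, "title": "To the Stars", "parent_id": 92605},
    {"id": 112652, "title": "To the Stars", "parent_id": 92605},
]


def serials_by_text(title_text):
    """Lexical linkage: every Work with the same title text gets the serials."""
    return [s["id"] for s in SERIALS if s["title"] == title_text]


def serials_by_parent(title_id):
    """Linkage by record number: only the intended parent gets the serials."""
    return [s["id"] for s in SERIALS if s["parent_id"] == title_id]
```

With text matching, Heinlein's omnibus (174800) and Hubbard's novella (92605) both ask for "To the Stars" and both receive the two serial parts; with a parent number, only 92605 does.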

Also, I don't see any tools to edit omnibus collections (this originally started when I was going to edit the contents of Heinlein's omnibus "To the Stars" (174800)). I take it that the proper way to do this is to mark both the title and the publication as type OMNIBUS and then edit the contents of the publication to point to the titles included in it (as in, for instance, A Heinlein Trio)? And that the content-editing tools are currently unavailable? Ah, I see. It should be online in a week or two. grendel|khan 19:12, 16 May 2006 (CDT)

Alternate titles on publications possibly give incorrect linkage.

I added the Croatian translation of "A Fire Upon the Deep" (publication VTRNDDBNMF2002), but the title link ("Title Reference" on the publication page) leads back to the record for the cover art. Usually the title reference, even if there is cover art, points back to the title record for "A Fire Upon the Deep". (See publication BKTG18990.) Did I do something backwards? grendel|khan 00:04, 17 May 2006 (CDT)

Definitely a thing that makes you go "Hmmmmmm". Looking through the data trail:
  • The data submission has the correct parent number, so you didn't do anything wrong.
  • The integration worked correctly.
  • The pub mapping table has two entries in it: one for the title and one for the cover art, which is the way it should be.
  • Oh. The pl.cgi app currently uses the first reference it runs into, which just happens to be the cover art information.
Not the desired result, but there are no data consistency errors - it's just being displayed wrong. It probably happens elsewhere (like in anthologies and collections), we just haven't noticed that one yet. More than a one-line fix, so I'll put it in the queue. Alvonruff 05:51, 17 May 2006 (CDT)
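
One possible shape for the display fix, sketched under assumptions: the pub mapping is modelled as a list of dictionaries, the type names and the record numbers are placeholders, and the function is hypothetical rather than taken from pl.cgi.

```python
def title_reference(mapped_titles):
    """Pick the record to show as the Title Reference for a publication.

    Skips cover art so the first non-cover entry wins; falls back to the
    first entry (or None) when nothing else is mapped.
    """
    for record in mapped_titles:
        if record["type"] != "COVERART":
            return record
    return mapped_titles[0] if mapped_titles else None


# Mapping as described above: cover art happens to come first.
mapping = [
    {"id": 2, "type": "COVERART", "title": "Cover: A Fire Upon the Deep"},
    {"id": 1, "type": "NOVEL", "title": "A Fire Upon the Deep"},
]
```

This only covers the single-title case. Anthologies and collections would still need a rule for choosing among several non-cover entries, which this sketch does not attempt.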

Pseudonym support

I saw that enhanced pseudonym support was now listed as "DONE", so I tried merging "Steven Swiniarski"'s books into "S. Andrew Swann"'s bibliography as well as adding the books that he wrote as "Steven Krane". I can see that some books are showing up "as by XYZ", but I seem to be unable to figure out how to enter this information :( Ahasuerus 16:06, 20 May 2006 (CDT)

There's currently no method for doing the wholesale merging of one author as a pseudonym of another. That's fairly complex (multiple works, multiplied by finding parent versions of each title) but I'm looking into it. On a per-book basis, there are two methods:
  • If the pseudonym work already exists, then go to the title page for that work. Select Make Title Variant - this will make the title a variant title. If the work by the real author already exists, put the title record of the parent in the top part of the form. If the work by the real author does not exist, put the title data in the bottom part of the form.
  • If the pseudonym work doesn't exist, but the parent does, then go to the parent title. Then select Add Variant Title. Enter the pseudonym version here. User:Alvonruff
Oh, I see, I missed the new choices in the navbar! However, although making Swiniarski's "The Flesh, the Blood and the Fire" a variant title under Swann did work at the Work level, its related Publication record wasn't moved. It now shows up as a Stray. Did I do something wrong during the submission process, or did the process of associating this Publication with the new Work fail? Ahasuerus 17:28, 20 May 2006 (CDT)
Yeah. Don't touch anything. It's technically correct (the pub is still connected to the variant) but not the desired effect. The title needs to show the publications of any variant children as well as pubs connected to the parent canonical title - which I thought I did. I'll get back to you on that... Alvonruff 17:38, 20 May 2006 (CDT)
Everything is correct except for the leftover stray. How did you create the variant? On the home system, I did a make variant on the Swiniarski version, and punched in the data to create the Swann parent. I didn't get a stray that way.
Okay. I know what's going on. Swiniarski doesn't have any titles, but he does have pubs. That's not good - *unless* the author is a pseudonym. But the database doesn't know about the Swiniarski <=> Swann relationship (I'm not seeing it at home, because I didn't link all of the Swiniarski titles). We also want the Swiniarski page to refer to the Swann page (see Robin Hobb) so that when people look up Swiniarski, they'll know to go to Swann. I'll create a pseudonym manager (which we need anyway), so that the relationship can be established. (It only has to be done once per pair, and I don't think it's a good idea to automate the detection of those relationships, or not have a way to delete mistaken pairings). Remember - pseudonyms are hard - we just have to make it seem easy... eventually. Alvonruff 18:07, 20 May 2006 (CDT)
Another mystery solved! :-) As far as "pseudonyms are hard" goes, I am currently thinking about it. Perhaps there is a way to simplify them, but I'll have to review the current table layout first. Ahasuerus 18:13, 20 May 2006 (CDT)

Comic books by Bill Willingham?

Before I aim my flamethrower at all of these entries, would you say that all of them deserve to die a fiery death like the filthy rotten comic books that they are? And did they slip past Dissembler's safeguards or did they get imported before the anti-comic algorithms alluded to elsewhere were put in place? Ahasuerus 23:08, 20 May 2006 (CDT)

The current Dissembler algorithm uses publisher names and page counts to find comics. These are offenders that went in before those heuristics, which would have caught all but two:
  • Fables, Storybook Love (2004) - General, 192p.
  • Robin: Unmasked (2004) - General, 128p.
I'm not tracking General as a publisher (and that's probably bogus, Robin is a DC comic property), and the page counts are pretty large, but they are graphic novels. I'll see if I can't pick up some more subject information. At any rate, they are all toastable. Alvonruff 05:47, 21 May 2006 (CDT)
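
A rough sketch of a publisher-plus-page-count heuristic of the kind described above. It is not Dissembler's code: the publisher list, the 64-page threshold and the function name are invented for illustration.

```python
# Hypothetical values; the real lists and limits are not given in this thread.
COMIC_PUBLISHERS = {"dc comics", "marvel", "vertigo"}
MAX_COMIC_PAGES = 64


def looks_like_comic(publisher, pages):
    """Flag a publication as a likely comic by publisher name or low page count."""
    if publisher.strip().lower() in COMIC_PUBLISHERS:
        return True
    return pages is not None and pages <= MAX_COMIC_PAGES
```

Under these assumed values, the two "General" entries above (192 and 128 pages) slip through, which matches the behaviour described: the publisher is not on the list and the page counts are too large to trip the threshold.
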
General seems to be a comic book imprint specializing in reprint editions, which tend to have inflated page counts, e.g., Supergirl - The Archives, Volume 2. Also, Showcase Presents: Superman Family, Vol. 1 (Superman (Graphic Novels)) is by DC Universe and from 2006; shouldn't Dissembler be able to catch it? Ahasuerus 13:34, 21 May 2006 (CDT)
Since ISFDB covers Neil Gaiman's Sandman graphic novels, why not Bill Willingham's Fables? I'd say they're similarly serious works. (BW has also written for The Dreaming, which is a sequel/spinoff series from Sandman.) Vertigo is DC's mature-reader line, not your childhood comic books. --SAJordan 20:15, 7 Oct 2006 (CDT)
As a general rule, trying to select graphic novels (or most anything else) for inclusion based on whether they are "serious works" or not typically ends up in a never ending discussion of what is and is not serious :( The main reason not to include graphic novels and comic books in the ISFDB is simply our current lack of support for comics-specific information. Unfortunately, although it used to be pretty easy to tell regular novels from comic books, things have changed lately and the line is not so clear cut any more. We are still scratching our collective head over this development... Ahasuerus 22:27, 7 Oct 2006 (CDT)

Editor recruitment

I have been thinking about trawling the History section of SF-related Wikipedia articles for promising editor/moderator material. Do you think Mike Christie, who has been doing a fine job of organizing Ace-related materials, would be a good first candidate? Ahasuerus 20:27, 21 May 2006 (CDT)

Wow. According to his contributions, he doesn't even get distracted by Russian history :)
Hey! Leon Trotsky just called -- he objects to being called a distraction! :) But yes, my databank dedicated to late 19th-mid 20th century Eastern/Central European history was one of the things that I hoped to upload to WP before I transcend. Unfortunately, even mildly controversial topics can become unmanageable timesinks on WP due to the way it's run, so my time is probably better spent here, at least until the place is cleaned up and running smoothly. Ahasuerus 22:27, 21 May 2006 (CDT)
Looks like a good candidate, if interested. Alvonruff 21:24, 21 May 2006 (CDT)
He is busy with other SFnal projects right now, but I'll give it a shot. Ahasuerus 22:27, 21 May 2006 (CDT)
I left a message on WP, we'll see how the target reacts :) By the way, if we are going to have more than a couple of people with sysop privileges and/or if more than one of us gets to play with the code, we may need to have a hierarchy of sorts. Some way of determining which software changes should go in, which bibliographic rules get implemented, etc. Ahasuerus 23:22, 21 May 2006 (CDT)

Hi; just got the message from Ahasuerus. Can you tell me a little more about what you would expect of me, both in the way of time and duties? I'm flattered to be asked, but I do have obligations to the OED SF project, and of course I am still tinkering over at Wikipedia, as well as having a real life and family that occasionally require my time, so I'm just trying to get a picture of what you're looking for in an editor. I'm certainly interested in the ISFDB and use it frequently, and have quite a few bibliographical materials and a fair-sized collection to help with validations. So I'm happy to help, but would just like to understand the role a bit better. Thanks. Mike Christie 06:10, 22 May 2006 (CDT)

The only real duty an editor has will be eventually to help moderate submissions by the general public. At present, only moderators are allowed to edit, as we bring more editing tools online, find bugs in the existing tools, and find the need for new tools. So the "official" obligations at present are:
  • Try entering data at your leisure. Most of us generally stumble into some area of focus: Grendelkhan is currently working on Heinlein, Ahasuerus is furiously *deleting* content, and I'm mostly testing new editing apps and webbots (which furiously add content). We all fix errors as we come across them.
  • Document confusion you may have in the wiki, so that we can alter the tools or improve/create the online help.
  • Document ideas for improvements that will aid the editing process or make the bibliographies clearer.
  • Police the wiki for spam and delete mercilessly.
  • Create ideas on formalizing and documenting the bibliographic process, such that people can understand the relative maturity of a particular bibliography.
Other than that, there are no expectations, especially not on time. Any help you can provide will be appreciated. Alvonruff 06:34, 22 May 2006 (CDT)
OK, I can sign up for that. Thanks. I'll take a look around before diving in; I'd guess my areas of interest will be bibliographic templates, magazines, and first edition documentation; plus maybe a few specific authors such as Le Guin, Robinson, Knight and a couple of others I have a particular interest in. Mike Christie 08:10, 22 May 2006 (CDT)
<fanfare>You are hereby knighted into the Order of ISFDB Sysops.</fanfare> Alvonruff 08:46, 22 May 2006 (CDT)

Questions about current status

I have looked around a little bit, and I want to check my understanding of where the project is right now. I tried to log out of the isfdb.org (not the wiki) and got an error (which I assume I should report to you), so I can't currently see those pages as they look to someone not logged in or not an editor.

However, here's what I'm assuming is the current state -- please let me know if I'm missing anything significant.

There are two components, the ISFDB, and a supporting Wiki. The ISFDB will allow a logged-in ordinary user to do editing; but that's not yet been generally released to the public. Any ISFDB login is presumably a login to the Wiki too? Which would explain why there are a hundred or more users with no activity on the Wiki. The Wiki is active and available but since its main function is to support the admins/sysops/moderators there is little value in it for most users, who are just going to post new data on the ISFDB itself. If they disagree with a deletion they might end up kvetching at one of us on our talk pages or elsewhere on the Wiki, but that's all they're likely to do.

How did the comics-related titles get in, by the way, if no other users can enter data yet?

Sorry if these questions seem obvious -- I'm relatively new to Wikipedia so the boundaries between underlying elements are not yet completely clear to me.

Thanks. I'll go grab a book or two and try entering some data now, and report back. I may not be very active for two or three days -- other obligations are looming this week. Mike Christie 23:05, 22 May 2006 (CDT)

ISBN format

It appears that ISBNs are stored without hyphens or spaces; is that universal or just the ones I've looked at? I would suggest that if possible we should not strip hyphens or spaces from the ISBN; they do have structure, after all. I don't actually know whether it's possible to reconstruct an ISBN that has no hyphens; I could see it might not be. Mike Christie 23:58, 22 May 2006 (CDT)

ISFDB1 stored the hyphens. There are two problems with hyphenated ISBNs:
1 - It makes searching via ISBN difficult. For example, before submitting a new publication, the webbot Dissembler first checks with the ISFDB to see if it already has a publication with that ISBN. If one exists, Dissembler skips that publication, reducing duplicates. In order to perform an SQL query match, the hyphens in the two ISBNs must line up precisely, or the database engine won't find a match. When the hyphens are removed, there is no such problem.
2 - There is some structure to the ISBNs, but it is not implemented consistently. During the transition between ISFDB1 and ISFDB2, when hyphens were still present in the ISFDB, Dissembler had enormous heuristics which attempted to take an unhyphenated ISBN from - say Amazon.com - and reconstruct the hyphenated structure to help in ISBN matching. It became apparent that each publisher had their own standards for how they hyphenated their ISBN space, which could sometimes change even between different imprints that were owned by the same publisher. Here are the different hyphenation schemes used by various publishers and imprints that I knew about:
 X-XX-XX-XXXX-X
 X-XX-XXX-XXXX
 X-XX-XXXXX-X-X
 X-XX-XXXXXX-X
 X-XXX-XX-XXX-X
 X-XXX-XXX-XX-X
 X-XXX-XXX-XXX
 X-XXX-XXXX-X-X
 X-XXX-XXXX-XX
 X-XXX-XXXXX-X
 X-XXX-XXXXXX
 X-XXXX-XXX-X-X
 X-XXXX-XXXX-X
 X-XXXX-XXXXX
 X-XXXXX-XXX-X
 X-XXXXXX-XX-X
 X-XXXXXX-XXX
 X-XXXXXXX-X-X
 X-XXXXXXXX-X
 XX-XX-XXXXX-X
 XX-XXXX-XXX-X
 XX-XXXXX-XX-X
 XXX-XX-XXXX-X
 XXX-XXXX-XXX
 XXX-XXXXX-X-X
 XXX-XXXXXX-X
 XXXX-XXX-XX-X
 XXXX-XXXXX-X
 XXXXX-XXXX-X
 XXXXXX-XXX-X
 XXXXXXXXXX
When dealing with hand-written static pages, the standard seems to be hyphenated ISBNs (see the Locus index for example). For database-driven content, the standard seems to be unhyphenated ISBNs (see amazon.com, isbndb.com, abebooks.com). We're always open to change here, so if there is some compelling advantage to hyphenated ISBNs, then we should weigh it against their disadvantages. In this particular case, however, the choice was not arbitrary - I chose unhyphenated ISBNs because the hyphenated versions were causing a great deal of trouble and costing a great deal of time.
Real-world example of ambiguity: ISBNs are handed out to publishers in blocks. For instance, St. Martin's owns two ISBN blocks: 0-312 and 0-812. These are also used by its imprints like Tor. The 0-312 block is used for paperbacks and trades, while the 0-812 is used for hardcover books. Ace owns the 0-441 block. There is an ambiguous hyphenation case in ISBNs that begin with 0-099. Some examples:
   0-09-977150-0 (Red Fox)
   0-099-54971-9 (Wizards of the Coast)
   0-09968-301-6 (Legend)

But even Legend doesn't stick to its own hyphenation scheme:

   0-09968-301-6 (Legend)
   0-09-944870-X (Legend)
   0-099-44371-6 (Legend)

This leads me to believe that the hyphenated structure can be quite arbitrary, even when taken out to the 0-09944 ISBN space (that only leaves 3 digits to differentiate the hyphen style). Alvonruff 05:42, 23 May 2006 (CDT)
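
To illustrate point 1 above, here is a minimal sketch in Python of comparing ISBNs after the hyphens are stripped. The function names are hypothetical and are not taken from Dissembler or the ISFDB code; the check-digit test is the standard ISBN-10 rule (weighted sum divisible by 11).

```python
def normalize_isbn(raw):
    """Drop hyphens and spaces and upper-case a trailing 'x' check digit."""
    return "".join(ch for ch in raw if ch.isalnum()).upper()


def is_valid_isbn10(raw):
    """Standard ISBN-10 check: the weighted digit sum must be divisible by 11."""
    isbn = normalize_isbn(raw)
    if len(isbn) != 10 or not isbn[:9].isdigit():
        return False
    if not (isbn[9].isdigit() or isbn[9] == "X"):
        return False
    total = 0
    for position, ch in enumerate(isbn):
        value = 10 if ch == "X" else int(ch)
        total += value * (10 - position)  # weights run 10, 9, ..., 1
    return total % 11 == 0


def same_isbn(a, b):
    """Two ISBNs match if they are identical once hyphenation is ignored."""
    return normalize_isbn(a) == normalize_isbn(b)
```

With this kind of normalization, "0-09-944870-X" and a differently hyphenated spelling of the same digits compare equal, so a lookup no longer depends on which hyphenation scheme the source happened to use.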

I hadn't seen so much variation; I can understand now why the database doesn't store them. Thanks for the clarification. It's good to get this documented too; I'm sure I won't be the last person to ask this question. Mike Christie 08:06, 23 May 2006 (CDT)
While some publishers format their ISBNs inconsistently, the majority of the books in people’s hands are correct. Probably the biggest exception among active publishers is Tor, which always formats its ISBNs as “0-812-58035-4” when it should be using “0-8125-8035-4.”
I personally would like to see formatted ISBNs displayed as they are much easier on the eyes when trying to verify a book’s data. If hyphenated ISBNs are displayed by isfdb I'd recommend also showing the unformatted version so that someone searching for “0340837489” will find the publication. In the interest of breaking as little code as possible I'd continue to store the unformatted ISBN in the database and format/hyphenate as needed for display.
If there’s interest, I could volunteer the ISBN formatting code I did for another book site I contribute to. It’s table driven and follows the per-country rules defined at [www.isbn-international.org] and its sub-pages. At the moment the code is in C and I believe I did a version in VB. See [www.fantasticfiction.co.uk] where down at the bottom of the page ISBNs are shown as “ISBN: 0340837489 / 0-340-83748-9 (UK edition).” The “UK edition” note comes from the group code part of the ISBN. www.fantasticfiction.co.uk chose to go with “0-8125-8035-4” rather than putting a special case in the tables for Tor's “0-812-5*” but we are free to change the rules. :) Marc Kupper 13:07, 2 Nov 2006 (CST)
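
For readers wondering what "table driven" means here, the following is a minimal sketch in Python, not the C or VB code mentioned above. It assumes the classic publisher-prefix ranges for the English-language group 0 only; the range table and the function name are illustrative assumptions, and the authoritative per-group rules are the ones published by the ISBN agency.

```python
# Assumed classic publisher-prefix ranges for group "0" (illustrative only).
GROUP0_RANGES = [
    ("00", "19"),
    ("200", "699"),
    ("7000", "8499"),
    ("85000", "89999"),
    ("900000", "949999"),
    ("9500000", "9999999"),
]


def hyphenate_group0(isbn10):
    """Return a hyphenated form of an unhyphenated group-0 ISBN-10.

    Anything that is not a 10-character ISBN starting with "0" is
    returned unchanged, since this sketch has no table for other groups.
    """
    if len(isbn10) != 10 or isbn10[0] != "0":
        return isbn10
    body = isbn10[1:9]  # publisher prefix plus title number
    for low, high in GROUP0_RANGES:
        width = len(low)
        if low <= body[:width] <= high:
            return "-".join([isbn10[0], body[:width], body[width:], isbn10[9]])
    return isbn10
```

Under these assumed ranges, "0340837489" is displayed as "0-340-83748-9" and "0812580354" as "0-8125-8035-4", while the stored value stays unhyphenated.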

ER diagram for database schema?

Do you have an entity relationship diagram for the schema? If not, I think I'll draw one myself just to get it clearer in my head; if you have one I didn't want to duplicate work. Thanks. Mike Christie 17:34, 24 May 2006 (CDT)

The closest thing we have is this diagram.
This is exactly what I was looking for; very informative. Thanks. Mike Christie 22:29, 25 May 2006 (CDT)

The case of two Disappearing Vortex reviews

We have just lost the pointers from two Vertex reviews (April 1974 and June 1975) for [1] as I was reshuffling Publication records :( Ahasuerus 15:47, 12 Jun 2006 (CDT)

Most peculiar. All of the records are here, and the title/author matches. I hate this lexical matching crap. Who designed this monstrosity? Alvonruff 16:19, 12 Jun 2006 (CDT)
Fixed. Not a design issue. Coding error (original title stomped while looking up other data). Will probably fix some serial printing problems as well. Alvonruff 16:40, 12 Jun 2006 (CDT)
Oh, nice! Reviews always seemed to be a bit unstable, design limitations aside, but I could never put my finger on it. I'll keep an eye on them. Ahasuerus 16:48, 12 Jun 2006 (CDT)

Data questions

Al, I have been looking at the impact of merge and delete and trying to understand a couple of situations I've seen. Can you help?

  • I've seen a "stray publication" listed. What's the definition of a stray publication?
I am a poor substitute for Al, but if I can save him a few minutes, so much the better :) "Stray Publications" are Publication records that don't have Work records pointing to them. Normally, it's a sign that something is out of whack, but there is one case when a Stray Publication (as currently defined) is legitimate: a Publication that belongs to a Variant Title. Al was considering cleaning the code up so that they don't show up as strays, but I don't think he has gotten to it yet. Ahasuerus 19:13, 19 Jun 2006 (CDT)
Thanks! Do you happen to know the data situation that corresponds to "publication records that don't have work records pointing to them"? I think this means that the author named on the publication record doesn't lexically match the author referenced in the pub_authors record, but I'm not sure. Mike Christie 19:37, 19 Jun 2006 (CDT)
I don't think any major tables are currently linked lexically except reviews and serials. Works and Publications should be matched via "pub_content". Do you see two foreign keys there, pub_id and title_id? That should do the trick :) Ahasuerus 19:53, 19 Jun 2006 (CDT)
  • Could you tell me if these statements are true?
    • Canonical_author links a title to the canonical author name; this may be a pseudonym though.
True.
    • Each title should have exactly one record in canonical_author for each actual author of that title. If a collaboration is recorded under a single pseudonym (e.g. Lewis Padgett, or Eando Binder) there will be two records in canonical_author since there are really two authors. Conversely if a book apparently by two authors is really by one -- e.g. John Wyndham and Lucas Parkes for The Outward Urge -- then there will only be one record in canonical_authors.
First part true. There is a set of records for the real authors and a set of records for pseudonymous authors - the pseudonym information needs to be stored someplace.
    • Each pub should have exactly one record in pub_authors for each apparent author of that title; e.g. a collaboration will have two records in pub_authors, but if it's under a single pseudonym (e.g. Lewis Padgett) it will be a single record.
That is true. In fact, publication records record the published author name. So, unlike titles (which have both), pub records have only the one. In an earlier incarnation, publication records pointed exclusively to published names, and titles to actual names. That seemed more elegant to me, but it turned out that matching up the actual names to published names was a hard problem - so the pub listings in bibliographies were missing "as by" data.
    • When a pub is entered:
      • If an author matches lexically, a pub_authors record is created linking the pub to that author; if not, a new author record is created and a pub_authors record is created linking them.
True.
      • If a title (for the identified author) matches lexically, pub_contents records are created for that title and this pub. If not, a new title is created and pub_contents is loaded using that title.
Not True. Currently new publications generate new titles, even if they already exist. See our earlier discussion on automerging. I'm currently leaning to the following behavior: If a title (for the identified author) matches lexically, pub_contents records and title records are created for that title and this pub, AND a submission is created to merge the new title with its lexical match. If there is no match, no such submission is made.
      • The contents of the pub are treated in the same way: match author, then title for that author.
Also not true. Same recommendation as above. Alvonruff 07:28, 20 Jun 2006 (CDT)
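
The behavior proposed in the answers above (always create a new title, and queue a merge submission when a lexical match already exists) could be sketched roughly as follows. This is only an illustration in Python using in-memory lists; the record layout, the "TitleMerge" label, and the function name are assumptions, not the actual ISFDB code or schema.

```python
def enter_title(titles, submissions, new_title, author):
    """Create a title record and, on a lexical match, queue a merge proposal.

    titles      -- list of dicts with keys "id", "title", "author"
    submissions -- list that receives proposed merges for a moderator
    """
    # A lexical match means the same title string for the same author string.
    match = next(
        (t for t in titles if t["title"] == new_title and t["author"] == author),
        None,
    )
    new_id = max((t["id"] for t in titles), default=0) + 1
    titles.append({"id": new_id, "title": new_title, "author": author})
    if match is not None:
        # Nothing is merged automatically; the merge is only proposed.
        submissions.append({"type": "TitleMerge", "keep": match["id"], "drop": new_id})
    return new_id
```

The point of the design is that a false lexical match costs only a rejected submission, while a true one saves a manual merge.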

Thanks for any help. Mike Christie 18:22, 19 Jun 2006 (CDT)

Another question. Suppose you enter the following two publications: Test Title 1, by Test Author1; and Test Title 2, by Test Author2. This creates two pubs, two titles, and two authors, and two records in pub_authors and two in canonical_author (which is the title_author cross reference). I assume that's what it creates, anyway; and I assume that if the above are novels there is nothing in pub_content. If you now merge these authors, and select "only publication records" on the merge, only the pub_authors record is updated. Now the publication Test Title2 still has an author of Test Author2, but the pub_authors record links it to Test Author1. Test Author1 will display only Test Title 1, and Test Author 2 will still display Test Title2, since this is still the state of the cross-reference canonical_author records. Displaying Test Title2 shows a publication as Test Title2 -- but why? I'm not sure how pubs are selected for display under a title. I'd thought it might be lexically, where the title matches the pub title and the author matches from pub_authors, but that can't be right because pub_authors has been updated. So I'm baffled. Can you explain? Thanks. Mike Christie 19:37, 19 Jun 2006 (CDT)

Also, I'm curious to know why the merge authors function gives you the choice of merging only titles or only publications. I'm sure there is some scenario that calls for this capability; can you tell me what it is? Mike Christie 21:51, 19 Jun 2006 (CDT)


The pub_content table links titles to publications. So while it's true that each content title is linked to the pub via this table, it is also used to link the novel to the publication as well (if the publication is a container, then one of the things it would contain is a work which is a novel). The purpose of the "only publication records" was to preserve pseudonyms under the old setup. As things are different now, I propose we drop that aspect of merging. Alvonruff 07:28, 20 Jun 2006 (CDT)
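
As a rough illustration of that linkage, here is a sketch in Python with an in-memory SQLite database. Only pub_content and its pub_id and title_id columns come from the discussion above; the pubs and titles table layouts, the sample rows, and the function names are assumptions made for the example.

```python
import sqlite3


def build_demo():
    """Create a tiny stand-in schema: two pubs, one title, one link row."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE pubs (pub_id INTEGER PRIMARY KEY, pub_title TEXT);
        CREATE TABLE titles (title_id INTEGER PRIMARY KEY, title_title TEXT);
        CREATE TABLE pub_content (pub_id INTEGER, title_id INTEGER);
        INSERT INTO pubs VALUES (1, 'Test Title 1');
        INSERT INTO pubs VALUES (2, 'Test Title 2');
        INSERT INTO titles VALUES (10, 'Test Title 1');
        INSERT INTO pub_content VALUES (1, 10);
    """)
    return conn


def pubs_for_title(conn, title_id):
    """Publications shown under a title are found via pub_content, not by name."""
    rows = conn.execute(
        "SELECT pub_id FROM pub_content WHERE title_id = ? ORDER BY pub_id",
        (title_id,),
    )
    return [row[0] for row in rows]


def stray_publications(conn):
    """Publications with no pub_content row, i.e. no title pointing at them."""
    rows = conn.execute("""
        SELECT p.pub_id
        FROM pubs p
        LEFT JOIN pub_content pc ON pc.pub_id = p.pub_id
        WHERE pc.pub_id IS NULL
        ORDER BY p.pub_id
    """)
    return [row[0] for row in rows]
```

Because the lookup goes through pub_content, changing an author link elsewhere does not change which publications appear under a title.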

Spam or meta-spam?

The strangest spam we have been subjected to so far... Ahasuerus 19:39, 12 Jul 2006 (CDT)

Welcome back!

Welcome back, Al! I hope you are well rested and ready to continue with bug extermination :)

I haven't done as much work as I wanted while you were gone, but I am trying to keep up with the "Upcoming Books" page and beginning to develop data cleansing scripts, starting with Project:Author Names Cleanup. I have also written a script to extract some data that we may want to add to the list of "computationally intensive" pages -- see Project:Static Pages for details.

Other than that, not much has happened while you were gone: more Author biblios cleaned up, more spam squelched, a couple minor Project pages added, a few bugs identified.

OCLC Fiction Finder has undergone a facelift and is now in phase 2 (3?) of their beta process, but still buggy. Fantastic Fiction, arguably the main alternative to the ISFDB, seems to have been doing well lately and becoming more and more comprehensive. Ideally, we would be able to swap data, but they are commercial, so I don't know how far they would be willing to go. NESFA folks, who are working on the next generation of their database, have stopped by to see if we could share again -- see elsewhere.

Also, it occurs to me that we may need to beef up the "Disclaimers" page before we open things up so that all contributors have a very clear understanding of the intellectual property ramifications of any contributions (especially plot outlines) which they may be making. Ahasuerus 14:13, 1 Oct 2006 (CDT)

Thanks. I think the break invigorated things, so I'm actually interested in doing some extensive work now.
It often does. The old saying about variety being the spice of life has been getting a fair amount of support from psychologists lately. Ahasuerus 19:00, 1 Oct 2006 (CDT)

THE REST HAS BEEN MOVED TO Community Portal. Ahasuerus 22:21, 1 Oct 2006 (CDT)

Checking in

Al, just checking in -- I've been out of the country, and working on other things for a bit. How are things progressing towards a beta? My time is limited but I'd like to keep up with what's going on, and contribute when I can. I have built a couple of biblios for Wikipedia (Damon Knight and John Campbell) and can work on correlating that data with ISFDB data when we get to beta test over here. Mike Christie 08:00, 17 Oct 2006 (CDT)

You've been missed. Things are moving a bit slow, as I've been on numerous trips myself (with more coming up), but I have started on the verification support. The last thing to do after that is to vet the current bug list, and separate out a list of "must fix" bugs that need to be fixed before we flip the switch. Alvonruff 05:11, 18 Oct 2006 (CDT)
I have my ISFDB wiki watchlist on my home tabs, so I will keep an eye out for progress, and certainly join in when it gets to bug-culling, beta-planning and so on. Thanks. Mike Christie 12:54, 18 Oct 2006 (CDT)
Welcome back! We are still betaing the software, finding new bugs and areas for discussion -- not that the last two activities will ever come to an end :) -- but things are a little more stable now. I have some concerns about the need for multiple submissions when entering certain books/magazines (pseudonyms, etc), but that's fodder for another page. Ahasuerus 15:15, 18 Oct 2006 (CDT)

Untitled Science Fiction Novel

Al, is this Publication and its Title -- see http://www.isfdb.org/cgi-bin/pl.cgi?NTTLDSCNCF2005 -- a recent Dissembler artifact or Amazon flakiness?

I was fairly ruthless this weekend about rejecting Amazon entries, and I recall integrating a titled version of Nova Swing. I only took the Dissembler run out to March where I began to see titles along these lines. Usually the ISBN is assigned by the publisher before the title is finalized, so if you go too far out with forthcoming books you wind up with entries like the one you showed (I've seen Locus do this as well). This entry was entered about 15,000 publications ago, so it's more likely to be an old Dissembler entry from about 6 months ago. Alvonruff 20:33, 24 Oct 2006 (CDT)
Ok, sounds reasonable. Also, this sequel to Light was originally supposed to be published in 09/2005, but was delayed until 11/06, which may help explain why it fell through the cracks. I have cleaned Harrison up a bit, although Viriconium omnibuses still need to be reconciled with The Locus Index to Science Fiction and Wikipedia. Ahasuerus 01:53, 25 Oct 2006 (CDT)

Also, how far did you get with Z39.50 crawling when (ISTR) you were playing with it a few years ago? I have been putting together the foundations of a Z39.50 spider in my plentiful spare time (to be used to populate Wiki pages and such) and it occurred to me that I may be duplicating what you have already done. Ahasuerus 17:53, 23 Oct 2006 (CDT)

I recall doing something at the beginning of the summer, but I consider it rudimentary. When time permits, I'm more likely to work on the fundamental core engine that could be used by any number of spiders. So I don't think there will be too much overlap. If you get yours running well enough, I'll show you the magic to doing remote submissions to the ISFDB. Alvonruff 20:33, 24 Oct 2006 (CDT)
After playing with Z39.50 some more and seeing how dirty the data is, I am rather disinclined to do any kind of automatic submissions to the database even with the kind of moderation tools that we now have :( However, I can think of a number of ways to use Z39.50 data to support our effort, mostly by auto-generating Wiki pages to be later reviewed by humans.
At this point I have a rudimentary Windows/VB/Perl-based (VB because it's the only YAZ-based tool that I could easily get to work under Windows; I'll also try Linux/Perl one of these days) Z39.50 mongrel^H^H^H^H^H module that can simultaneously query hundreds of Z39.50 targets and generate MARC dumps. For example, here is what I get back when I query a random target for "personal name=Zelazny":
host: breeze.gmu.edu:7090
databaseName: Voyager
Number of records returned: 18
setname: default

Record 1:
001 392623
008 870120s1986    nyu           000 1 eng d
035    $a (OCoLC)15088561
035    $9 ABX5280GM
040    $a PCB $c PCB $d m/c $d VGM
049    $a VGMM
090    $a PS3576.E43 $b B58
100 10 $a Zelazny, Roger. $w cn
245 10 $a Blood of amber / $c Roger Zelazny.
260    $a New York : $b Arbor House, $c c1986.
300    $a 182 p. ; $c 22 cm.

Record 2:
001 392431
008 910503s1991    nyua          000 1 eng  
010    $a    91018153 
020    $a 0553076787 (hc) : $c $22.00 ($27.00 Can.)
020    $a 0553354485 (tp)
035    $a (OCoLC)23768529
035    $9 ABX5076GM
040    $a DLC $c DLC $d SVP $d VGM
049    $a VGMM
050 00 $a PS3569.H392 $b B7 1991
082 00 $a 813/.54 $2 20
100 10 $a Zelazny, Roger.
245 10 $a Bring me the head of Prince Charming / $c Roger Zelazny and Robert Sheckley.
260    $a New York : $b Bantam Books, $c c1991.
300    $a 279 p. : $b ill. ; $c 22 cm.
700 10 $a Sheckley, Robert, $d 1928-

Record 3:
001 418377
008 900913s1980    nyua          000 1 eng  
010    $a    90194484 
019    $a 06513101
035    $a (OCoLC)22891470
035    $9 ACA2732GM
040    $a DLC $c DLC $d OCL $d VVR $d VGM
049    $a VGMM
050 00 $a PS3576.E43 $b C48 1980
082 00 $a 813/.54 $2 20
100 1  $a Zelazny, Roger.
245 10 $a Changeling / $c Roger Zelazny ; illustrated by Esteban Maroto.
260    $a [New York] : $b Ace Book, $c c1980.
300    $a 251 p. : $b ill. ; $c 23 cm.
650  0 $a Fantastic fiction, American.

etc. Aggregating MARC records into something useful is not that hard, although there are a few surprises along the way. Non-MARC records (SUTRS and such) are much messier, although they can also be very tempting since they mostly come from British and other foreign catalogs that sometimes have otherwise-hard-to-find records.
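
As a small illustration of the aggregation step, here is a sketch in Python that pulls title, author, and ISBNs out of the text dump format shown above. It assumes each line is a three-character tag followed by indicators and "$x value" subfields, as in the sample records; the function names are hypothetical, and this is not the VB/Perl module described above.

```python
import re

# A subfield is "$" + one code character + space, running up to the next
# " $x " marker or the end of the line. Prices such as "$22.00" are not
# split because no space follows the character after the dollar sign.
SUBFIELD = re.compile(r"\$(\w) (.*?)(?= \$\w |$)")


def parse_line(line):
    """Split one dump line into (tag, {subfield code: value})."""
    line = line.rstrip()
    tag, rest = line[:3], line[3:]
    if tag < "010":  # control fields (001, 008, ...) carry no subfields
        return tag, {"": rest.strip()}
    return tag, {code: value.strip() for code, value in SUBFIELD.findall(rest)}


def summarize(lines):
    """Collect the title (245 $a), main author (100 $a) and ISBNs (020 $a)."""
    summary = {"title": None, "author": None, "isbns": []}
    for line in lines:
        tag, sub = parse_line(line)
        if tag == "245" and "a" in sub:
            summary["title"] = sub["a"].rstrip(" /")
        elif tag == "100" and "a" in sub:
            summary["author"] = sub["a"].rstrip(".")
        elif tag == "020" and "a" in sub:
            summary["isbns"].append(sub["a"].split()[0])
    return summary
```

Note that the co-author in Record 2 lives in a 700 field, so anything built on this would still need to merge 100 and 700 entries before matching against author records.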

I'll see if I can post something semi-coherent on the topic on the Community Portal page (which needs to be archived badly) in the next few days. Ahasuerus 01:53, 25 Oct 2006 (CDT)

Verification Flag

I see that you have snuck in some Web code for the Verification Flag while nobody was looking :) However, I seem to be unable to change dates from 0000-00-00 to 8888-88-88 for "unpublished" publications any more. Could it be related to this code change? Ahasuerus 19:30, 29 Oct 2006 (CST)

Dissembler gaps, part 7

I wonder why Dissembler found only 4 out of 5 books for Alex_Archer? It missed The Chosen, a January 2007 release, but grabbed Forbidden City, a March 2007 release. Ahasuerus 23:27, 3 Nov 2006 (CST)

Dissembler typically does forthcoming books in subject mode, so if the sources haven't properly categorized a forthcoming book it won't see it. When run in author mode, it did find the book in question. I think it would be useful to run Dissembler in author mode to find these, but there are currently a lot of false positives in that mode, and I'd like to set up a staging database so that a specific ISBN can be permanently marked IGNORE, otherwise I'll have to reject the same books over and over again each month. Alvonruff 08:13, 4 Nov 2006 (CST)
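
The "permanently marked IGNORE" idea could look something like the sketch below, written in Python with SQLite standing in for the staging database. The table name, column name, and function names are all hypothetical; nothing here reflects how Dissembler is actually implemented.

```python
import sqlite3


def open_staging(path=":memory:"):
    """Open (or create) a staging store holding ISBNs that were rejected."""
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE IF NOT EXISTS ignored_isbns (isbn TEXT PRIMARY KEY)")
    return conn


def mark_ignore(conn, isbn):
    """Remember a rejected ISBN; marking it twice is harmless."""
    conn.execute("INSERT OR IGNORE INTO ignored_isbns VALUES (?)", (isbn,))
    conn.commit()


def filter_candidates(conn, isbns):
    """Drop candidates that a moderator has already rejected once."""
    ignored = {row[0] for row in conn.execute("SELECT isbn FROM ignored_isbns")}
    return [isbn for isbn in isbns if isbn not in ignored]
```

With a file path instead of ":memory:", the ignore list would survive between monthly runs, which is what removes the repeated rejections.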

Help links

Once the help screens are stable, I'll go through and make a list of what screens I'd suggest we link from what cgi scripts. Meanwhile just grab anything that looks useful -- I'm inclined to link people to the detailed help in most cases, rather than the "How to" or simplified pages. Mike Christie 08:51, 25 Nov 2006 (CST)

OK, here are the help links I think are worth adding to the edit pages. Each help file should have "Help:Screen:" prepended; I've done this for the first one only. I give the help file name first, then the CGI script name I think would be associated with it.

  • Help:Screen:AddPublication -- addpub
  • AddVariant -- addvariant
  • AuthorData -- editauth
  • ClonePub -- clonepub
  • EditPub -- editpub
  • EditTitle -- edittitle
  • MakeVariant -- mkvariant
  • Moderator -- list
  • RemoveTitles -- rmtitles
  • SeriesData -- editseries
  • Verify -- verify

Also, the newpub script could have NewPub and NewNovel linked to it, with the latter linked only for the Novel entry screen, if that's possible. If not, just use NewPub in all cases. Mike Christie 09:53, 26 Nov 2006 (CST)
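
The list above amounts to a lookup table from CGI script name to help page. A minimal sketch in Python, assuming the script names and page names exactly as listed; the function name and the novel flag are hypothetical:

```python
HELP_PREFIX = "Help:Screen:"

# CGI script name -> help page name, as proposed in the list above.
HELP_PAGES = {
    "addpub": "AddPublication",
    "addvariant": "AddVariant",
    "editauth": "AuthorData",
    "clonepub": "ClonePub",
    "editpub": "EditPub",
    "edittitle": "EditTitle",
    "mkvariant": "MakeVariant",
    "list": "Moderator",
    "rmtitles": "RemoveTitles",
    "editseries": "SeriesData",
    "verify": "Verify",
    "newpub": "NewPub",
}


def help_page(script, novel=False):
    """Return the wiki help page for a script, or None if none is assigned."""
    if script == "newpub" and novel:
        return HELP_PREFIX + "NewNovel"  # only for the Novel entry screen
    name = HELP_PAGES.get(script)
    return HELP_PREFIX + name if name else None
```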

TitleRemove

Thanks for nuking the bogus titles for me -- that was going to be a tedious job. Mike Christie 09:37, 26 Nov 2006 (CST)

No problem. I've been in that mode for a few days now, going through Astounding Science Fiction. Alvonruff 09:40, 26 Nov 2006 (CST)

Happy Birthday

I just noticed the list of birthdays on the ISFDB page. Happy birthday! Mike Christie 20:14, 26 Nov 2006 (CST)

Congratulations, you have made it to 50! Not that it means a whole lot in the 21st century (except that you are now officially authorized to make less than complimentary comments about young people who have everything handed to them these days), but it was a non-trivial accomplishment just a few centuries ago. Here is your birthday pie:
       _,..---..,_
    ,-"`    .'.    `"-,
   ((      '.'.'      ))
    `'-.,_   '   _,.-'`
      `\  `"""""`  /`
        `""-----""`
 :) Ahasuerus 21:13, 26 Nov 2006 (CST)
Thanks for the pie. Now get off my yard! Alvonruff 06:58, 27 Nov 2006 (CST)

Oct-Nov 53 Fantastic Universe update?

Al, I was just fixing the attributions on the early Fantastic Universes to conform to the rule being discussed on the Community Portal (i.e. "The Editor" instead of substituting Merwin's name), and I noticed that the title of Merwin's piece was given as "Editorial: The Aliens". I'm pretty sure I entered this as "The Aliens", but I had seen you working through Contento2, so I'm guessing that you made this change based on that source -- is that correct?

If so, I agree with Contento that it's an editorial, but I'd like to leave the title just as "The Aliens" -- that's what's given in the magazine, and I don't see a need to change it. Does he do that for other editorials? Mike Christie 19:59, 27 Nov 2006 (CST)

While going through Astounding, there were several years where Campbell wrote both an editorial and an article in the same issue, so I've been prefixing the editorial titles with "Editorial:" to differentiate them. I don't have a strong preference, so if you prefer it without, then I'll move it back. Alvonruff 06:05, 28 Nov 2006 (CST)
I've moved it back; I think I'd like to stick to the "use what the publication shows" rule where possible. Thanks. Mike Christie 08:35, 28 Nov 2006 (CST)

Analog January/February 2007

Al, http://www.isfdb.org/cgi-bin/pl.cgi?ANLGJANFEB2007 lists the following two entries:

  1. 184 • Rollback (Part 4 of 4) • serial by Robert J. Sawyer
  2. 184 • Rollback (Part 4 of 4) • shortfiction by John Allemand

Is the second one a "shortfiction" or "interior artwork"? Ahasuerus 10:29, 8 Dec 2006 (CST)

Not that I ever make mistakes, it should be artwork. Fixed. Alvonruff 12:33, 8 Dec 2006 (CST)

More on help links

Al, I just noticed that the help links on "New <foo>" are still pointing to the old links -- can we update them? The best thing to link to would be Help:Screen:NewPub. Mike Christie 12:58, 8 Dec 2006 (CST)

Sorry - did the work, just didn't upload that particular app. Hopefully fixed now. Alvonruff 13:14, 8 Dec 2006 (CST)

Series deletion

Al, before I spam the Wiki with more feature requests/bug reports, let me ask you if the current lack of "empty series deletion" functionality is by design. As we continue to massage the data, we will have more and more empty "orphan" Series records with no Title records associated with them. Some of the time the ISFDB software handles empty series well and some of the time it doesn't. Would it be possible/desirable to change Title processing behavior so that when an update/deletion results in the last Title in a Series getting either moved to another Series or deleted outright, then the resulting empty series gets deleted as well? And if it is not desirable for some reason, then do we want to have a "Delete Series" option, which would only allow you to delete empty Series? Ahasuerus 17:28, 14 Dec 2006 (CST)

It's desirable, but nontrivial due to hierarchical series. For instance, there are multiple series that have no titles in them, but shouldn't be removed because they are parents to children series that aren't empty. In fact the first version did attempt to delete an "empty" series, but it kept removing needed series, and after adding hack after hack to the heuristics to determine an empty series, I finally pulled it for a later date. I'd say that we should just make this a bug rather than request a new feature - either method still requires a reliable heuristic to detect a series that's truly empty. Once we have that, we might as well clean up after ourselves when needed. Alvonruff 18:55, 14 Dec 2006 (CST)
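
One way to state the "truly empty" test is recursively: a series is empty only if it has no titles and every child series is itself truly empty. The sketch below, in Python, assumes the series form a tree with no cycles and uses plain dictionaries in place of the real tables; the names are hypothetical and this is not the heuristic that was pulled.

```python
def is_truly_empty(series_id, titles_in, children_of):
    """Return True if the series and all of its descendants hold no titles.

    titles_in   -- dict: series id -> list of titles directly in that series
    children_of -- dict: series id -> list of child series ids
    Assumes the parent/child links form a tree (no cycles).
    """
    if titles_in.get(series_id):
        return False
    return all(
        is_truly_empty(child, titles_in, children_of)
        for child in children_of.get(series_id, [])
    )
```

A parent series with no titles of its own but one populated child is reported as not empty, which is the case that tripped up the first version.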
Thanks, that explains a lot! I'll go and roll up all related bugs on the "Series bugs" page to reflect the fact that they will all go away once the "empty series" issue has been resolved. Ahasuerus 10:35, 15 Dec 2006 (CST)

10056

Al, I made some notes at ISFDB_talk:Beta#More_on_showstoppers about the showstoppers. The one in particular I wanted to ask about was 10056 -- are you OK with adding it to the list of showstoppers? I think it ought to be fixed, but didn't want to add it without checking with you. Mike Christie (talk) 07:44, 15 Dec 2006 (CST)

That looks fine. If you guys feel something should be moved to the showstopper list, go ahead. If I strongly disagree, I'll post a response. Alvonruff 08:45, 15 Dec 2006 (CST)
OK, done. I actually think only 10023 and 10056 need to be fixed for the beta, but that's partly because I can't reproduce a couple of the others on your list. Mike Christie (talk) 09:03, 15 Dec 2006 (CST)
I plan on knocking off a huge number of bugs over the next two weeks, so don't feel conservative. Alvonruff 09:43, 15 Dec 2006 (CST)

Data consistency scripts

Wow, that was fast! :-) Ahasuerus 11:31, 18 Dec 2006 (CST)

I already had a few of those, and the other was easy to put together. Some of the others will be more interesting. Alvonruff 13:58, 18 Dec 2006 (CST)

Possible dataloss bug

Al, just wanted to check this one with you before I posted it. I just addpubbed a pub to "The Tombs of Atuan", and when I updated the result to add an interior art record, I noticed that it changed the NOVEL title record from jvn to blank, though I had not consciously modified it. This is presumably an error in addpub's display. I clicked "Approve" before noticing that change, so it's been deleted. Are we using "jvn" as a length on a standard basis? I don't have anything in the help files for it; should I add something? Anyway, since it might be a dataloss bug, I thought I should let you know. Mike Christie (talk) 12:16, 18 Dec 2006 (CST)

The storylen field has been overloaded for numerous purposes. In this case jvn signified 'juvenile'. See ISFDB:Community_Portal#Using_the_Storylen_.2F_Length_field_to_note_series.2Fvolume_numbers for others. We're not currently displaying these annotations, so it might look like they appear out of the blue, but there are quite a few of these. Alvonruff 13:58, 18 Dec 2006 (CST)

Short Works issues

Al, do you intend to keep Short Works on the Long Works page or was it a side effect of another change? I don't see anything wrong with combining the two as long as the application can handle the extra load without affecting performance.

Also, do you think you could look into the problem with Variant Titles of Short Works appearing twice prior to the beta? It makes it hard to look for real duplicate titles in Short Works. TIA! Ahasuerus 17:47, 18 Dec 2006 (CST)

I'm just starting to play around with the display issues. When we converted to SQL, I was concerned about the application load, and split them into the short and long works, which turns out to be the root cause of many of the display issues. I'll keep the strictly Long and strictly Short apps around for a while, but focus on the combined app. There are some other dups as well (like Serials previously printed under a previous novel listing).
Some people like to focus on long works and found the short works cluttered up the bibliography. I find being presented with "No long works available" annoying when working with authors who were/are predominately short fiction authors. Alvonruff 18:36, 18 Dec 2006 (CST)
I tend to like the unified view as well, but I can see how some people may not care about the short stuff. OTOH, I don't think bandwidth -- a consideration 10 years ago -- should be a problem now, not even for all 4 of our 56kbps users in Grand Fenwick. Also, anything that creates multiple execution paths for the display logic is undesirable as it can lead to functional divergence, as we have already seen. Ahasuerus 17:52, 19 Dec 2006 (CST)

Remaining work?

Al, looks like the only thing left on ISFDB:Beta listed as a showstopper is the Unicode fix. From your summary line it doesn't sound like it's a quick fix. What's your opinion on what needs to be done to go live? I'm going to have a lot of free time between now and around Jan 7, so I'm inclined to say let's just do it, and stand by with mop and broom for any cleanup needed. (The Unicode fix really doesn't seem necessary to me.) This is probably a conversation we should have on the ISFDB Beta talk page, but I wanted to get your take on the remaining work -- no point in debating it if you've still got a week of other things to do before we can open up. Mike Christie (talk) 10:49, 19 Dec 2006 (CST)

I put the editing bugs on the must fix list, as dataloss seems like a severity 1 to me. The unicode stuff (like the Russian errors) is annoying, but doesn't prevent work from continuing. I think people need to pick out display errors that they think are showstoppers due to high annoyance, but I otherwise vote to turn it on. I'll be working on the features in the background.
I think the next step is for the current moderators to go through the bug lists, determine which ones make them uncomfortable about going to Beta, and put them on the Beta page. When there are no bugs left on the Beta page, we go live. Alvonruff 11:19, 19 Dec 2006 (CST)

Cold Print

I gather from UnaPersson that this edition is actually a collection. I saw you verified it a month or two ago, so I wanted to check that there's nothing odd going on before I correct it to COLLECTION. Was it just an error? Mike Christie (talk) 11:51, 22 Dec 2006 (CST)

Hah! The database had it as a novel and there's no indication on the cover that it's a collection, so I just validated the metadata. I'll make it a collection, and add the stories. Alvonruff 12:00, 22 Dec 2006 (CST)

Front page

When you get a chance, you might want to update the ISFDB front page to say "is now open" . . . . Mike Christie (talk) 12:33, 22 Dec 2006 (CST)

Done. Alvonruff 12:54, 22 Dec 2006 (CST)


Sorting Order for Series on an Author's Bibliography

Al - I was thinking of ways to organize omnibuses so that they don't clutter up the main list of novels too much and on Marion Zimmer Bradley's page created a new series called Darkover Omnibus. That seems to work well enough but the puzzle is that it did not sort immediately after Darkover. The series list looks like it's in alphabetical order other than Colin McLaren and I'm wondering why that's the case. Marc Kupper 15:30, 22 Dec 2006 (CST)

The SQL query doesn't perform a sort at present. What's your preferred sort? Alvonruff 15:35, 22 Dec 2006 (CST)
I’d suspect alphabetically by series name is something most people would understand. Fantastic Fiction seems to sort by the date of the earliest title meaning series get listed in roughly the order the author started them. Of course, someone will come along and say author X is well known for series Y and so that series should be at the top of the page as most people visiting the page will go looking for it meaning ISFDB could turn into a WYSIWYG wiki like thing. Marc Kupper 17:10, 23 Dec 2006 (CST)
Alphabetical sorting sounds reasonable. Some authors have written/contributed to so many series (e.g. Andre Norton) that trying to figure out which series started when would be a headache. Ahasuerus 17:15, 23 Dec 2006 (CST)
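
The two orderings discussed in this thread can be compared with a short sketch in Python. The record layout and function name are hypothetical, and the "first" values in the test data are placeholder ordinals invented for the example, not real publication dates.

```python
def sort_series(series, by="name"):
    """Order an author's series list.

    by="name"  -- case-insensitive alphabetical order by series name
    by="first" -- by the earliest title in each series, name as tie-breaker
    """
    if by == "name":
        return sorted(series, key=lambda s: s["name"].lower())
    if by == "first":
        return sorted(series, key=lambda s: (s["first"], s["name"].lower()))
    raise ValueError("unknown sort order: " + by)
```

Alphabetical order puts "Darkover Omnibus" directly after "Darkover", which is the behavior asked about at the top of this thread.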

Help for the pseudonym editor

I see it's done; cool. Can you give me an outline of what it does in terms of records? I want to make sure I write the help accurately. I assume if you say A has alias B then A becomes the canonical, and all the B records are created as vt's of any corresponding titles -- is that correct? What does it do if the vts are already there -- e.g. "Ursula Le Guin" which is an existing author record which (as far as I know) has all its records marked as vts already? Mike Christie (talk) 16:14, 22 Dec 2006 (CST)

It does not modify title records (and must not) - it simply maps the pseudonym author to the parent author. Imagine what would happen if we had no pseudonyms in the database, and pointed "Alexander Blade" to "Don Wilcox" - and it changed all the records. Then we added "Edmond Hamilton", and then all the records were modified again, blowing away the "Don Wilcox" linkage, and then again for "Robert Silverberg". Pseudonym mapping is specific to each title, so we need to track it there.
Most of the pseudonym magic is already happening in the variant title support, so the actual effect of the editor is fairly underwhelming now. It will cause a note to appear on the pseudonym's bibliography labeled as "Used As Alternate Name By:", followed by a list of all the parent authors that used that pseudonym. In other words, it forms a link that allows the user to navigate from "Ursula Le Guin" to "Ursula K. Le Guin" without doing an additional search. We can also add the link "Wrote under the Following Names" to the author bibliography. We can also create a pseudonym browsing tool if desired.
The tool specifically modifies the pseudonym table, creating a mapping between the pseudonym author and the parent author. This has an effect on the summary bibliography, in terms of displaying extra links, and in the structure of the bibliography itself. (For instance, series information is currently suppressed on the pseudonym's bibliography.) Alvonruff 17:38, 22 Dec 2006 (CST)
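
As a rough sketch of the idea described above, a pseudonym mapping can be kept separately from title records, so that one pseudonym can point to several parent authors without any title being rewritten. The PseudonymTable class and its method names are hypothetical and are not the actual ISFDB tables or code; the author names are the ones used in the example above.

```python
from collections import defaultdict


class PseudonymTable:
    """Hypothetical author-to-author mapping; title records are never touched."""

    def __init__(self):
        self._parents = defaultdict(set)  # pseudonym -> parent authors
        self._aliases = defaultdict(set)  # parent author -> pseudonyms

    def add(self, pseudonym, parent):
        # Adding a second or third parent does not overwrite earlier ones.
        self._parents[pseudonym].add(parent)
        self._aliases[parent].add(pseudonym)

    def used_as_alternate_name_by(self, pseudonym):
        # Feeds a note such as "Used As Alternate Name By:" on the pseudonym's page.
        return sorted(self._parents.get(pseudonym, ()))

    def wrote_under(self, parent):
        # Feeds a possible "Wrote under the Following Names" link list.
        return sorted(self._aliases.get(parent, ()))


table = PseudonymTable()
for parent in ("Don Wilcox", "Edmond Hamilton", "Robert Silverberg"):
    table.add("Alexander Blade", parent)

parents_of_blade = table.used_as_alternate_name_by("Alexander Blade")
names_of_wilcox = table.wrote_under("Don Wilcox")
```

Because the mapping is many-to-many and lives outside the titles, deciding which parent author wrote a given pseudonymous story still has to be recorded per title, as the variant title support already does.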
OK, so adding a vt does not currently create a pseudonym record, then? This new tool is the only way pseudonym records can be created? The other question would be how they get deleted -- if either author disappears because all their works disappear, the pseudonym record presumably goes too. Is that right? Finally, since the record just stores pointers, it doesn't matter if you edit the author records in any way -- the pseudonym remains as it is.
I like the idea of a pseudonym browsing tool. The "Who's Hugh" format is pretty good; maybe we can do something like that. I'll put in a feature request and I would think this is a case where the feature definition page will be handy to define requirements. Mike Christie (talk) 08:48, 23 Dec 2006 (CST)
Hi Al – regarding “Make This Author a Pseudonym” – could you please explain the mapping process? The background questions are:
  • Is it safe to do this twice on an author? For example, I looked at Brian Stableford vs. Brian M. Stableford and spotted one title in Brian M. Stableford’s list that did not have a vt relationship to the same title for Brian Stableford. I added that one by hand rather than risking “Make This Author a Pseudonym.”
  • Does the mapping create title records for the parent if they don’t already exist?
  • When comparing parent and pseudonym title records do you look at anything other than the title?
  • Is there a record somewhere of this mapping so that as people add new titles to the pseudonym that title records will automatically get added to the parent author?
  • Is there any form of linkage between the parent and pseudonym title records so that if one gets modified that the other is updated?
  • Can “Make This Author a Pseudonym” be used to remove a pseudonymous relationship? If so, does it just remove the vt relationships or do you also delete the parent title records? --Marc Kupper 15:21, 27 Dec 2006 (CST)

Uptime issues and the limit on the number of Titles per Pub

  • Our uptime has been less than sterling lately - take a look at the operations page. Would it be possible to query the TAMU IS staff and see if they have been feeding the hamsters?
  • We can check.
  • How many Titles do we allow per Publication? Books like Science Fiction & Fantasy Book Review Annual 1989 can contain up to 1,000 sub-page reviews. Ahasuerus 00:51, 23 Dec 2006 (CST)
  • Theoretically, there's no limit on titles per pub. We have a limit on authors per title, but that's an application limit, not a database limitation. From a practical point of view I wouldn't do more than 30 to 50 titles at a time, just to limit work loss if some interesting error happens (much like when I entered The New Space Opera titles yesterday). Alvonruff 06:03, 23 Dec 2006 (CST)
  • Ah, good to know! And yes, Wikipedia has been known to eat particularly elaborate updates as well -- unless your browser had crashed first, of course.
  • Let me just make sure that I understand: we are pretty sure that there is nothing in the form processing logic that would result in a minor thermonuclear explosion once the Content Title count reaches, say, 1,000, right? Re-entering 999 Titles would be somewhat time consuming. Ahasuerus 11:01, 23 Dec 2006 (CST)
The forms should be able to work to a maximum positive integer value, which would be around 2 billion entries - so that's your theoretical upper limit. I just tried editing '100 Ghastly Little Ghost Stories', and was able to add 10 more stories, so we know it can do 110. There's no reason why it shouldn't be able to do 1100 (although the processing takes a bit of time). Alvonruff 14:10, 23 Dec 2006 (CST)
Sounds good! Ahasuerus 14:17, 23 Dec 2006 (CST)
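
A minimal sketch of the "30 to 50 titles at a time" advice above, assuming an editor works from a prepared list of titles and submits it in chunks. The batches helper and the sample titles are hypothetical and are not part of the ISFDB forms. The constant assumes the "around 2 billion" figure refers to a signed 32-bit integer, which the discussion does not state outright.

```python
# Assumption: "around 2 billion" means the largest signed 32-bit integer.
MAX_SIGNED_32 = 2**31 - 1


def batches(items, size=40):
    """Yield consecutive slices of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


# Placeholder titles standing in for the reviews of a large annual.
titles = [f"Review {n}" for n in range(1, 1001)]
chunks = list(batches(titles, size=50))
```

Submitting each chunk separately means that an error during one submission costs at most 50 titles of re-entry rather than all 1,000.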

Search page notice?

What do you think about adding the beta recruitment notice to the top of the search page results? I'm thinking that that's a page most users will see. Mike Christie (talk) 08:41, 23 Dec 2006 (CST)

Two new display bugs -- high(ish) priority?

Al, could you please take a look at 20062 and 20064 when you get a chance and see if they could be addressed quickly? They were just recently (yesterday?) introduced and they have confused 2 editors so far. Granted, they are "display only" bugs, but I am having a hard time explaining the "display only" concept to editors :( Ahasuerus 16:02, 25 Dec 2006 (CST)

20062 is related to the long form bibliography, which used a different SQL query that hadn't been updated. That appears to be working now; let me know if you see any other errors. 20064 - I think I've seen that, but can't reproduce (possibly fixed by 20062). If you find a page that demonstrates the problem (the current Dinosaur Park example is gone), let me know. Alvonruff 19:51, 25 Dec 2006 (CST)
Looks good on both fronts so far! :) Other than that, I have just added 20065 and also Serials are not being matched correctly under some circumstances that I am still investigating, but no showstoppers at the moment. Ahasuerus 20:03, 25 Dec 2006 (CST)
There was one missing clause in the new SQL query that caused reviews of an author's books to show up under the Reviews section (which should only show reviews performed by the author). Ditto for interviews. Fixed now; mentioned in case you saw some spurious examples. Alvonruff 20:13, 25 Dec 2006 (CST)
Oh, I see! I thought you had snuck in some new and exciting functionality :) It looks fine now, I'll keep trying to puzzle out the Serial display thingie. BTW, I seem to have run into a Kornbluth/Merril novelette (in one of the Dynamics) that has never (!) been reprinted. Could it possibly be either that bad or that overlooked? Ahasuerus 20:18, 25 Dec 2006 (CST)
I'm pretty sure we have all of the anthologies and collections indexed by Contento in his pre-1984 index, which would cover the most likely reprint years. Perhaps it's been reprinted since? Hasn't NESFA done a complete short works of Kornbluth yet? Did Merril do bad work? Alvonruff 20:24, 25 Dec 2006 (CST)
I have checked Contento's Index (now online and last updated in 2005) as well as Locus for 1984-1998. The NESFA collection that you are thinking of was indeed published in 1997, but it only collected Kornbluth's solo short fiction. That's why the title was His Share of Glory :)
Merril's solo work was somewhat spotty and not all of it has been reprinted. She did pretty well as an editor, though. Kornbluth/Merril collaborations were not among Kornbluth's best stories, but still, no reprints at all? Merril's website lists the original appearance of the story, but nothing else. It's not like Dynamic was particularly obscure, so I am a bit perplexed and will give the novelette a try tonight to see what's up. Ahasuerus 20:47, 25 Dec 2006 (CST)

Shortfiction series display messed up?

Al, are you changing the short fiction series display logic to display Long and Short Works together if they belong to the same series? AFAICT, if there are no Long Works to display, Shortfiction series titles are now displayed after all other Shortfiction Titles and without any indication that they belong to a series. For example, Noel Loomis has two shortfiction series consisting of 2 and 4 Titles respectively, which now show up at the bottom. On the other hand, T. Jackson King's The Forty-Seventh Florescence is displayed correctly since (?) it has a Long Work to anchor it. Or so it looks from this side of the fence. I will file a bug report shortly. Ahasuerus 21:06, 25 Dec 2006 (CST)

Corrupt Silver Web record

Al, could you please fix this Silver Web issue when you get a chance? It's missing Tag data and is therefore not linked properly from a related Title, e.g. try accessing it from this title. Also, could we try to address the bug with user-submitted Tags soonish so that we don't have this kind of data corruption spreading? Thanks! Ahasuerus 15:09, 26 Dec 2006 (CST)

This was an easy fix - just edit the pub (which works off the record number) and add the tag. The title launches by tag, so once the pub had one it was fine. Does this only occur when submitting a magazine? Alvonruff 15:35, 26 Dec 2006 (CST)

Editpub crash

Al, I take it you're aware that editpub is crashing with a python error? I assume you're in the middle of some edits to the code, but just wanted to make sure you knew about it. Mike Christie (talk) 16:16, 26 Dec 2006 (CST)

Give me an example - it's working for me. Alvonruff 16:18, 26 Dec 2006 (CST)
I found an example for when there's no isbn data (obviously related to isbn changes today). That one's fixed. Alvonruff 16:22, 26 Dec 2006 (CST)
That seems to have fixed it -- thanks. I was working on magazines, so of course I saw that problem early. Thanks for the Hold function too, by the way -- fast work!! I've held one of Pagadan's and will wait for consensus on the community portal before I reject it. Mike Christie (talk) 19:55, 26 Dec 2006 (CST)
Yes, that Hold button is very nice, thanks! Ahasuerus 20:03, 26 Dec 2006 (CST)

Ditching

I thought you'd like to know that your "ditch the list" addendum made me laugh out loud. Thanks for the "difficulty" notes, too. Mike Christie (talk) 09:16, 27 Dec 2006 (CST)

10069

Sorry about re-reporting 10069! It was late and it escaped me that we had this very same discussion about case sensitivity wrt author names a few months ago. I'll go and engage in vigorous self-flagellation with a wet noodle now... Ahasuerus 13:14, 27 Dec 2006 (CST)

No problem. It does mean that other people will try to do the same, but I think we're doing the right thing. And of course, no one will read the documentation (if it even mentions this situation), so we'll just have to deal with it. Alvonruff 17:25, 27 Dec 2006 (CST)
Well... I have been working with Hayford Peirce (the Analog author), trying to get him up to speed wrt using the ISFDB. He is not finding our user interface particularly intuitive, I am afraid. Here is my explanation of how to Merge two of his titles and