ISFDB:Community Portal/Archive/Archive01

From ISFDB


This is an archive page for the Community Portal. Please do not edit the contents. To start a new discussion, please click here.
This archive includes discussions from





Griping

I just established an account (username Tillman), thinking I'd edit my own author entry. However, logging in at http://www.isfdb.org/cgi-bin/index.cgi with the new account just results in a loop: "you are logged in, continue" <click> "Login required to edit" [continue ad infinitum] ??? Pete Tillman, 6/16/06

TAMU is currently changing its DNS servers, and yesterday I had to change the DNS servers for isfdb.org. Since 1) the login code hasn't changed in a few weeks, and 2) I had to log in to the wiki again, I'm presuming that this is a short-term hiccup related to the DNS change. Alvonruff 04:49, 17 Jun 2006 (CDT)

Hmmm; I came over to ISFDB to verify the date of a minor short story publication of Robert Forward's, and see that everything has been changed.

Very much so. ISFDB is in the process of moving from ISFDB1 to ISFDB2 and almost everything is being changed under the hood. Ahasuerus 10:25, 14 Jun 2006 (CDT)

As a general thing, I like the Wiki format, but whoever implemented this version of a wiki seems to have done it in a very confusing and hard to use way. There's a help page, but it seems to be not only blank, but protected, presumably so that nobody can change the very very important information on the page to something that might be inaccurate.

At one point the ISFDB Wiki had a fairly major problem with spambots that automatically registered accounts and filled Wiki pages with junk links (gambling, porn, etc.). Everything has been cleaned up now and protections are being gradually lifted as we lurch^H^H^H^H move forward. Ahasuerus 10:25, 14 Jun 2006 (CDT)

I tried to delete an incorrect entry in Robert L. Forward's bibliography [_THE OWL_ was written by Robert Forward, not Robert L. Forward -- the fact that his son wrote mysteries under the name Robert Forward is in fact the very reason that Bob Forward always included his middle initial "L" in his byline.]. However, deleting a title, or making any changes to the information, yields a page saying that this can only be done by a moderator, with no information as to who might be a moderator, or how to contact them.

Submissions are likely to be re-enabled some time this summer once the beast is reasonably stable. [crosses fingers] Ahasuerus 10:25, 14 Jun 2006 (CDT)

I'd try searching "moderator" in help, but of course help is blank. There is no contact information whatsoever anywhere obvious on the page.

The help system is rather rudimentary at the moment and will need to be beefed up prior to enabling submissions -- see the discussion at the bottom of the page. Ahasuerus 10:25, 14 Jun 2006 (CDT)

It is probably hidden somewhere. Sign-ins seem erratic. I've gotten the response "you need to sign in" several times. This could be a server cache problem with a failure to refresh, but it sure seems to come up a lot (and on both Safari and Firefox), and looks like an error in the implementation of cookies.

A few sign-on/cookie bugs have been zapped over the last month, but it sounds like some may have survived. Ahasuerus 10:25, 14 Jun 2006 (CDT)
The ISFDB database and the wiki are in different domains, so you need to log into each one separately (browsers send cookies on a domain-name basis). If "several" means "2", then there is no bug; if "several" means "greater than 2", then there is a bug. Alvonruff 17:14, 14 Jun 2006 (CDT)

A number of other oddities and strangenesses. The navigation is no longer straightforward. I presume that this must be a work in progress? [Added by Geoffrey Landis on June 14, 2006]

Well, the user interface is as new as everything else in the system. It is being tweaked as we speak, so it's quite likely that some of the "oddities and strangenesses" will go away shortly. On the other hand, some of the "oddities" are likely consciously implemented features rather than bugs and their future will depend on how viable they will prove to be once submissions are enabled. Ahasuerus 10:25, 14 Jun 2006 (CDT)
A couple of additional notes to Ahasuerus's comments, since it took me a few minutes after I arrived here to understand the division of functionality properly. It may not be immediately clear that the Wiki and the ISFDB will be parallel; the ISFDB contains a database and will soon contain a suite of edit tools; the associated Wiki captures discussions about the content of the ISFDB. The goal is that anyone can edit data, and a smaller group of moderators can accept or reject the edits; this step is necessary to avoid wholesale vandalism to the underlying database. The Wiki is a supporting tool. We haven't finished working out the details yet, but we expect that there will be pages on the Wiki that document the verification work that has been done to date for authors, or for particular publications. This could take the form of a page listing all the cross-verifications that have been done against various bibliographic references such as Nicholls, Tuck, Currey and so on; or where possible against the actual physical publication itself. The timeline for going live is not yet set but the set of tasks to be achieved first is diminishing, and we've just started a discussion at the bottom of this page about the remaining work to be done -- please join in if interested. Mike Christie 10:35, 14 Jun 2006 (CDT)
Any further details on navigation issues? With the addition of the navigation bar, we're presuming that navigation is actually easier. The only major change from the old ISFDB is the splitting of the long works from the short works. Casual readers don't want to see the short work bibliography, while authors, academics, and bibliographers do. Alvonruff 17:14, 14 Jun 2006 (CDT)

Wikipedia

There are questions as to the ISFDB's eligibility for a separate Wikipedia article due to possible lack of notability -- see here. I posted a quick response a few minutes ago, but I am sure Al has more data bits on hand :) Ahasuerus 12:43, 16 May 2006 (CDT)

Budrys minor change

I just changed the price on the Gold Medal first edition of Rogue Moon to 45 cents (from 35 cents). I have a copy, and that's what it says on it, so I went ahead and made the change, and then figured out how to moderate it. That raised a couple of questions:

Is there a way to see the history of changes on e.g. a publication like this? I.e. is there a page where I can see what's been edited on a given publication? Mike Christie 23:47, 22 May 2006 (CDT)

Al has a way of reviewing the log of recent XML submissions, but I don't know how to access it yet. I am trying to get a better grip on the tools used to create and run the ISFDB software -- after all, how difficult can a scripting language and a relational database be, right? :) -- but I haven't had much time to spend on it as of yet. Other things keep interfering. Ahasuerus 09:38, 23 May 2006 (CDT)

I saw you had created an "Author:C. S. Lewis" page for biblio notes on Lewis. Mike Christie 23:47, 22 May 2006 (CDT)

Actually, C. S. Lewis is Al's current mini-project that I plan to add to from Reginald-3 and a few other places when I get a chance. Ahasuerus 09:38, 23 May 2006 (CDT)

Should I create one for Budrys and record my changes? Mike Christie 23:47, 22 May 2006 (CDT)

Yes, please. We probably want to find out where the $.35 price comes from (Tuck?) and document it there as well. Sometimes a subsequent printing of the first edition may look exactly like the first printing except for the price, so prices can be fairly important. Besides, we are still experimenting with different formats and ways of capturing bibliographic data (see, e.g., Author:Richard Cowper for a recent attempt to come up with a Verification matrix for Authors), so the more information we enter in these supporting structures, the better idea we will have of what kinds of data need enhanced support. Ahasuerus 09:38, 23 May 2006 (CDT)

Is there a way yet to create a new magazine? E.g. I have quite a few copies of "The Original Science Fiction Stories", which isn't in the DB. Is there a way to enter them? Mike Christie 23:47, 22 May 2006 (CDT)

According to the ISFDB Feature List, "05/28/2006 - Editing Tool: New Content. This tool will allow users to submit new magazines, anthologies, etc...", so soon, very soon :) Ahasuerus 09:38, 23 May 2006 (CDT)

Adding a publication to a title

I just added another publication of "The Falling Torch" to Algis_Budrys, and the result is two titles under the author. Can someone tell me what I've done wrong? Mike Christie 12:04, 23 May 2006 (CDT)

How did you add the Publication? Did you do it by pulling up the Work title and then selecting [Add Publication to This Title]? If so, then it's a bug. But if you simply used the "New Book" option in the navbar, then the software created a new Work/Title since it had no way of telling that the new Publication was related to an existing Work. You can easily merge the two Titles/Works by selecting [Titles] in Budrys' bibliography, though. It will combine all Publication records from the two Titles under the merged Title record. Ahasuerus 12:13, 23 May 2006 (CDT)
I brought up the Title and selected [Add Publication to This Title].
However, I now see the situation is much more complicated. This is probably a good place for me to learn how to deal with the data, so I will avoid making any significant edits anywhere else till I've figured this out. I'm afraid I may have already damaged some data, but I can learn by fixing it.
Without reference to the ISFDB data, here's what appears to be the publication history:
  • Short story "Falling Torch" in Venture, Jan 58.
  • Fixup, "The Falling Torch", June 1959 from Pyramid.
  • Various subsequent reprints.
In the ISFDB, I believe one of the novels was listed as a collection. I've just realized that "SHORTFICTION" is used for stories, novelettes and novellas; I knew TFT was a fixup, so I may have seen "SHORTFICTION" and interpreted it as "COLLECTION" since I'm not yet used to the ISFDB naming. In either case I changed a record to show "NOVEL". That led to the two records in the bibliography.
The records in the ISFDB now appear to me to be:
  • Title ref 705: "The Falling Torch", 1959, NOVEL. Two publications attached: FLLNGT1959 and THFLLNGTRC1964.
  • Title ref 39665: "The Falling Torch", 1959, NOVEL. One publication attached: THFLLNGTRCH091959.
  • Title ref 13456: "Falling Torch", 1991, NOVEL. One publication attached: BKTG22159.
  • Title ref 96733: "Falling Torch", 1958, SHORTFICTION. One publication attached: THFLLNGTRCH091959 (same as 2nd line above).
The publications are:
  • FLLNGT1959 (39562). First edition. NOVEL.
  • THFLLNGTRC1964 (87892). Third edition. NOVEL.
  • THFLLNGTRCH091959 (39563). First edition. NOVEL. (Duplicate of FLLNGT1959.)
  • BKTG22159 (13112). 1991 edition. NOVEL.
Venture SF is not yet entered, so that doesn't show up.
The correct situation, I suspect, should look like this:
  • Title ref XXX: "The Falling Torch", 1959, NOVEL. Three publications attached: 1959, 1964 and 1991.
  • Title ref YYY: "Falling Torch", 1958, SHORTFICTION.
But then should the short story show up in the novels? A lot of the text is no doubt there; perhaps verbatim.
Also, can I ask why I can mark a publication as NOVEL, when it's attached to a title that will also have that attribute? Presumably this is essentially an override; what's the intended scenario for its use?
If the above is correct I'll have a go at making it happen. Please tell me if I'm still wrong! Mike Christie 12:52, 23 May 2006 (CDT)
I have double checked the entries in the ISFDB and everything appears to match what you are describing. Short stories that were later expanded into novels/fixups are generally supposed to be entered as separate Titles/Works since the text is usually quite different. There is nothing in the software to support this kind of "story->novel" association (that I know of), so for now I just add "Notes" to both Works to indicate that they are related. Same thing with fixups that were cobbled together from multiple stories.
The easiest way to address the immediate problem is to merge Titles 705, 39665 and 13456 using the Merge tool in the navbar. You can then pull up the resulting Title record and delete any duplicate publications, which will all be nicely displayed together for your viewing pleasure. You may want to make sure that the Award pointer is preserved in the combined Title record, since Award associations have been known to disappear during Merges -- see the bug list. Also, I always check each Publication before I delete it, since some may contain otherwise unavailable Contents data that we wouldn't want to lose. (Contents editing for Anthologies and Collections is supposed to go live at the end of the month.)
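A minimal sketch of what a Title merge has to accomplish, under stated assumptions: the `Title` class and `merge_titles` function below are hypothetical in-memory stand-ins, not the ISFDB's actual code or schema. Every Publication and Award link on the merged-away Titles is copied onto the surviving Title, which is the property that keeps Award pointers from disappearing; duplicate Publications still have to be spotted and deleted by hand. The record numbers are the ones listed earlier in this thread for "The Falling Torch".

```python
from dataclasses import dataclass, field


@dataclass
class Title:
    """Hypothetical in-memory stand-in for an ISFDB Title (Work) record."""
    title_id: int
    name: str
    pub_ids: list[int] = field(default_factory=list)    # attached Publications
    award_ids: list[int] = field(default_factory=list)  # attached Awards


def merge_titles(keep: Title, others: list[Title]) -> Title:
    """Fold the Publication and Award links of `others` into `keep`.

    Links are copied, never dropped, so an Award attached to any merged
    Title survives. Duplicate Publications (two records for the same
    physical edition) are NOT detected; they must be reviewed by hand.
    """
    for other in others:
        for pub_id in other.pub_ids:
            if pub_id not in keep.pub_ids:
                keep.pub_ids.append(pub_id)
        for award_id in other.award_ids:
            if award_id not in keep.award_ids:
                keep.award_ids.append(award_id)
    return keep


# Title and Publication record numbers as listed for "The Falling Torch".
t705 = Title(705, "The Falling Torch", [39562, 87892])
t39665 = Title(39665, "The Falling Torch", [39563])
t13456 = Title(13456, "Falling Torch", [13112])

merged = merge_titles(t705, [t39665, t13456])
print(merged.pub_ids)  # [39562, 87892, 39563, 13112]
```

After the merge, 39563 (the duplicate of 39562) is still attached and would be deleted manually.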
As far as the use of "NOVEL" in Publications and Titles goes, it can get rather involved. In most cases, a NOVEL Title will point to a NOVEL Publication and that will be that. However, a Publication may also be an OMNIBUS which contains multiple NOVELS. Or the Publication type may be NOVEL, but it may contain not just the NOVEL Title that it will point to, but also one or more ESSAYs (Preface, Foreword, Introduction, Afterword, etc) either by the same Author or by different Authors. Ditto with Collections. Because of these gotchas, it's best to think of Publications and Works as a Many-to-Many relationship. A Work may appear in many Publications (reprints, omnibuses, etc) and a Publication may contain many Works. HTH :) Ahasuerus 14:19, 23 May 2006 (CDT)
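The many-to-many relationship can be sketched as a junction table, here in SQLite via Python's standard `sqlite3` module. The table names (`titles`, `pubs`, `pub_content`) and the sample rows are hypothetical and only illustrate the shape of the relationship: one row per (Publication, Work) pairing lets an OMNIBUS hold several NOVELs and lets a NOVEL Publication also carry an ESSAY such as an introduction.

```python
import sqlite3

# Hypothetical schema: table and column names are illustrative, not ISFDB's.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE titles (title_id INTEGER PRIMARY KEY, title TEXT, ttype TEXT);
CREATE TABLE pubs (pub_id INTEGER PRIMARY KEY, pub_title TEXT, pub_ctype TEXT);
-- One row per (Publication, Work) pairing: this makes it many-to-many.
CREATE TABLE pub_content (
    pub_id   INTEGER,
    title_id INTEGER,
    PRIMARY KEY (pub_id, title_id)
);

-- Made-up sample data.
INSERT INTO titles VALUES (1, 'Novel A', 'NOVEL'), (2, 'Novel B', 'NOVEL'),
                          (3, 'Introduction', 'ESSAY');
INSERT INTO pubs VALUES (10, 'Novel A', 'NOVEL'),
                        (11, 'Novel A and Novel B', 'OMNIBUS');
INSERT INTO pub_content VALUES (10, 1), (10, 3), (11, 1), (11, 2);
""")


def works_in(pub_id):
    """All Works printed in one Publication."""
    rows = conn.execute(
        "SELECT t.title FROM titles t "
        "JOIN pub_content pc ON pc.title_id = t.title_id "
        "WHERE pc.pub_id = ? ORDER BY t.title_id", (pub_id,))
    return [title for (title,) in rows]


def pubs_containing(title_id):
    """All Publications in which one Work appears."""
    rows = conn.execute(
        "SELECT pub_id FROM pub_content WHERE title_id = ? ORDER BY pub_id",
        (title_id,))
    return [pub_id for (pub_id,) in rows]


print(works_in(10))        # ['Novel A', 'Introduction']
print(pubs_containing(1))  # [10, 11]
```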
OK, I merged two of them, but "Falling Torch" and "The Falling Torch" are on different pages. Is there a way around that? Mike Christie 18:21, 23 May 2006 (CDT)
I am unaware of any way to merge two records that are displayed on two separate pages when using the regular Title Merge option. As you have discovered, it can be a problem when the Author in question has more than 100 Titles to his or her name. (And yes, I have already requested that the Title Merge algorithm ignore "The", "An" and "A" for sorting purposes.) The current workaround is to go to Advanced Search and use the first of the three forms on that page. It lets you do a much more precise search on Title/Work records and can be quite powerful. If you search on a substring of the title that yields fewer than 100 results, e.g. "Falling Torch", you will get a compact set of results that you can then merge. Just watch out for eponymous Shortfiction entries :) Ahasuerus 20:12, 23 May 2006 (CDT)
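One possible sort key that ignores a leading "The", "An" or "A", so that "The Falling Torch" and "Falling Torch" sort next to each other and therefore land on the same page of results. This is a hypothetical helper illustrating the requested behavior, not the ISFDB's merge code, and it only handles those three English articles.

```python
LEADING_ARTICLES = ("the ", "an ", "a ")


def title_sort_key(title: str) -> str:
    """Case-insensitive sort key that skips one leading English article."""
    lowered = title.lower()
    for article in LEADING_ARTICLES:
        if lowered.startswith(article):
            return lowered[len(article):]
    return lowered


titles = ["The Falling Torch", "Rogue Moon", "Falling Torch", "Who?"]
print(sorted(titles, key=title_sort_key))
# ['The Falling Torch', 'Falling Torch', 'Rogue Moon', 'Who?']
```

Because the article must be followed by a space, titles such as "Anthem" are left alone.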


Adding variant titles

I have a 1958 Avon edition of Blish's Jack of Eagles, with the variant title "ESPer". I went to Jack of Eagles and created a new publication, and edited the title field of that pub to be "ESPer". It now shows up correctly. Do I also need to run "Add a Variant Title or Pseudonymous Work to this Title" to add ESPer as a vt of "Jack of Eagles", or am I done now? Mike Christie 12:45, 24 May 2006 (CDT)

House names

I have a 1960 pb of John E. Muller's "Space Void", from Badger Books. John E. Muller was a house name, used by A.A. Glyn, R.L. Fanthorpe, and John L. Glasby. There are three further books of unknown authorship, one of which is "Space Void". Do I need to do anything special with this as I enter it? Or should I just go to "New Book" and enter it afresh? Mike Christie 19:56, 24 May 2006 (CDT)

House names are tricky since half the time we are not 100% sure who wrote what. Sometimes new information comes to light decades after the first publication.
As a general observation, the ISFDB team is trying to do two things at the same time:
First, we are trying to catalog "objective" or "publication" data. This data should be readily verifiable for any mass produced publication, i.e. a book, magazine, audio tape, etc. "Objective" information includes the author's name as printed on the cover, the name of the artist, the Publication's title as printed on the cover, ISBN, catalog number, cover price, list of texts included in the collection/anthology, and other data elements that anybody can see. This is what most libraries record in their catalogs.
Second, we are trying to use this captured "objective" data to derive "subjective" or "Work level" data. This data can include the canonical name of the real author, the canonical names of the texts (stories, novels, essays, etc) included in the publication, series information about these texts, etc. This is a much more ambitious and potentially time consuming project (which is why many libraries shy away from it) because sometimes you have to do a fair amount of research re: pseudonyms, series, etc, but the derived information can be very useful to our users. Without it, you would have one list of texts for Kuttner, another for Padgett, yet another for O'Donnell, etc, and that would be bad. Thankfully, there are pre-existing genre encyclopedias and bibliographies, which help in this area, but they are not perfect either.
To get back to your question, we would ideally capture the "objective" Author data (in this case "John E. Muller") from the book's cover in the Publication record and would then provide the real author's canonical name in the Work record that points back to the Publication record. Since we don't know who the real author was in this case, we probably want to leave the house name in the Author field of the Work record (which will auto-generate a new Author record if there isn't one for this name) and have a note for this "Author" explaining that it is a house name. Since we don't have an Author-level Notes field yet, we will probably want to put it in the Work-level Notes field. Later on, if and when the pseudonym has been disclosed, we would change the Work-level "subjective" Author name.
However, that's just the current theory of the thing. The reality is that pseudonym support is still very much in a state of flux, as Al indicated in the Open Features section earlier today. HTH :) Ahasuerus 15:29, 25 May 2006 (CDT)
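The split between "objective" Publication data and "subjective" Work data for a house name could be modeled roughly as below. The class and field names are hypothetical; the point is that the Publication always keeps the name printed on the cover, while the Work falls back to that name until research identifies the real author.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Publication:
    title_as_printed: str
    author_as_printed: str  # "objective": exactly what the cover says


@dataclass
class Work:
    canonical_title: str
    publication: Publication
    real_author: Optional[str] = None  # "subjective": set only once research confirms it
    notes: str = ""

    @property
    def credited_author(self) -> str:
        # Until the real author is known, fall back to the printed (house) name.
        return self.real_author or self.publication.author_as_printed


pub = Publication("Space Void", "John E. Muller")
work = Work("Space Void", pub,
            notes="John E. Muller is a Badger Books house name; real author unknown.")
print(work.credited_author)  # John E. Muller
```

If the pseudonym is later disclosed, only `real_author` on the Work changes; the Publication record keeps the cover credit untouched.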


Verification discussion

I remember seeing a discussion about doing database verification in hardcoded database format, versus the use of a wiki page. I strongly support, for what it's worth, the method of using a database. It's more useful, it's much easier to keep well-formed and standardized, and it's possible to provide a metric of, say, "25% of publications have been verified against the physical object itself".

Would it be possible to create a list of verification sources in its own table, e.g., this reference work, this edition, and then cross it with publications, so we could say that this user verified this publication against this reference work at this time. I think that the advantages of this over the proposed wiki system are significant. grendel|khan 07:19, 27 May 2006 (CDT)

Oh yes, we discussed various pros and cons at Bibliographic Rules#Data Verification Matrix - Layout a few weeks ago. However, one thing that I think we have learned from the last couple of weeks of intermittent data cleanup is just how convoluted some of these issues are and how hard it can be to fit them into a series of checkboxes as originally proposed. I still think there is value to having "confidence level" information in the database, but as long as much of the verification data is effectively free text, there is no way around using the Wiki or the Notes field of the ISFDB record in addition to flags.
Keep in mind that most genre encyclopedias are based on much more extensive lists and/or notes that are kept by the compiler/editor. For example, John Clute has his own "master list" that he used during his work on his two encyclopedias. The same applies to James Monaco and BASELINE. In our case, the Wiki/Notes serve the same purposes and I doubt we will be able to escape the need for them. Ahasuerus 23:32, 27 May 2006 (CDT)

My database skills are a mite rusty; I can try to whip something up in the form of a database structure to hold this, if there's not already one proposed and laid down. grendel|khan 07:19, 27 May 2006 (CDT)

One doesn't exist as yet. If you want to propose one, go for it. I'm going to try to focus on the content editors for the next couple of days, and whittle down the bug list. Alvonruff 07:58, 27 May 2006 (CDT)
I agree with Grendelkhan's comments about database flags being far more useful for verification purposes than a wiki format. His points about standardization and metrics are absolutely right.
However, there are several reasons I've included a wiki page element in the guideline I've proposed over at Stabilizing Bibliographic Data. Here they are; if I'm wrong about these reasons, then I agree that more flags is probably the answer.
  1. Workload vs. likely participation. Verifying one publication against Tuck, Reginald and Currey is quite time-consuming, and the result is just one verified record. Further verifications will typically do nothing but add another checked flag to a record that was already correct. Most ISFDB editors will not have these bibliographic references. For the few who do, is it the best use of their volunteer effort to use their time in this way?
  2. Goal of the ISFDB. Al can correct me if I'm wrong here, but I'd have thought it's more important to have correct data than to be a repository of bibliographic discrepancies. I admit that that would be nice, but we need a process that leads to correctness first. I also think a future project, a la stub-sorting over at Wikipedia, might take the verification notes on the publication pages and work up verification information in a unified location. In any case, I believe that is a lower priority goal and shouldn't drive activity now, while we still have plenty of work to do.
  3. Simplicity. I tried to make the approach I outlined as simple as possible, and require minimal code changes. I think it's easy to overdesign something like this. If we put too much process around this, it's going to fail for reasons of complexity. I suspect (and this is partly a gut feeling) that multiple bibliographic flags is overkill.
  4. Centrality of publication data. I think the publication is the number one source for us, and we actually want to discourage people from treating other bibliographic sources as "primary". Our focus should be the books and magazines; those other sources are tools for us, not the object of our work. We want to treat them that way. Mike Christie 11:40, 27 May 2006 (CDT)
Publication data is certainly central to the project, no question about it. Melanie Rawn's The Diviner Key is a perfect example -- we have 3 (sic) ISBNs for this title in the ISFDB, yet this frequently announced book has not been published. Clearly, these 3 ISBNs came from other bibliographic sources, which relied on publisher announcements, ISBN data submissions, etc.
Having said that, publications alone, even carefully examined, do not guarantee that you will capture all the bibliographic data that you need, much less that it will be 100% accurate. Some publications have the copyright date, but not the publication date, printed on the copyright page and the two can be months apart. Some publications will be marked as first editions even though they may not be true first editions, etc. It can be rather messy out there, and you often need to check both the publication in question and 1+ reference books to come up with a semblance of a complete picture.
The need for non-publication bibliographic sources is particularly noticeable when dealing with "subjective" data, e.g. preferred titles or series information. Your average publisher is usually not very good at indicating which subseries of which series his latest offerings belong to and other bibliographic sources often prove invaluable. Ahasuerus 23:32, 27 May 2006 (CDT)
I agree with your comments, and one could certainly extend the list of examples of publications that don't supply sufficient data. For example, many Ace books have a copyright date but no printing date, and if the book is a reprint the copyright date is no use at all in determining the date of publication; we had a recent discussion of an Ace issue of Star Beast that illustrated that. In a situation like that I've relied on both bibliographic sources and contextual data to set dates -- for example, since I can see that every D series Ace book from D-138 to D-197 is copyright 1956, I'd feel comfortable assigning a print date of 1956 to D-157, without any other sources. If I were to then set the correctness flag on that (referring now to the discussion at Talk:Stabilizing Bibliographic Data) then I'd also make a note on the Wiki page that that was the source of the data.
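The catalog-number reasoning for undated Ace reprints can be written down as a small helper. This is a hypothetical sketch, assuming codes of the form "D-157" and that catalog numbers within a series were issued in chronological order (which is not guaranteed); it returns a year only when the nearest dated neighbors on either side agree, and otherwise returns `None`. A date inferred this way would still need a note saying it was inferred.

```python
import re
from typing import Optional


def parse_catalog(code: str) -> tuple[str, int]:
    """Split a catalog code such as "D-157" into ("D", 157)."""
    match = re.fullmatch(r"([A-Z]+)-(\d+)", code)
    if match is None:
        raise ValueError(f"unrecognized catalog code: {code!r}")
    return match.group(1), int(match.group(2))


def infer_year(code: str, known_years: dict[str, int]) -> Optional[int]:
    """Guess a print year from the nearest dated neighbours in the same series.

    Returns a year only when the closest known lower and higher numbers agree;
    otherwise returns None and the date has to come from another source.
    """
    series, number = parse_catalog(code)
    same_series = {}
    for other, year in known_years.items():
        other_series, other_number = parse_catalog(other)
        if other_series == series:
            same_series[other_number] = year
    if number in same_series:
        return same_series[number]
    lower = [n for n in same_series if n < number]
    higher = [n for n in same_series if n > number]
    if not lower or not higher:
        return None
    before, after = same_series[max(lower)], same_series[min(higher)]
    return before if before == after else None


print(infer_year("D-157", {"D-138": 1956, "D-197": 1956}))  # 1956
```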
What do you think of my comments on cross-checking with multiple other biblio sources being too time-consuming to be a good first process? You appear to have done a fair amount of this so far; what's your feeling on the time budget for it? Mike Christie 10:23, 28 May 2006 (CDT)
That's right, I picked a reasonably representative Author, Richard_Cowper, a few weeks ago and tried reconciling his bibliography with every bibliographic source that we know of. One of the Lessons Learned was that "A comprehensive review of all biblio data for a Work is even more time consuming than I suspected". Cowper has only 24 Long Works to his name, 2 pseudonyms and 20 short fiction pieces and yet a review of his Long Works and pseudonyms alone took about 5-6 manhours.
Now, one of these days I will finally escape from the bowels of the Third Pentagon and retire, at which point I will have more time to spend on this project -- and besides, we will hopefully have more dedicated bibliographophiles on board in the foreseeable future -- but for now we will presumably need to prioritize our limited resources. I'll start another section on prioritization shortly :) Ahasuerus 12:22, 28 May 2006 (CDT)
I think prioritization is indeed a key discussion. My notes on stabilization are implicitly assuming that the top priority is to manage the change in data stability that will occur when user submission really starts. A way to identify what data is stable and correct, and what is not stable and may not be correct, seems to me to be the key to success (whether or not you agree with the guideline I proposed). It is pointless to clean up inconsistencies if it's not easy to see that that's been done, after all. Mike Christie 12:55, 28 May 2006 (CDT)

Database proposals

A caveat: I don't know the rationale behind using MyISAM table types, or why foreign key constraints aren't used, but I'm following those conventions.

Here's one proposal. Verification comes from sources. I suppose sources should only be added by moderators or administrators. Sources would include particular editions of a reference work, and "the physical book itself".

CREATE TABLE source (
  source_id          int(11) NOT NULL auto_increment,
  source_name        mediumtext,
  PRIMARY KEY        (source_id)
);

The heart of the verification process: A user marks a publication as verified by a particular source. This adds a record to this table.

CREATE TABLE verification (
  verification_id    int(11) NOT NULL auto_increment,
  pub_id             int(11),
  source_id          int(11),
  user_id            int(11),
  PRIMARY KEY        (verification_id),
  KEY pub_id         (pub_id),
  KEY source_id      (source_id),
  KEY user_id        (user_id)
);
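To make the first proposal concrete, here is a sketch that translates the two tables into SQLite (through Python's `sqlite3`) and computes the kind of metric mentioned earlier, such as "25% of publications have been verified against the physical object itself". The type translation (`int(11) auto_increment` to `INTEGER PRIMARY KEY`, `KEY` to `CREATE INDEX`), the stand-in `pubs` table and the user ids are assumptions for illustration; only the pub ids and tags come from the "Falling Torch" records above.

```python
import sqlite3

# SQLite translation of the two proposed tables, plus a minimal stand-in pubs table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE pubs (pub_id INTEGER PRIMARY KEY, pub_tag TEXT);
CREATE TABLE source (source_id INTEGER PRIMARY KEY, source_name TEXT);
CREATE TABLE verification (
    verification_id INTEGER PRIMARY KEY,
    pub_id    INTEGER,
    source_id INTEGER,
    user_id   INTEGER
);
CREATE INDEX verification_pub_id ON verification (pub_id);
CREATE INDEX verification_source_id ON verification (source_id);
CREATE INDEX verification_user_id ON verification (user_id);
""")

# Sample rows: the four "Falling Torch" publications; user ids are made up.
conn.executemany("INSERT INTO pubs VALUES (?, ?)",
                 [(39562, "FLLNGT1959"), (87892, "THFLLNGTRC1964"),
                  (39563, "THFLLNGTRCH091959"), (13112, "BKTG22159")])
conn.executemany("INSERT INTO source VALUES (?, ?)",
                 [(1, "the physical book itself"), (2, "Tuck")])
conn.executemany(
    "INSERT INTO verification (pub_id, source_id, user_id) VALUES (?, ?, ?)",
    [(39562, 1, 101), (39562, 1, 102), (87892, 2, 101)])


def percent_verified(source_name: str) -> float:
    """Share of all publications verified at least once against one source."""
    (total,) = conn.execute("SELECT COUNT(*) FROM pubs").fetchone()
    (verified,) = conn.execute(
        "SELECT COUNT(DISTINCT v.pub_id) FROM verification v "
        "JOIN source s ON s.source_id = v.source_id WHERE s.source_name = ?",
        (source_name,)).fetchone()
    return 100.0 * verified / total if total else 0.0


print(percent_verified("the physical book itself"))  # 25.0
```

`COUNT(DISTINCT ...)` ensures that a publication verified by two users is counted once.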

Here's another proposal: link to other publications as reference sources, and enter the reference sources into the database as REFERENCE works; verifications against the physical object would point back to the publication itself. (So if publication n was verified by inspecting it, both source_id and pub_id would be set to n.) On the plus side, this requires only one new table, as we don't have separate tables for verification sources and verified publications.

CREATE TABLE verification (
  verification_id      int(11) NOT NULL auto_increment,
  pub_source_id        int(11),
  pub_verified_id      int(11),
  user_id              int(11),
  PRIMARY KEY          (verification_id),
  KEY pub_source_id    (pub_source_id),
  KEY pub_verified_id  (pub_verified_id),
  KEY user_id          (user_id)
);

It would be possible to add a pub_reference flag to the pubs table to mark a work as a reference, but I think adding a REFERENCE value to the existing pub_ctype ENUM is the approach that involves the fewest changes. (My SQL is rusty; I think this is the way to update an ENUM field, but I'm not absolutely certain.) It might be sensible to restrict marking a publication as a reference, as it only makes sense for a tiny proportion of publications to act as references for other publications.

ALTER TABLE pubs CHANGE pub_ctype pub_ctype
            enum('ANTHOLOGY','CHAPTERBOOK','COLLECTION','MAGAZINE',
                 'NONFICTION','NOVEL','OMNIBUS','REFERENCE') default NULL;

Then, a publication with pub_id x can be used as a reference for verifying publication with pub_id y if (a) x == y, or (b) pub_ctype == 'REFERENCE' where pub_id == x.
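The rule above can be enforced at insert time. The sketch below uses SQLite through Python's `sqlite3`, with a `CHECK` constraint standing in for MySQL's ENUM (SQLite has no ENUM type); `can_verify`, `add_verification` and the reference record with pub_id 900 are hypothetical names and data, not existing ISFDB code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE pubs (
    pub_id    INTEGER PRIMARY KEY,
    pub_tag   TEXT,
    pub_ctype TEXT CHECK (pub_ctype IN ('ANTHOLOGY', 'CHAPTERBOOK', 'COLLECTION',
                                        'MAGAZINE', 'NONFICTION', 'NOVEL',
                                        'OMNIBUS', 'REFERENCE'))
);
CREATE TABLE verification (
    verification_id INTEGER PRIMARY KEY,
    pub_source_id   INTEGER,
    pub_verified_id INTEGER,
    user_id         INTEGER
);
""")
conn.executemany("INSERT INTO pubs VALUES (?, ?, ?)",
                 [(39562, "FLLNGT1959", "NOVEL"),
                  (87892, "THFLLNGTRC1964", "NOVEL"),
                  (900, "TUCK-HYPOTHETICAL", "REFERENCE")])  # made-up reference record


def can_verify(source_id: int, verified_id: int) -> bool:
    """Rule (a) self-inspection, or rule (b) the source is a REFERENCE work."""
    if source_id == verified_id:
        return True
    row = conn.execute("SELECT pub_ctype FROM pubs WHERE pub_id = ?",
                       (source_id,)).fetchone()
    return row is not None and row[0] == "REFERENCE"


def add_verification(source_id: int, verified_id: int, user_id: int) -> None:
    """Record a verification, refusing pairs that break the rule."""
    if not can_verify(source_id, verified_id):
        raise ValueError(f"pub {source_id} cannot be used to verify pub {verified_id}")
    conn.execute("INSERT INTO verification (pub_source_id, pub_verified_id, user_id) "
                 "VALUES (?, ?, ?)", (source_id, verified_id, user_id))


add_verification(39562, 39562, 101)  # checked against the book itself
add_verification(900, 39562, 102)    # checked against the reference work
print(can_verify(87892, 39562))      # False: a NOVEL is not a reference
```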

Unresolved questions: If a record is updated, is it still verified? Should we timestamp verifications and note that they need to be updated? If we do that, should we start timestamping titles as well? grendel|khan 09:26, 29 May 2006 (CDT)
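One possible answer to the first two questions, offered only as an assumption: store a timestamp with each verification and treat it as current only if the publication record has not been modified since. The `Verification` class and `is_current` function are hypothetical, and the sketch assumes each publication record carries a last-modified time, which the current tables do not necessarily have.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Verification:
    pub_id: int
    user_id: int
    verified_at: datetime


def is_current(v: Verification, pub_last_modified: datetime) -> bool:
    """A verification vouches only for the record as it stood when it was made."""
    return v.verified_at >= pub_last_modified


v = Verification(pub_id=39562, user_id=101, verified_at=datetime(2006, 5, 29))
print(is_current(v, datetime(2006, 5, 1)))  # True: record unchanged since
print(is_current(v, datetime(2006, 6, 1)))  # False: edited after verification
```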

Just thought of this; I like the second proposal much more than the first, but how can we mark a publication as verified against Amazon.com's data, for instance? Do we add Amazon.com as a reference work? grendel|khan 09:41, 29 May 2006 (CDT)

Prioritizing ISFDB Projects

Given the issues raised in the "Verification discussion" section immediately above, let's see if we can list the projects that we are currently working on and then prioritize them.

First, there is the basic cycle of software testing-correction-retesting, requests for additional features, their implementation, testing, etc. It's a never ending process and requires its own internal prioritization that Al is keeping track of on the Bugs/Features pages. Once we have improved support for pseudonym handling and content editing, we will likely have the basics in place and after a few weeks of testing we could probably enable submissions and see how it goes. This is our number 1 priority, but at this point Al is the only developer on the project and his time is at a premium. I may be able to help some once I get up to speed, but it's been centuries (well, millennia, really) since I had to learn new tricks in this area. And what was wrong with PDP-11 Assembly anyway? :(

The second project that immediately comes to mind is "general cleanup". This includes but is not limited to the following tasks, starting with the most cost-effective ones:

  1. Identifying variant forms of Authors' names and either merging or linking them -- highly important if we want all data for each Author to appear together -- easy to do via Directory and Author Merge;
  2. Ensuring internal consistency of Authors' ISFDB bibliographies -- very important since some ISFDB pages are effectively unusable due to repetitive and poorly organized data, e.g. see Richard_Pini. Even major Authors' bibliographies are often difficult to use due to these problems, e.g. Andre_Norton. We can start at the top (Wells, Verne and other classics, then Grandmasters, then Hugo/Nebula winners) and work our way down the food chain;
  3. Building up basic "subjective" data, primarily series information -- closely related to the "internal consistency" task, although it can be more time consuming;
  4. Adding basic Author data to the Author records, e.g. Web site, WP entries, portraits, etc -- relatively easy to do, especially in WP's case, and the bang for the buck is pretty high;
  5. Identifying and eliminating non-SF Authors, Works and Publications -- time consuming, but quite useful since we don't want to confuse our users with unrelated data.

This is all about gathering the low-hanging fruit first. Once we have gotten what we already have into a semblance of usable shape, we can then concentrate on the much more time-consuming and laborious process of data verification, acquisition, etc. There are 20,000 volumes right here that are begging me to be entered :) Ahasuerus 13:09, 28 May 2006 (CDT)

I believe that agreeing on how we are going to identify the stability/correctness of records is a higher priority than any of the above, for the reasons I've given. If we do that, then I think the correctness of records does not need to be much of a priority, since the process will (we hope) take care of it. Then I would say completeness is the next step, so my list (after the stability issue is dealt with) would be based around high-profile authors, as you suggest. I would start with the most accessed authors (Stephen King, Robert Heinlein, and so on), and a project page could be built to track the progress of each one. I'd try to recruit editors to these projects, and the sequence for each author would be very similar to what you've outlined. Because I'd make it author-project-focused, my list would look a little bit different:
  1. Merge and otherwise sort out multiple author records
  2. Complete title list, including variant titles, and make consistent
  3. Complete pseudonym list
  4. Complete publications for each long title
  5. Complete short work publications
  6. Complete award information
  7. Complete series information (may be tricky in this project approach since this can cross authors)
  8. Any remaining miscellaneous verifications
I think this is close to what you have in mind, though. If we work like this, the units of progress will be easy to assess. We can say "The top priorities right now are the King, Heinlein and Asimov bibliographies, and here are the project pages; the King page reports all titles entered and all known publications of long works entered; they are now working on the short works". In addition the project pages can track information on the number of publications for "their" author that are marked "correct". Mike Christie 14:08, 28 May 2006 (CDT)
I may not be fully capable of understanding the nuances of the correctness/stability discussion at the moment since I am under the weather, as it were. However, the main impetus behind the drive, as I understand it, is to avoid the "descent into mediocrity" that Wikipedia has been accused of. Bad data leads to more bad data and eventually the contagion spreads and corrupts everything. In the end you have the kind of mess that was James Gunn's Encyclopedia or, heaven forbid, Roberta Rogow's notorious Futurespeak:
Slan (literary): Superhuman successors to homo sapiens in a series of stories by A.E. Van [uppercase] Vogt, beginning in 1925 with _Galactic Lensman_.
Now, if that is the rationale for concentrating on correctness first, then I think we can do certain things in parallel with developing and implementing a way to ensure that the data is correct -- which can prove somewhat time consuming depending on which model we end up adopting. For example, I don't think we risk any significant data degradation by enforcing "internal consistency": Robert_Louis_Stevenson is currently credited with almost 150 "Jekyll and Hyde" Works and a similar number of "Treasure Island" Works. Merging these Works would take very little time and have no impact on the underlying Publications and their "correctness". At the same time, it would make Stevenson's bibliography useful to ISFDB users, while now it's completely unusable by anybody not willing to spend a couple of hours comparing hundreds of Works. It would also lay the groundwork for further passes, which Mike describes above.
I have been experimenting with this type of housekeeping lately -- see, e.g. Lynn_Abbey. It looks like it may take a certain amount of self-discipline not to branch out into other types of data correction (variant title identification, series consolidation, etc) while doing "internal consistency". Whenever I did deviate from the straight and narrow, I made sure to make a note of the change and the source of the data in the Wiki, though, which should limit the potential for work duplication. Hopefully :) Ahasuerus 16:12, 28 May 2006 (CDT)


Linking and unlinking Publications and Works

Is there a way to modify existing links between Publication and Works? For example, the 5 publications listed for Lester del Rey's Best Science Fiction Stories of the Year really belong under their respective yearly Works for 1972-1976. Is there a way to establish a Work-Publication link for the right pairs and break the current links? Ahasuerus 08:38, 1 Jun 2006 (CDT)

The title-oriented unmerge app (which seemed like a good idea at the time, but tends to have all the subtlety of a nuke) will be transformed into a publication-oriented unlink app, which will basically unmerge a single publication. It doesn't exist today, but might by Sunday (I'm really trying to focus on content editors - new content is now working, reminding me of how much work was involved in typing magazine data; I'm currently finishing up the tools for editing existing content). So I should stop ignoring comments, proposals, and new bugs real soon now. Alvonruff 11:33, 2 Jun 2006 (CDT)

Content Editing

I've put up the apps for adding new content. I would avoid using them for a couple of days until I push more data through them. I've added two magazines and one anthology, and have found XML encoding errors each time (wacky author names and titles that don't show up in casual testing). It's safe to use the apps for the metadata portion of the book, as that's well-tested. Alvonruff 05:57, 6 Jun 2006 (CDT)

How's the content edit testing going? I've got some magazines lying around I'd love to try entering . . . . Mike Christie 07:32, 8 Jun 2006 (CDT)

I think I found the last annoying bug yesterday morning, and I've entered 5 more magazines since then without problems. So I'd say you should forge ahead. Some notes to make things easier:

  • When entering titles into a magazine, you don't need to enter a date for the story/essay, so long as it's not a reprint. That is, if you leave the date blank, it will default to the date of the magazine publication.
  • The apps use javascript to insert new title forms, and new author forms associated with a title. That means that if you submit, notice an error, and decide to go back to the data entry page, there's no guarantee that the javascript-added entries will still be there (seems to happen mostly to book reviews as opposed to fiction entries). May be browser dependent.
  • Keep in mind that I still don't have the ability to edit existing content online yet. Measure twice, cut once.
  • You can edit title and author misspellings by editing that individual title (just don't edit book reviews or interviews yet - you'll lose the subject authors). Alvonruff 05:55, 9 Jun 2006 (CDT)
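A minimal sketch of the date defaulting described in the first bullet, in Python. `TitleEntry` and `fill_default_dates` are hypothetical names, not the actual ISFDB code; dates use the `1957-02-00` form that comes up later in this thread.

```python
from dataclasses import dataclass, replace
from typing import List, Optional


@dataclass
class TitleEntry:
    title: str
    date: Optional[str] = None  # None means the date field was left blank


def fill_default_dates(entries: List[TitleEntry], issue_date: str) -> List[TitleEntry]:
    """Give every undated title the magazine issue's date (hypothetical helper).

    Reprints are expected to arrive with their original date already filled
    in, so they are left untouched. The input list is not modified.
    """
    return [e if e.date else replace(e, date=issue_date) for e in entries]
```

For example, a new story entered with no date picks up `1957-02-00` from the February 1957 issue, while a reprint entered as `1950-03-00` keeps its own date.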
Cool. I should get to it today or tomorrow; I'll enter a few and report back. Thanks. Mike Christie 07:01, 9 Jun 2006 (CDT)

How does one enter a pseudonymous author when editing content? I'm entering the February 1957 Super-Science Fiction, and Harlan Ellison has two stories in there, one under his own name and one as Ellis Hart. Do I just enter Ellis Hart?

Also, how do I enter the month information? Mike Christie 20:33, 11 Jun 2006 (CDT)

OK, I've entered it and submitted it but have not approved it. I see now that I could have entered the date as 1957-02-00; but how would I enter something like "Spring 1958" or "Spring/Summer 1942"? Also, the page numbering is 1-128, not counting the covers, but the editor, W.W. Scott, has an introductory page describing Emsh's cover art; his notes are on the inside of the front cover, so they're on page 0, technically. How do I enter that? Finally, what should I do with Emsh and Orban, who have full names I could use but who weren't using them here? Thanks! Mike Christie 20:43, 11 Jun 2006 (CDT)
Also, is it OK that the Moderate display for the new pub shows only the first item of content? Mike Christie 21:05, 11 Jun 2006 (CDT)
  • Pseudonymous author information will need to be entered later. First, there is the philosophy that people should be able to enter data without knowing a priori that a certain author is a pseudonym, or that a certain title is a variant. Second, the form is already fairly complex, and things will be even more bug-ridden if we try to add support for pseudonyms, variant titles, series information, and publication information of reviewed books.
  • Other than the year, I don't think the date is critical for magazines. I put the month in where applicable. It's not a good idea to use the actual publication date, as magazines have such long lead times that the actual publication year may be different from the one on the cover. The issue date, by convention, should be part of the title: Super-Science Fiction, Spring 1957.
  • If an article precedes the published page 1, then I'd leave it blank. The display algorithm will assign it to page zero if it has no page number, but won't display a page of '0' (which might look funky). Ordering will be preserved, so long as there aren't multiple articles before page 1.
  • For artists like Emsh, the name is more of a nickname than a pseudonym, so I tend to put his real name instead of the nickname. If we would like to track where he used the nickname, we can treat it as a pseudonym.
  • No, there should be more content than what is shown. You should get an entire table of everything that was entered, including reviews and interviews. I've checked the record in the database, and the moderator tool is showing it correctly. Could you enter a fake magazine (one that doesn't take much time to enter), with a few test titles and authors? Do a small number - like 5 short stories. The result should look similar to this. Alvonruff 04:59, 12 Jun 2006 (CDT)
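The page-zero rule from the third bullet can be sketched roughly as follows. This assumes numeric page strings (a roman-numeral page would raise `ValueError` here) and uses hypothetical helper names. Because Python's `sorted()` is stable, several unnumbered items keep their entry order in this sketch; Al's caveat about multiple articles before page 1 suggests the real display code may not guarantee that.

```python
from typing import Dict, List, Tuple


def page_sort_key(entry: Dict[str, str]) -> int:
    """Blank page numbers (including a bare newline) sort as page 0."""
    page = entry.get("page", "").strip()
    return int(page) if page else 0


def table_of_contents(entries: List[Dict[str, str]]) -> List[Tuple[str, str]]:
    """Return (displayed page, title) pairs in page order.

    Page 0 is used only for ordering and is shown as an empty string.
    sorted() is stable, so entries sharing a page keep their entry order.
    """
    rows = []
    for entry in sorted(entries, key=page_sort_key):
        page = page_sort_key(entry)
        rows.append(("" if page == 0 else str(page), entry["title"]))
    return rows
```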
OK, I'll give it a shot tonight. What's the difference between "ESSAY" and "NONFICTION"? Mike Christie 14:41, 12 Jun 2006 (CDT)
ESSAY - short form. NONFICTION - long form. Locus differentiates lots of magazine article types: column, contest, criticism, editorial, movie review, etc. These are all considered an ESSAY in the ISFDB (maybe we'll expand essays into a broader list if it proves useful). NONFICTION is reserved for books that are not fiction. So in a magazine, ESSAY should be used and not NONFICTION. Alvonruff 16:13, 12 Jun 2006 (CDT)
OK, I'm guessing that I know what the problem is. I entered the second item as an interior art item with no title, just a page number and an author. That seems to chop the list. What should I enter for title for the interior art -- the name of the story it illustrates? Though presumably this is a bug anyway; whatever correct behaviour is, chopping the list can't be right. Mike Christie 22:33, 12 Jun 2006 (CDT)
Yes, using the story title is the convention I've been using. HTML forms aren't very robust - they work by placing variable names in the environment of the called script. That script then has to specifically check for each particular variable: "Got a variable called title1? Yep. title2? Yep. title3? Nope." Typically, one variable is singled out as a sentinel, so that the script knows when it's done looking for data. In this case, the title was chosen, so an entry with a blank title ends the list. I'll look into a more clever method. Alvonruff 05:11, 13 Jun 2006 (CDT)
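To illustrate the sentinel problem, here is a sketch that treats `titleN` as the sentinel (the `title1`, `title2` names come from Al's description; `pageN` and `authorN` are assumed field names), next to one possible sentinel-free alternative that collects every numbered field. It assumes the submitted form arrives as a plain dict and that a blank field reads as an empty string.

```python
import re
from typing import Dict, List


def collect_with_sentinel(form: Dict[str, str]) -> List[Dict[str, str]]:
    """Sentinel approach: stop at the first missing or blank titleN."""
    rows = []
    i = 1
    while form.get(f"title{i}"):
        rows.append({"title": form[f"title{i}"], "page": form.get(f"page{i}", "")})
        i += 1
    return rows


# Hypothetical numbered field names: title1, author1, page1, title2, ...
FIELD = re.compile(r"(title|author|page)(\d+)")


def collect_all(form: Dict[str, str]) -> List[Dict[str, str]]:
    """Sentinel-free approach: gather every numbered field, then order by number."""
    rows: Dict[int, Dict[str, str]] = {}
    for key, value in form.items():
        match = FIELD.fullmatch(key)
        if match:
            name, index = match.group(1), int(match.group(2))
            rows.setdefault(index, {})[name] = value
    # Skip rows the user left completely empty.
    return [rows[i] for i in sorted(rows) if any(rows[i].values())]
```

With an untitled interior-art entry in position 2, the sentinel version returns only the first row, which matches the "chopped list" Mike saw; the second version keeps all three.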
That makes sense; and I wouldn't worry about fixing it right away -- with the convention that the story title should be used I think it's fine. Presumably for cartoons such as appear in FSF the convention would be just to use "Cartoon" as the title. Mike Christie 07:19, 13 Jun 2006 (CDT)


OK, I've now entered the February 1957 Super-Science Fiction. The only thing I notice that's odd is that the editorial on page zero did not go in. I left the page number blank, as you suggested. It did show up in the XML with a <Page> of \n. I assume that the type of EDITOR indicates editorial; that's what I used.

Other than that it looks great. Let me know if you see anything else wrong with it. Mike Christie 17:51, 13 Jun 2006 (CDT)

The only thing is to not use the 'EDITOR' type (see full explanation in the bug report); always use ESSAY for anything that's not fiction. I created a Wiki page for Super-Science Fiction, and I picked a starting tag of "SPRSF". So I changed the tags for your two issues to SPRSFFEB1957 and SPRSFJUN1957, and put references to them in the Wiki. Alvonruff
FYI, I have moved Super-Science Fiction to Magazine:Super-Science Fiction as part of a wholesale relocation and cleanup. Ahasuerus 18:48, 16 Jun 2006 (CDT)

Something's amiss with the stats.

Looking at the stats page, there are nineteen titles listed without a corresponding type (ESSAY, NOVEL, SHORTFICTION and the like). Is this a bug, or is something corrupt in the database? grendel|khan 21:14, 10 Jun 2006 (CDT)

They are the "titles" associated with magazines. It's a loose end that needs cleaning up, but won't cause any short term problems. Alvonruff 05:01, 12 Jun 2006 (CDT)
This problem is now repaired. Integrating a magazine now generates a title of type EDITOR for the editors. The old records have been fixed as well. (An EDITOR record allows us to group an editor's activities by year. The individual magazine issues for a particular year can then be merged into a single entry. See Gardner_Dozois for an example.) Alvonruff 16:17, 12 Jun 2006 (CDT)
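A rough sketch of the per-year grouping that an EDITOR record makes possible, assuming each issue is keyed by editor, magazine, and the year portion of its issue date. The tuple layout and the `editor_years` name are hypothetical; the actual database schema isn't described here.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Issue = Tuple[str, str, str]  # (editor, magazine, issue date "YYYY-MM-DD")


def editor_years(issues: List[Issue]) -> Dict[Tuple[str, str, str], List[str]]:
    """Group issues into one entry per (editor, magazine, year)."""
    grouped: Dict[Tuple[str, str, str], List[str]] = defaultdict(list)
    for editor, magazine, date in issues:
        grouped[(editor, magazine, date[:4])].append(date)
    # Sort each year's issue dates so the merged entry lists them in order.
    return {key: sorted(dates) for key, dates in grouped.items()}
```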

Anonymous and unsigned magazine sections

What's the convention for things such as unsigned editorials, and "In Times To Come", and science snippets that aren't signed? If I can be completely sure of the authorship, I'd be tempted to put the name in, but I think this contravenes the bibliographic spirit of recording what you see. So should I leave the author field blank, or use "Unknown" or "Anonymous" or "Unattributed"? And is there a natural place to capture notes about who wrote these? I was thinking of the publication level notes, but that would look weird -- hundreds of issues of Analog, each with a note saying "Authorship of "In Times To Come" attributed to John W. Campbell, Jr." Might be best to say nothing -- we are primarily a bibliographic resource and only secondarily (if at all) providing research conclusions, after all. Mike Christie 08:52, 13 Jun 2006 (CDT)

In the past I've used "unknown". This seems semantically different to me than "anonymous" - an anonymous author is usually a deliberate act to hide the actual author's identity, while an unknown author means that people were sloppy or didn't care and we don't actually know. On the other hand, the editor probably wrote this stuff, but didn't want to overwhelm the ToC with a bunch of instances of his/her name. I think "unknown" is more accurate, but don't feel strongly about it. Alvonruff 12:31, 13 Jun 2006 (CDT)

Editing magazine pages

Now that magazine/collection contents can be entered, I have been reviewing the ISFDB magazine directory. The first thing that comes to mind is that it may be useful to add the ability to edit static magazine pages. For example, the Unknown page could benefit from hyperlinking to Campbell's Long Works bibliography as well as to the Galaxy and Beyond Fantasy static pages. Other, less prominent, static magazine pages need other kinds of cleanup, including typos, ugly table layouts, etc. Do we want to add "static page editability" to the list of requested features or are there technical problems preventing us from making these pages editable? Ahasuerus 09:25, 13 Jun 2006 (CDT)

Well, the intent is to move all the pages into the Wiki, so that they can be edited (so far about 30 of them have been moved. See for instance: Beyond Fantasy). The old static pages would then be tossed. In fact, in the end, there should only be a very small number of static pages - hopefully we can get it down to a single index.html redirect page. Are we talking about different things here? Alvonruff 12:35, 13 Jun 2006 (CDT)
Oh! Yes, I remember it now. I blame age and the peripatetic lifestyle :( The only concern that I have with Wikification is that at this point there are no moderatorial tools in the Wiki (that I am aware of). I guess we can always protect the "Magazine" namespace if things get out of control.
And speaking of age, I am overdue for certain medical procedures, so I may disappear for a few days or weeks, most likely some time after 06/20. With any luck, I should be back and better than ever :) Ahasuerus 13:15, 13 Jun 2006 (CDT)
I think some other folks have made some MediaWiki mods that allow moderators, but it may require upgrading to a later version of MediaWiki. I'll have to do some investigation on that. Good luck with the procedure. Alvonruff 13:58, 13 Jun 2006 (CDT)

What needs to be in place to go live?

Do we have a definition of what functionality needs to be in place before we can start opening up editing? Obviously we'll be looking to clean up any bugs we can find, and there are some editing functions we don't yet have that are fairly necessary, such as content editing for existing content. But do we have to have award editing? Pseudonyms?

If we don't have a definition, maybe it would be worthwhile to create a page or subpage somewhere to characterize what we need. Mike Christie 13:48, 13 Jun 2006 (CDT)

I'm thinking 1) finish content editing, 2) finish help documentation, and 3) clean up the bulk of the bugs we've found. Award editing is not a requirement (in fact I've only allowed David G. Grubbs to edit awards to date, and I'm only now entertaining the idea of allowing non-experts to edit that data). The rest we'll bring in as time allows. Alvonruff 14:03, 13 Jun 2006 (CDT)
That sounds about right, but I wonder if there is a way to make content creation a little easier. As far as I know (keep in mind that I have only entered 1 or 2 collections so far), if there are 10 essentially identical Publications of a collection (Adventures in Time and Space, etc), you have little choice but to key everything in manually 10 times. That's a lot of wear and tear on the keyboard! I tried using IE's "Back" button after submitting the form, but it makes everything past the default number of stories (9?) disappear from the form. Do you think we could add a "Clone this Publication" choice to the navbar? Ahasuerus 14:18, 13 Jun 2006 (CDT)
One more thing. After making some 3,000+ changes to the database, I would estimate that up to 30-40% of the records that I changed or added required 2 or more modifications. Since I have full moderatorial privileges, it wasn't a big problem: I just needed to approve the first change and then go back and tweak the data some more. However, regular editors won't have that level of control, so any submission that requires multiple passes (e.g. Non-Genre Works, which have to be submitted as a different type and then edited after the fact) may easily stall half-way through, since editors may not stick around for their submissions to be approved. And at some point a repeatedly frustrated editor may not come back at all. Moreover, when a moderator reviews a half-baked submission, in some cases he may not have enough information to determine whether to "Approve" or "Reject" it.
Given the concerns above, I think it would be desirable to make the baseline submission form capture as much data, including Work-level data (pseudonyms, non-genre, etc), as possible before going live. I realize that it would mean (a) quite a bit of work on the development end, and (b) a slight paradigm change: from capturing Publication level data to capturing both Publication and Work data up front. However, my experience seems to suggest that it would make the submission process go much smoother. Ahasuerus 14:56, 13 Jun 2006 (CDT)
Cloning will probably show up this week. Agreed; absolutely required. As far as capturing work/publication data at the same time, I understand the desire and need. I'm resisting due to the additional complexity, both from a UI point of view (we don't want the app to look like the cockpit of a 747), and from a software point of view (higher complexity = higher bug count). We'll need to think outside the box on that one. Alvonruff 15:18, 13 Jun 2006 (CDT)
I agree with content, help, and bugs as go-live requirements. What about the verification discussion? I am still fond of the simplified checkbox scheme I proposed at Stabilizing Bibliographic Data. Ahasuerus made some good points about the need to do other kinds of cleanup too, and I agree with those; but I am very concerned about our inability to see what publication-level cleanup we have already done. What's to stop someone thinking they need to verify the two issues of Super-Science Fiction I just entered, for example? I'm not saying my scheme is the only possible one, but some form of verification flag seems a must-have to me. Mike Christie 22:22, 13 Jun 2006 (CDT)
Yes. Forgot that. Put it on the shopping list. I don't think that will be very difficult to implement anyway. Alvonruff 04:41, 14 Jun 2006 (CDT)

Another thought or two about going live.

First, what about having a beta test period? I don't know if the software supports giving only some users update privs, but I think it would be good to give about ten users (maybe recruited from rasfw, or somewhere similar) access, and then see how it goes for a week or two.

Sounds reasonable to me. Not only would we have more people entering data in various strange and unexpected ways (thus finding more bugs before go-live), but we would also be in a better position to estimate the sustainability of the current Editor/Moderator model in full production mode. Approving one's own submissions is one thing, but approving hundreds of submissions, many of them garbled (based on prior experiences with ISFDB1), may prove to be orders of magnitude more time consuming. Ahasuerus 10:39, 15 Jun 2006 (CDT)

Second, and related to the above, how about trying to focus the beta test on the project organization? I can see several kinds of updates, some of which have been the topic of my stability comments, and some of which are the sort of work Ahasuerus and Grendelkhan have been doing. We could ask for beta volunteers interested in a given author, e.g. Heinlein, and ask them to focus on:

  • Finding a way to build a project page that manages the ongoing verification work. We might need to make some suggestions, but this is the project's job, not ours.
  • Performing verification passes on copies of the underlying publications, and recording the correctness flag (or however that is implemented).
  • Ensuring consistency of the abstract data -- pseudonyms, series, variant titles, no duplicate titles, and so on.
  • Verification against bibliographic sources, and a way to record any discrepancies.

This sort of targeted beta might enable us to get a handle on a recommended workflow for people interested in a given author. Mike Christie 07:57, 15 Jun 2006 (CDT)

Author-centric projects are likely to cover 75%+ of the work that will need to be done to derive Work data from Publication data. However, it looks like a good 15-20%+ of the work may be better organized via Series-centric, usually media-related (Star Wars, Dr. Who, Star Trek, Buffy, Xena, etc) but sometimes non-media (Thieves' World) projects. We may want to add support for Series biblio pages the way the ISFDB Wiki currently supports Author biblio pages. Ahasuerus 10:39, 15 Jun 2006 (CDT)
Good point. Then we could canvass for e.g. Heinlein and Star Wars volunteers for the beta. Mike Christie 10:53, 15 Jun 2006 (CDT)

Queries related to magazine entry

I just added the Dec 57 Super-Science Fiction, and updated the magazine page. I have two questions.

  1. Is there a way to avoid having to separately look up each story afterwards to see if it needs merging with another one? Is there a way to indicate that it should be merged as I enter it? If not, fine, but I want to make sure I'm not missing a short cut.
First the short answer: no, there is currently no way of automerging titles. Now for the long (and somewhat philosophical) answer that's been tumbling about my brain for a while: Some months ago, Robert Reginald posted the following snarky remark in the Quicktopic forum:
   I'm sorry to say that the ISFDB database is almost wholly unreliable. In
   looking at my own entries, there are multiple errors in dates, title
   information, and publication data--and the vast majority of my books in 
   the field aren't listed at all, although the information is readible 
   ascertainable. I've tried sending in corrections to this material in the 
   past, but I might as well be talking to a blank wall for all the response 
   I get. None of the mistakes are ever fixed. There's even a ghost title 
   listed. Pretty scruffy all around.
Reginald looked at his bibliography, found it to have erroneous and missing information, and attributed that state to sloppy work. When Reginald creates a bibliography, it is an enterprise of human intellect, whereby works are found and collated by applying human logic to the problem; the central work object in this case is the bibliography itself. In Reginald's world, bibliographies do not create themselves. In the ISFDB the central work object is the publication record, and we supply tools that allow those publications to be grouped together to form an author bibliography. In the ISFDB world, bibliographies do create themselves. We already have automated tools, in the form of Dissembler, which searches the web and finds new publication data records. The publication data found by Dissembler is, for the most part, correct (although there can occasionally be mangled author names). This allows a situation to arise whereby an author bibliography is formed without any human guidance whatsoever - in the case of Reginald, he was looking at an Accidental Bibliography that formed through separate, distinct acts of Dissembler. It was a bibliography that had never been touched by human hands. While correct from a publication point of view, at the abstraction level of a summary bibliography it appeared scruffy because no human had ever applied knowledge and intellect to correctly research and organize the information. In fact, over the course of the last month, many records inserted into the ISFDB by Dissembler have been deleted, once humans looked at the records and thought: "I don't think that this title is appropriate for the ISFDB." We continue to add heuristics to Dissembler to reduce this effect, but having a machine autonomously do the work of a human bibliographer is really beyond the abilities of our current level of technology.
It looks like the problems that Reginald listed in his message can be attributed to a number of separate issues. First, there are problems with Publication level data that Dissembler captures. That's where "multiple errors in dates, title information, and publication data" (as well as the "ghost title" complaint) primarily come from. Based on what we have in the database right now as well as on my prior experiences with harvesting data (with and without AI assistance) from library catalogs, OCLC, Amazon.com and its competitors, used book sites, etc, the data that you get that way is very dirty. There are misspellings, ghost titles, mismatched ISBNs, illustrators listed as co-authors, incorrectly populated fields, etc, etc. In the end, as we all know, GIGO.
This seems to suggest that even for Publication harvesting purposes some form of human review of Dissembler's submissions (which I believe Al has been doing to a limited extent) would be advisable. Otherwise any records that do not have the "Correctness flag" set will be of highly dubious usefulness. On top of that, since Dissembler creates new and usually imperfect Work level records based on new Publication records, it leads to Work level bibliographies slowly deteriorating over time -- which leads us to the next class of problems.
The second class of problems (which Reginald doesn't list in his message, but Al describes above) has to do with deriving Work level information from Publication information. I think we all agree that we can't expect Dissembler or other automated tools to be of much help in this area. Some of it can be trivial, e.g. the kind of "internal consistency cleanup" that I have been doing for the last month. Other times it can be quite time consuming and involve canvassing multiple bibliographic sources, Author/fan Web sites or even comparing text versions (see, e.g., the Note field for this Work) to determine the best way to link individual Publications at the Work level. There isn't much we can do to help this inevitably slow and painstaking process along -- aside from making it easier for editors to document their work in the Wiki.
Finally, Reginald ran into response time problems, but they were likely inevitable given Al's schedule and the need to concentrate on software issues as opposed to fixing the data. Once the editing tools are in place and a certain critical mass of editors/moderators has been accumulated, we should see significant improvements in this area. Hopefully :) Ahasuerus 12:09, 15 Jun 2006 (CDT)
So back to the merging problem. Sure, we could put in heuristics to try to automerge titles based on creating title, author, and title type equivalencies, but I'm concerned that we'd be handing over the intellectual process of creating a summary bibliography to the machines, which have proven themselves to be ill-equipped to handle the task. That is, it's difficult enough to program the machine to do *precisely* what it is we want to do to the database, let alone giving it permission to go off and decide by itself what to do. If someone, for instance, were to erroneously type in a title for a magazine, and that typoed title just happened to match another title by that author, then the tools would blindly merge the two titles together. Or worse, a title is mistakenly *published* with the wrong title, and the tools automatically merge the title, even when the bibliographer knows that they shouldn't be. Of course, a human *could* also erroneously merge them together, but the machine *will* erroneously merge them together. And although we have humans doing data entry on the magazines, it's still the machine that is grouping these new titles into bibliographies - and I think there is value in having some human intellect go and visit these newly-created Accidental Bibliographies and look them over. Because when I do that, I usually find some other error, unrelated to the title that was just entered.
Don't get me wrong here - I want the machine to do work for me to make the job of data entry easier and less error prone, and I think that your question is a valid one. On the other hand, I have personal feelings about the line in the sand that separates activities that should be done by us humans, and those that should be done by the machines. And for me, abstracting publications into a single work is something that ought to be done by us, although I'm open to debate on the topic. Alvonruff 07:23, 15 Jun 2006 (CDT)
I completely agree. This is also, as it happens, in line with the distinction I was trying to draw in Stabilizing Bibliographic Data; a publication can be marked correct by a single person with a copy of the pub, but the abstracted properties (such as "Day After Tomorrow" is really a vt of "Sixth Column") should be marked verified by a collaborative process. I visualize, for example, that the Author:Robert A. Heinlein page will be a project page where individual editors can debate whether or not to do a merge, or whether a work belongs in a series. Each pub would have its own page on which we'd see the notes on why the "correct" flag was set; above that level it is the responsibility of the author project to manage the data.
This in turn implies some thoughts about prep for going live; I'm adding those above. Mike Christie 07:57, 15 Jun 2006 (CDT)
Good idea. One thought I have on the above topic is that it would be possible for the integration tool to do the check, and if it found a merge candidate, to submit a merge request, which moderators would then handle like any other submission. Alvonruff 09:10, 15 Jun 2006 (CDT)
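A minimal sketch of that "check, then submit a merge request" idea, assuming a simplified title record and an assumed matching rule (same title ignoring case and extra spaces, same author, same title type). `Title`, `MergeRequest` and `queue_merge_candidates` are hypothetical names, not the ISFDB's actual code. The key point is that the tool only queues requests; nothing is merged until a moderator approves.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Title:
    title_id: int
    title: str
    author: str
    ttype: str  # title type, e.g. "NOVEL" or "SHORTFICTION"


@dataclass
class MergeRequest:
    new_title_id: int
    existing_title_id: int
    status: str = "PENDING"  # stays pending until a moderator approves or rejects it


def merge_key(t: Title) -> tuple:
    """Assumed matching rule: same title (ignoring case and extra spaces), author and type."""
    return (" ".join(t.title.lower().split()), t.author.strip().lower(), t.ttype)


def queue_merge_candidates(new_titles, existing_titles, queue):
    """Append a MergeRequest for every existing title that matches a new one.

    Nothing is merged here; the requests wait in the moderator queue
    like any other submission.
    """
    index = {}
    for t in existing_titles:
        index.setdefault(merge_key(t), []).append(t)
    for nt in new_titles:
        for match in index.get(merge_key(nt), []):
            if match.title_id != nt.title_id:
                queue.append(MergeRequest(nt.title_id, match.title_id))
    return queue
```

Because a typoed or misprinted title can still collide with a real one, a moderator reviewing the request remains the safeguard Alvonruff describes above.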
Nice idea for a feature. I'll go ahead and stick a note about that in the feature list; I don't think we should try to get it in now (it's the code-freeze project-management Nazi in me). Mike Christie 09:14, 15 Jun 2006 (CDT)
Along similar lines, it occurs to me that it may be useful to be able to sort submissions by submitter. That way a Moderator can see all submissions by an editor/bot at the same time and make decisions based on the totality of the submissions. For example, "Change this Work's year from 1972 to 1973 and mark it as Correct" may look innocuous enough, but if there are 5 more submissions by the same editor sitting in the queue and some of them are clearly incorrect, then the moderator may reasonably assume that the editor is either incompetent or a vandal and act accordingly. Other useful features -- which, I hasten to add, don't need to be added right away :) -- would be the ability for moderators to add free text comments to submissions ("Rejected because XYZ", "Known vandal under a new user name, see the Talk Page/Block Log", etc) and possibly a "Freeze" flag to be used when a submission requires an in-depth review and the moderator doesn't want other moderators to approve/reject it until his analysis is done. Ahasuerus 10:04, 15 Jun 2006 (CDT)
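The three queue features suggested here (group by submitter, free-text moderator comments, a "Freeze" flag) could look roughly like the following. This is a sketch under assumptions: `Submission`, `comments`, `frozen_by`, `freeze` and `can_decide` are hypothetical names, not fields the current software has.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Submission:
    sub_id: int
    submitter: str
    summary: str
    comments: List[str] = field(default_factory=list)  # hypothetical moderator notes
    frozen_by: Optional[str] = None                    # hypothetical "Freeze" flag


def group_by_submitter(queue):
    """Return {submitter: [submissions...]} with submitters in alphabetical order."""
    groups = defaultdict(list)
    for sub in queue:
        groups[sub.submitter].append(sub)
    return {name: groups[name] for name in sorted(groups)}


def freeze(sub, moderator):
    """Mark a submission as under in-depth review by one moderator."""
    if sub.frozen_by not in (None, moderator):
        raise PermissionError(f"already frozen by {sub.frozen_by}")
    sub.frozen_by = moderator


def can_decide(sub, moderator):
    """Other moderators may not approve/reject a frozen submission."""
    return sub.frozen_by in (None, moderator)
```

Grouping keeps each editor's submissions in their original queue order, so a moderator can read one editor's batch from top to bottom.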

2. I thought there was a merge button on the navbar but now I can't find it. Am I misinterpreting a script name? The only way I now know to merge pubs is to do an advanced search and choose merge from that.

We should merge titles, not pubs; that is, we merge because two publications are the same work. There used to be an old pub merging tool, but I took it off the navbar, as it was an ancient app that doesn't seem necessary. You can run Advanced Search to find the titles, or click on Titles in the author's navbar. Alvonruff 07:23, 15 Jun 2006 (CDT)
OK, thanks. It was just my bad memory. Mike Christie 07:57, 15 Jun 2006 (CDT)
If I find two or more identical publications, I simply delete all but one of them. The only catch is that you need to carefully compare their Contents data to make sure that you are not losing anything. More than once, I had to leave duplicate Publications in the database because they had slightly different Contents entries that we would like to preserve (e.g. editorials, illustrators, short fiction length information, etc) and since Contents data is currently not editable, I couldn't just add this information to the Publication record that I meant to keep. One way around it would be to create a brand new Publication record that incorporates all relevant fields from all related Publication records, but it's a lot of work and it will be much easier to do once we are able to edit Contents data. Ahasuerus 11:11, 15 Jun 2006 (CDT)

3. Take a look at the pending merge for Bloch's "Broomstick Ride". What's the difference between ss and sf? This can be fixed when we get around to adding help text or links to these screens, but for now I am not sure of the difference. Mike Christie 00:08, 15 Jun 2006 (CDT)

ss means that the SHORTFICTION title is known to be of length SHORTSTORY (as opposed to NOVELETTE or NOVELLA). sf means that we don't know what the length is. Many magazines denote the story length in the table of contents (Asimov's ToC is sorted by story length, with novellas first and short stories last), while collections and anthologies almost never do. The default is sf. Alvonruff 07:23, 15 Jun 2006 (CDT)
That makes sense; I fixed it to be ss in this case. Mike Christie 07:57, 15 Jun 2006 (CDT)

Another magazine entry query. I just added the Jun-Jul 1953 Fantastic Universe, and it has book reviews. When I add a book review, should I add an entry for the whole article: "Universe of Books, by Sam Merwin, Jr.", and also separately add a list of the books reviewed? Or is the main article redundant if I enter each reviewed book? I took the former course in this case. Mike Christie 00:25, 17 Jun 2006 (CDT)

It depends. If we're indexing a magazine like LOCUS, where there are a small number of reviewers, and the multiple reviews are organized by reviewer into larger articles, then I think it helps to enter an essay article to clarify and organize things. When there are a large number of reviewers, and each review is clearly organized into its own article, then I think it's redundant.
I think the guideline should be: if the reviewer reviews more than one book in an article, then the article should be added as a separate entry. If the reviewer reviews only one book in an article, then a separate essay entry would be unnecessary. Alvonruff 05:02, 17 Jun 2006 (CDT)
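The guideline boils down to a single rule, sketched here for illustration. The entry labels "ESSAY" and "REVIEW" and the function name `review_entries` are made up for this example and are not the ISFDB's actual type codes.

```python
def review_entries(article_title, reviewer, books):
    """books: list of (book_title, book_author) pairs reviewed in one article.

    Entry type labels are illustrative, not the ISFDB's actual type codes.
    """
    entries = []
    if len(books) > 1:
        # Several books in one article: keep the article itself as an essay entry.
        entries.append(("ESSAY", article_title, reviewer))
    for book_title, book_author in books:
        # Every reviewed book gets its own entry either way.
        entries.append(("REVIEW", book_title, book_author, reviewer))
    return entries
```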

Correctness Flag - Part Trois

Grendelkhan has the following Publication sitting in the Moderator queue (for technical reasons, I seem to recall):

   Title:     The Star Beast
   Authors:   Robert A. Heinlein
   Tag:       -
   Year:      1954-00-00
   Publisher: Ace
   Pages:     253
   Binding:   pb
   PubType:   NOVEL
   Isbn:      #78000
   Price:     $0.95
   Artists:   Steele Savage
   Image:     [removed]
   Note:      The only date given on the copyright page is the copyright date, 1954, which may not be the publication date

This is an example of a Publication that has passed "physical verification" and would be ready to have the "Correctness flag" set if only the software were ready to support it. And yet we are not sure whether the publication date is correct. As a matter of fact, we are pretty sure that it's wrong because of the price and the catalog number. We will need to check other sources, in this case probably Tuck and/or the Paperback Price Guide, to determine the publication date. And it can be worse; I have seen pirated Israeli reprints with little to no Publication data and it's my understanding that some more recent Asian pirates are as bad as the old Israeli ones were.

Given that we try to make Publication data as "objective" as we can, what do we want to do in cases where Publication level data is either completely unavailable or can only be derived from secondary sources, making it in effect "subjective"? Similarly, will we ever want to set the "Correctness Flag" for editions that we are 99.9999% sure exist, but that we may not be able to physically verify? First editions of Thomas More's, Charles Dickens', Jonathan Swift's, Honoré de Balzac's or Alexander Bogdanov's works come to mind. Ahasuerus 15:16, 15 Jun 2006 (CDT)

I think if I were entering that publication, I'd leave the date blank, and on the publication wiki page I'd say "Entered from actual copy; date not present". Then someone from the Heinlein project can come along and look at the title and see that that publication has no date. They go to the publication page and see it was entered from a copy; and they can decide what source to use to get the date. Suppose they find, e.g., an old copy of Locus that lists forthcoming books that describes this edition with its ISBN. They could enter that date and mark it correct.
I don't think it should be marked correct as it stands. The reason is that the flag really indicates both Correct and Complete. The only exception that occurs to me is that the cover art might not be credited -- do you leave it blank (or enter "Unknown") and mark it correct? I think you could legitimately do that. Mike Christie 15:33, 15 Jun 2006 (CDT)
I think I've tried all of these in the past (0000-00-00 for year indicates unknown year, using unknown for the cover artists) with varying results. Seems like it's the all-or-nothing nature of the verified bit that's tripping us up. What if each field could be verified? In that case Grendelkhan would mark the title, author, publisher, pages, binding, type, catalog number, price, and artist as verified. The year would be marked not-verified. That way we can at least have a year value which might be right - we're just unable to verify it at this time. People can then judge which data fields can be trusted, without waiting for every field to be verified - which may actually never happen. Alvonruff 17:13, 15 Jun 2006 (CDT)
Well, it would certainly take care of the year/cover artist problem. However, if we are going to take this route, then usability suggests that it would be helpful to have a "master checkbox" that indicates that all fields have been verified. The flip side of the coin is that it may tempt users to check the master checkbox mindlessly, but that's always a tradeoff with UI design.
Also, what would be the criteria for eventually setting the Correctness flag for these fields? Consulting N+ bibliographic sources, where N is likely to be 2 or 3? If there is nothing in the Publication that unambiguously indicates what the value of the field should be, then we have little choice but to derive the information from some other source, either primary or secondary.
And finally, what do we do when we discover that the "objective" information in a Publication was incorrect, e.g. the year was off by 1 (it's been known to be off by up to 3+, believe it or not!)? I would think we would want to follow standard bibliographic conventions and record both the year on the title page and the year when it actually came out. Ahasuerus 17:40, 15 Jun 2006 (CDT)
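To make the per-field verification idea and the "master checkbox" concrete, here is a minimal sketch. The field list follows the publication record quoted earlier in this thread; the class and method names (`FieldVerification`, `verify_all`, `unverified`) are hypothetical. Using Grendelkhan's book as the example, every field except the year would be marked verified.

```python
from dataclasses import dataclass, field

# Field names follow the publication record shown in this discussion.
FIELDS = ("title", "authors", "year", "publisher", "pages",
          "binding", "pubtype", "isbn", "price", "artist")


@dataclass
class FieldVerification:
    verified: dict = field(default_factory=lambda: {f: False for f in FIELDS})

    def verify(self, *names):
        for name in names:
            if name not in self.verified:
                raise KeyError(name)
            self.verified[name] = True

    def verify_all(self):
        """The "master checkbox": mark every field verified at once."""
        self.verify(*FIELDS)

    def fully_verified(self):
        return all(self.verified.values())

    def unverified(self):
        return [f for f in FIELDS if not self.verified[f]]
```

The `verify_all` shortcut is exactly where the "checked mindlessly" risk lives: it is convenient, but it sets every field in one click.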
Y'all are making me think too hard.
I have some misgivings about the multiple flags approach. I hate to add any extra complexity to the code and UI, for one thing; for another, I want the definition of the flag to be as straightforward as possible, and not subject to interpretation.
So how about doing this: the definition of the "correct" flag for a publication is that it accurately reflects the information obtainable from the publication. If the date is not apparent -- then you don't have to enter the date to set the correctness flag. If the cover artist is not specified -- put in unknown. So it should always be true that if you have a copy of the publication in hand, you can set the correctness flag; that's true by definition. Conversely, if you see that the correctness flag has been set (and you don't suspect malfeasance) then you don't need to get hold of a copy of the publication, even if there is missing data.
Any other data, such as publication date, true author (if not apparent), or cover artist, can be added without modifying the correctness flag. In those cases the publication wiki page should be updated to explain the source ("Freas's signature logo is visible in the painting so I am attributing the cover art to him", or "Tuck gives the publication date as 1963"). Some surprising things can be unclear for a publication; e.g. the editor of an anthology is occasionally not apparent. I am also assuming that when a publication is entered from a copy, the editor will update the publication wiki page with any relevant notes about missing data.
This approach has at least the benefit of being quite clearly defined, as well as being simple to implement.
I do suspect we will want to add other correctness flags later, but I feel we need to hold off till we see how the author-centric and series-centric projects shake out. I think it is going to be hard to take flags away, since they will represent work done by the editing community. I'd rather be minimalist till we understand the work flow a little better.
Mike Christie 21:01, 15 Jun 2006 (CDT)
Ah. You know you're finally starting to get somewhere in a project when participants start having discussions about machine intelligence or the nature of "truth". There is an elegance to your argument in regards to the knowable facts that can be surmised from the book itself. In the case of Grendelkhan's book, the "truth" about the publication date is not discoverable from the book - that data must come from elsewhere, perhaps resorting to digging through the publisher's company records. In a manner similar to Gödel's Incompleteness Theorem, there may be a considerable body of bibliographic truths that are not provable (as when information comes solely from a secondary source), as well as a body of truths that are not knowable (perhaps the only records concerning the true publication date were destroyed in a fire).
If we go with Mike's definition of verification, we are still left with the problem of describing which fields were verified. Let's go to a specific example:
   Title:     <== verified via publication
   Authors:   <== verified via publication  
   Year:      <== determined by author's private notes 
   Publisher: <== verified via publication  
   Pages:     <== verified via publication
   Isbn:      <== verified via publication  
   Price:     <== verified via publication  
   Artist:    <== determined by artist-specific bibliography  
In this case, we should somehow communicate to the audience that while the publication was used to verify many data fields, it was NOT used to verify the year nor the artist, and in fact that data came from a secondary source, and is therefore potentially less authoritative. Alvonruff 22:02, 15 Jun 2006 (CDT)
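One lightweight way to communicate this is to record a source for each field and generate the Publication page note from it. The sketch below assumes a plain dict of field name to source string. `PUBLICATION`, `secondary_fields` and `provenance_note` are hypothetical names, and the sources are the ones from the example above.

```python
PUBLICATION = "publication"  # the book in hand


def secondary_fields(sources):
    """Fields whose value came from somewhere other than the publication."""
    return [f for f, src in sources.items() if src != PUBLICATION]


def provenance_note(sources):
    """Build a free-text note in the style of a Publication wiki page."""
    parts = [f"{f.capitalize()} from {sources[f]}." for f in secondary_fields(sources)]
    if len(parts) < len(sources):
        parts.append("Everything else is taken from a copy of the publication.")
    return " ".join(parts)
```

For the example above, `secondary_fields` returns `["year", "artist"]`, and the note begins "Year from author's private notes. Artist from artist-specific bibliography."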
Right. I'm assuming that in this case, there is a page named Publication:XYZ123, which contains something like:
Year obtained from personal conversation with author I met at a convention. Artist from "21st Century Foss", p. 74. Everything else is taken from a copy of the book. <signed> Joe Editor.
The "correct" flag should have been set by Joe when he entered the record. Those publication pages would probably be linked to by the author project pages, and will no doubt be grouped by title on those pages. Does that make sense? Mike Christie 22:09, 15 Jun 2006 (CDT)

OK, I went ahead and updated Stabilizing Bibliographic Data with some additional notes from the discussion above. I also made a fairly large update to Editing:Publication Records; this was in the Be Bold spirit, so please revert if I have made statements there that are incorrect. One question I have: do we still need the tag to be enterable if the magazine pages are no longer going to be static? It does need to be visible, but could it now be autogenerated? Mike Christie 23:42, 15 Jun 2006 (CDT)

1. I've added a Bibliographic Notes link to the publication listing app (see Jack of Eagles)
2. Large update = good.
3. The tags are currently autogenerated (except maybe for magazines). I'd like to be able to edit all fields somehow, just in case there's a bug which generates a bad tag - not that that's ever happened before :) . I could probably change the editor such that tag editing is only available to moderators. I'll think over the magazine tag proposition (mostly to recall all the reasons we had for maintaining a tag name space for magazines. I think Michael J. Cross was using them to automatically generate magazine directories on his web page). Alvonruff 05:45, 16 Jun 2006 (CDT)

Magazine Collision

It would appear that Fantastic and Fantastic Science Fiction Stories are one and the same. I suspect that the root cause of the problem is that we list all variant titles in the Wiki magazine directory without clearly identifying them as such. Ahasuerus 20:15, 16 Jun 2006 (CDT)

Help pages

The help pages are on the list of outstanding tasks before we go live, so I thought I'd start a discussion of what needs to be done to them. I suggest we agree on the functional goal of each page, and then come up with an article structure, or structure for a set of subpages, to guide our draft of any missing pieces of the help system. We have three main help pages as far as I can see; here they are, with my thoughts on a definition for the use of each:

  • ISFDB Editing Guide: Detailed "how-to" for performing all the tasks an editor of the ISFDB might want to do.
  • ISFDB FAQ: Answers any miscellaneous questions about the ISFDB. May also answer some questions that belong in the editing guide; probably does so with pointers to the answers in the editing guide.
  • Help:Contents: This will have pointers to the editing guide and FAQ, but should include a guide to how to contribute overall. For example, author projects and series projects will not be covered under the editing guide, but should be described here (and of course we don't yet know just how those will run).

When we've agreed on the function of these three pages (and that these are what needs to be cleaned up before we go live), I suggest we move discussions of completion of those pages to the talk page for each of them, just so the community portal doesn't become the all-purpose bulletin board. I'll go ahead and make some notes on those talk pages now, just to get started. Mike Christie 08:10, 17 Jun 2006 (CDT)

I created a simplified version of the help menu for the Help:Contents page. Any thoughts on whether this will do as the start point? Mike Christie 19:23, 18 Jun 2006 (CDT)
Looks reasonable to me :) Ahasuerus 21:39, 18 Jun 2006 (CDT)

GoLive-ToDo tag

I've created a template for tagging pages that need editing for the go live. It's called Template:GoLive-ToDo. So far I've just tagged one page; I'll try to go through the help pages and add some more. Please drop this in wherever you see something that needs doing that you don't have time to do right away. Mike Christie 11:59, 18 Jun 2006 (CDT)

Magazine pages doubling as project pages?

I've written some text at Help:Contents/Purpose that assumes that we can use the existing Magazine namespace as the place for people to organize bibliographic status on those magazines. However, the indexes to the magazines are there too. Does it make sense to have them both on the same page? Or should there be a separate MagazineProject namespace?

I think I'm OK with putting the biblio detail down below the magazine, but it would look a little messy. Mike Christie 21:46, 19 Jun 2006 (CDT)

Welcome template

I've created a welcome template (a stripped down version of the Wikipedia welcome), but evidently I don't understand comment syntax as it didn't work with the comment in there. I'd also like to fix it to automatically add the four tildes, but I don't see how to do that either.

To use it, type {{subst:Welcome}} and it will produce this:

Welcome!

Hello, Community Portal/Archive/Archive01, and welcome to the ISFDB Wiki! I hope you like the place and decide to stay. Here are some pages that you might find helpful:

I hope you enjoy editing here! Please sign your name on talk pages using four tildes (~~~~); this will automatically produce your name and the date. If you need help, check out the community portal, or ask me on my talk page. Again, welcome!

Mike Christie 22:34, 19 Jun 2006 (CDT)

Draft author project format

I've put together a draft page layout for an author biblio project, at Author:Robert A. Heinlein. Let me know if you have any improvements to suggest (or just go ahead and improve it). Mike Christie 20:55, 23 Jun 2006 (CDT)

Looks like a good start! We may want to add some links to the text, though, so that new editors would know what we are talking about when we say things like "Verify that the correct title is the parent". [wanders off for another recalibration] Ahasuerus 07:11, 24 Jun 2006 (CDT)
I'll let others tweak it for a few days. Some time next week I'll turn it into a template, and subst it into a few of the higher profile authors, if everyone agrees that would be a good thing to do. Mike Christie 13:39, 24 Jun 2006 (CDT)
OK, I am mostly back in business now (disregard any dangling wires, I am told it's nothing serious) and have reviewed the proposed Project Page format. Overall, it looks quite good and comprehensive. A few minor comments:
Verify each listed award is true -- that is, that the title did win this award. Should we change "did win" to "did win or was nominated for"?
Agreed -- I made this change. Mike Christie 06:22, 29 Jun 2006 (CDT)
Review other sources of data such as booksellers and enter additional publication data from those; do not set the "correct" flag on these. I would only use booksellers' online catalogs in special cases. For example, they proved to be very valuable when I was rebuilding S. P. Meek's non-genre bibliography, but for genre books a good rule of thumb is that if a publication is not listed in regular genre bibliographies, but is listed by a bookseller, there is a VERY good chance that it's bogus or at least partially incorrect. I suggest we come up with more specific guidelines in this area. Also, we have decided not to set the correctness flag for publications that have been found in bibliographies, but have not been physically verified, right? If so, then we may want to spell this out on Project Pages. Ahasuerus 23:47, 25 Jun 2006 (CDT)
I think the booksellers are mostly useful when a bibliography does not cover the author or period in question; I agree with your comments about unreliability when they disagree. If a bookseller lists a 50s Pyramid edition of Sheckley that is unknown to Tuck, it's probably an error. However, Tuck only goes up to '68, Currey is only for first editions, Reginald stops at 1991, and each of them omits some authors. I think getting details of a 1994 Charles Sheffield book from an online bookseller is OK.
Well, it is certainly true that certain eras are not as well covered as others. Tuck tried to list all the editions that he was aware of, but in some cases referred you to Bleiler. Currey and Reginald only list first editions and, as you said, stop at 1991. Contento has good coverage for anthologies all the way through 2005 (1984-2005 as part of The Locus Index to Science Fiction) and Locus tried to list all editions that they came across after 1983. OCLC Fiction Finder is still in beta, but has the bulk of modern SF and quite a few foreign language translations.
However, although no single source is perfect, if none of them lists a particular Publication while, say, used.addall.com does, then it automatically raises a red flag. Sometimes it means that the item in question is not a work of speculative fiction at all but rather an RPG supplement, a rule book, a comic book, a figurine, a map or something else entirely. Sometimes it means that the item is non-SF but does belong in the ISFDB because it's "non-genre" or "non-fiction". And sometimes it means that used.addall.com is simply reporting bad data from Amazon.com/ca/uk and/or other booksellers. Once we have a better understanding of these heuristics -- and I think I am getting somewhat better at red flagging bad data returned by used.addall.com -- we may be able to compile a set of rules for our editors on "How to use booksellers' online catalogs without getting taken to the cleaners". For now, caveat bibliographer :) Ahasuerus 16:54, 29 Jun 2006 (CDT)
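The red-flag rule of thumb from this exchange can be sketched as follows, under a deliberately crude assumption: each bibliography is modeled only by the last year it covers (Tuck 1968, Reginald 1991, per the comments above) and a set of (author, title, year) listings. Start years, first-editions-only coverage and omitted authors are ignored. `Source` and `red_flag` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Source:
    name: str
    last_year: int                              # last year the source covers
    listings: set = field(default_factory=set)  # {(author, title, year), ...}


def red_flag(pub, bibliographies, booksellers):
    """True if only a bookseller lists `pub` although a bibliography should cover it."""
    _, _, year = pub
    if not any(pub in s.listings for s in booksellers):
        return False  # not a bookseller-only item
    if any(pub in b.listings for b in bibliographies):
        return False  # confirmed by a bibliography
    # Suspicious only if at least one bibliography claims to cover that year.
    return any(year <= b.last_year for b in bibliographies)
```

With this model, a bookseller-only 1955 Sheckley paperback is flagged, while a bookseller-only 1994 Sheffield book is not, matching Mike's example. A real checklist would need the richer heuristics (RPG supplements, comics, bad Amazon data) that Ahasuerus describes.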
With regard to the correctness flag, I'm not sure we reached consensus, though what you're suggesting is certainly reasonable. I suspect you're right, and we should at least start with a rigorous definition of the correctness flag as meaning that someone has seen a copy. But the various special cases we discussed may change that quite quickly -- for example, some information such as artist name may be taken from bibliographic sources and added to a publication that has the correctness flag set, but that doesn't mean the artist name was seen on the publication. Ultimately I think the project pages will be the forum to work this out on an author by author basis. Mike Christie 06:22, 29 Jun 2006 (CDT)

Administrator Responsibilities and SOPs

As we are contemplating going live in the foreseeable future, I think we may want to have a list of ISFDB Administrator SOPs. It's much easier to check your SOPs at 1:30am after a few beers (or whatever your chemicals of choice are) than to recall what you have and haven't done lately.

A few things that come to mind are:

  • Recent Changes Patrol (RCP). Review recent Wiki changes for spamming, vandalism and other nogoodnik activities. Should be done at least twice a day. Also helps to identify new users, who may be trying to contribute useful information, but are not familiar with the process.
  • Review Moderator Queue(s). We have only one queue at the moment, but that may change at a later time (I will make some suggestions later). We should strive to keep queue lengths down, but I am not sure what the best way of doing it would be.
  • Notify other Administrators on {Currently Undetermined Page} about gaps in your availability/coverage.
  • Archive any discussions that are both long and out of date. We will need an archival policy at some point. Some discussions may need to be abridged and converted into policy statements etc.
  • Add more as the software and the community mature. Ahasuerus 23:56, 25 Jun 2006 (CDT)

Merging Author names

When merging or moving author names, the associated Wiki article(s) also need to be merged/moved. Is this something that we would want editors to do or something that moderators should remember to do when approving Author merge/move submissions? Any way to automate the process? Ahasuerus 01:45, 29 Jun 2006 (CDT)

Making ISFDB Wiki data available off-line

Now that we have migrated a fair amount of supporting data (including the magazine directory) to the ISFDB Wiki, should we consider making it available offline as part of the main ISFDB backup.gz file or otherwise? I think there would be value to it, but I am not sure how we could best approach it without requiring a full install of the Wiki software on the destination system. Or should we just rely on the Wiki pointers that are embedded in the ISFDB pages and hope that the Wiki will stay up 24 by 7 for the foreseeable future? And, um, we do have a backup of the Wiki in case the box it's running on crashes and burns, right? :-) Ahasuerus 17:32, 29 Jun 2006 (CDT)

The wiki pages are SQL tables that TAMU installed in the same database as the ISFDB. So when we do a full ISFDB backup, it includes the wiki pages. Since I'm off for a few days, I'll set up MediaWiki on the home machine and make sure that the backup of the wiki tables is sane. Alvonruff 19:20, 30 Jun 2006 (CDT)

More on auto-merging newly entered Works with existing ones

[Quoting one part of the discussion above]:

Is there a way to avoid having to separately look up each story afterwards to see if it needs merging with another one? Is there a way to indicate that it should be merged as I enter it? User: Mike Christie

First the short answer: no, there is currently no way of automerging titles. User:Alvonruff
One thing that comes to mind is that we already have a way of enabling humans to make these kinds of decisions as part of the ISFDB editing process. For example, when merging multiple Titles or Authors, the software will identify any fields that differ between existing records and ask the editor to choose the right one. Wouldn't it be possible to do something similar for New Data submissions? Search the database for Works that match certain criteria (Author, Title, Year, probably Work type) and present the resulting matches plus the "No, this is a new Work" option to the editor? One would hope that it should be relatively easy to do for "New Novels" and only marginally harder to do for New Collections and New Magazines. Ahasuerus 17:32, 29 Jun 2006 (CDT)
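A rough sketch of how a New Novel submission might offer those choices, assuming records are plain dicts with `id`, `title`, `author`, `year` and `type` keys (a hypothetical shape). Author, type and normalized title must match. The year is used only to rank candidates, not to exclude them, since a reprint or a misdated entry can carry a different year. The final option always lets the editor say the Work is new.

```python
def _norm(s):
    """Case- and whitespace-insensitive form of a title."""
    return " ".join(s.lower().split())


def match_choices(new, existing):
    """Offer existing Works with the same author, type and title, nearest year first,
    followed by the "No, this is a new Work" option."""
    matches = [w for w in existing
               if w["author"] == new["author"]
               and w["type"] == new["type"]
               and _norm(w["title"]) == _norm(new["title"])]
    matches.sort(key=lambda w: abs(w["year"] - new["year"]))
    choices = [f"Merge with #{w['id']}: {w['title']} ({w['year']})" for w in matches]
    return choices + ["No, this is a new Work"]
```

The editor, not the software, picks from the list, which keeps the merge decision with a human as discussed earlier in this archive.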

Deleting Author records

There are some Author records in the ISFDB that are apparently orphaned, e.g. Frank Robinson. My guess is that they had bibliographic data associated with them at some point, but it was eventually merged with other Authors' data, in this case Frank M. Robinson, and at the time it happened the software was not robust enough to delete the resulting orphan record. I am sure I could identify these records offline by running a few SQL queries against the offline database, but once found, is there a way to delete them in the online database? If not, should there be one? Or should we not make this tool available to editors since it's potentially too powerful even if we make the submission process smart enough to check for any pointers to the record before we allow deletion (the way Title deletion is only allowed if there are no Publication records pointing to it)? Ahasuerus 17:29, 30 Jun 2006 (CDT)

Could you delete them by adding a bogus title and then deleting it? Mike Christie 17:59, 30 Jun 2006 (CDT)
That would be a reasonable workaround for now, but I've just tried it with Robinson and it didn't work. My guess is that the Robinson record has a Note associated with it. Author Notes used to be displayed within ISFDB1, but are no longer displayed within ISFDB2 since the assumption is that all free text Author data should go into Wikipedia. The old Notes records are still in the database, though, and can be retrieved with a simple MySQL query. I'll fire up MySQL and check it out later tonight :) Ahasuerus 19:33, 30 Jun 2006 (CDT)
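The offline check for orphaned Author records might look something like this. The schema here is a hypothetical, simplified stand-in (an `authors` table with a `note_id` column, plus `canonical_author` and `pub_authors` link tables); the real ISFDB tables may differ. The sketch uses Python's built-in `sqlite3` with an in-memory database so it is self-contained, but the `NOT EXISTS` query is standard SQL that MySQL also accepts. It also reports whether an old Note is still attached, since that may be what blocked the Robinson deletion.

```python
import sqlite3

# Hypothetical, simplified schema; the real ISFDB tables may differ.
SCHEMA = """
CREATE TABLE authors (author_id INTEGER PRIMARY KEY, author_canonical TEXT, note_id INTEGER);
CREATE TABLE canonical_author (title_id INTEGER, author_id INTEGER);
CREATE TABLE pub_authors (pub_id INTEGER, author_id INTEGER);
"""

ORPHANS = """
SELECT a.author_id, a.author_canonical, a.note_id IS NOT NULL AS has_note
FROM authors a
WHERE NOT EXISTS (SELECT 1 FROM canonical_author ca WHERE ca.author_id = a.author_id)
  AND NOT EXISTS (SELECT 1 FROM pub_authors pa WHERE pa.author_id = a.author_id)
ORDER BY a.author_id
"""


def find_orphan_authors(conn):
    """Authors referenced by no title and no publication, flagging attached notes."""
    return conn.execute(ORPHANS).fetchall()
```

To try it, create the tables with `conn.executescript(SCHEMA)` on `sqlite3.connect(":memory:")`, insert a few rows, and call `find_orphan_authors(conn)`. Each result row is `(author_id, name, has_note)`.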

Wrist problems

Fewer than 5,000 edits later, I am beginning to develop wrist problems. I suppose it's an argument for automatically applying changes made by Moderators :) but for now I'll have to take it easy for a few days and see how it goes :( Ahasuerus 20:34, 5 Jul 2006 (CDT)

Stray Eddings publications?

I was updating David Eddings' bibliography earlier today and noticed that it was missing his recent Belgariad prequels/laundry lists, i.e. Belgarath the Sorcerer, Polgara the Sorceress and The Rivan Codex, as well as the first two volumes of The Dreamers. I was then able to find them in Publications, e.g. this one or this one. I assume they had all been entered correctly in ISFDB1, but didn't get converted to the new ISFDB2 format properly. Any ideas as to why Eddings may have been singled out? Well, aside from karma, of course :) Ahasuerus 08:30, 7 Jul 2006 (CDT)


They're present in the last version of ISFDB1, so it's either a conversion error or something has happened since. I doubt that it's a conversion problem as there is nothing peculiar about them. The real question is what's the best way to fix them such that the pubs don't need to be reentered. Actually, this might be easy - I'll see if I can whip something up before the content editing update tomorrow morning. Alvonruff 20:55, 8 Jul 2006 (CDT)

Content Editing

I have only edited the contents of a couple of Publications, but it looks good so far! I have noticed that when you "Add a Publication" to an existing novel, you can't edit its contents (e.g. foreword(s), afterwords(s), etc) and you have to go back after the submission has been approved and "Edit" the contents. Is this by design? Ahasuerus 21:14, 9 Jul 2006 (CDT)

Just a note to say I won't be able to do much with the content editing; I'm on vacation for most of the next three weeks. I'll have intermittent access to the net but I don't expect to have time to do much work on the ISFDB till August. Have fun, fix all the bugs, and see you all later. Mike Christie 09:45, 10 Jul 2006 (CDT)
Enjoy the vacation -- hopefully you will encounter fewer bugs wherever you are going! :) Ahasuerus 10:11, 10 Jul 2006 (CDT)


Editing omnibuses

I was editing this Publication record, which contains 2 novellas by 2 different authors, and changed it from an omnibus to a collection. See Amazon.com for details. Everything seemed to go through, although I couldn't specify that the first short fiction piece was a novella since there is no drop down list for it if the original Content record is not short fiction. However, when I pull it up now, it doesn't have the right contents :( Ahasuerus 12:44, 21 Jul 2006 (CDT)

Something is definitely not kosher with editing. I changed this Publication from a novel to an anthology since it contains 3 novellas by 3 different authors, one of them about time travel. Now it's apparently a Stray Publication, which you can see here. You can't see it at all on the Long Works list for Susan Sizemore since Stray Publications are only listed if the Author in question has no legitimate Long Works. Is that by design? Ahasuerus 17:56, 21 Jul 2006 (CDT)
Similar problems encountered while trying to convert P. C. Hodgell's Dark of the Gods from a novel to an omnibus that contains 2 novels. All kinds of things apparently got broken in this case. Leaving everything as is for your debugging pleasure :) Ahasuerus 18:32, 21 Jul 2006 (CDT)
Two different kinds of problems here. One is now fixed, the other is more interesting. Problem #1: when adding new content, a pub_content record was only inserted whenever a page number was present. Everything I've been doing lately has page numbers, so I haven't seen that one before. That one is now fixed.
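A minimal sketch of what the corrected insert logic for Problem #1 might look like: one content row per entry, with the page number stored as NULL when it is absent, rather than guarding the insert with a page-number check. The table and column names (pub_content, pub_id, title_id, pubc_page) and the helper add_pub_contents are assumptions for illustration, and sqlite3 stands in for MySQL so the example runs on its own.

```python
import sqlite3

def add_pub_contents(conn, pub_id, contents):
    """Link each (title_id, page) pair to the publication (hypothetical helper).

    A row is inserted for every entry; page may be None. Skipping the insert
    when no page number is present is the bug described above.
    """
    for title_id, page in contents:
        # Always insert; a missing page number is simply stored as NULL.
        conn.execute(
            "INSERT INTO pub_content (pub_id, title_id, pubc_page) VALUES (?, ?, ?)",
            (pub_id, title_id, page),
        )
    conn.commit()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pub_content (pub_id INTEGER, title_id INTEGER, pubc_page TEXT)")
add_pub_contents(conn, 1, [(10, "1"), (11, None), (12, "143")])
print(conn.execute("SELECT COUNT(*) FROM pub_content").fetchone()[0])  # 3
```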
The other problem is more subtle, but has a workaround. For every NOVEL, OMNIBUS, ANTHOLOGY (etc...) there is also a title record. So for an ANTHOLOGY publication that contains 3 novellas, there will be 4 titles: the ANTHOLOGY title record associated with the publication and the 3 SHORTFICTION title records associated with the novellas. In the case of Tall, Dark, and Dangerous it originally consisted of a NOVEL publication record and a NOVEL title record. When the publication is displayed for editing, the publication record data is at the top, but more importantly - the NOVEL record is shown as content. If one were to change the NOVEL title record to a SHORTFICTION novella, and then add the two subsequent SHORTFICTION novellas, what we have is an ANTHOLOGY publication record with no associated ANTHOLOGY title record (since the title record got blown away as one of the novellas). In fact, if you edit the publication record and add an ANTHOLOGY entry back into the content, it all works as expected.
So the question arises: why do we show the associated title record when that can lead to the kind of confusion that occurred above? Well, we generally don't, but it isn't as straightforward as one might think. For magazines, suppressing the EDITOR title record is trivial. One might be tempted to say: "let's just suppress any NOVEL, COLLECTION, or ANTHOLOGY records from the content list. Then when editing a NOVEL publication, there won't be any confusion about a NOVEL title record showing up in the content." Okay, that's great except for an OMNIBUS, whose contents will consist of NOVEL, COLLECTION, or ANTHOLOGY records. So then we can get clever and say: "how about suppressing the title record whenever its type matches the publication record?" And in fact there is such a rule for COLLECTION, ANTHOLOGY, and OMNIBUS, but not for the NOVEL. The reason for this is that in a collection, the collection itself will never appear in a table of contents for the book. But in the case of a NOVEL, it happens that there may be a preface, or second preface, or foreword, or introduction by another author that we would want to list in a contents listing. So we would expect to see "Introduction", "The Novel", and "Afterword" all listed as part of the contents - and to be fully editable. So the problem here is that the work was supposed to be an ANTHOLOGY (which would suppress an ANTHOLOGY title record during editing), but was misclassified as a NOVEL (which shouldn't suppress a NOVEL title record), and the associated NOVEL record was replaced with a SHORTFICTION record, and since the publication had no valid bibliographic type associated with it, it appeared as a stray publication.
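The suppression rule described above can be sketched as a small predicate: hide the title record whose type matches the publication type for COLLECTION, ANTHOLOGY and OMNIBUS, always hide a magazine's EDITOR record, and never hide a NOVEL's own title. The function name shown_in_contents and the type strings are assumptions for illustration; this is not the editing app's actual code.

```python
# Publication types whose own title record is hidden from the editable
# contents. NOVEL is deliberately excluded so a novel can be listed next to
# an introduction or afterword.
SUPPRESS_MATCHING = {"COLLECTION", "ANTHOLOGY", "OMNIBUS"}

def shown_in_contents(pub_type, title_type):
    """Return True if a title of title_type is listed as editable content
    of a publication of pub_type (hypothetical helper)."""
    if pub_type == "MAGAZINE" and title_type == "EDITOR":
        return False  # the magazine's EDITOR record is always hidden
    if title_type == pub_type and pub_type in SUPPRESS_MATCHING:
        return False  # e.g. an ANTHOLOGY's own ANTHOLOGY title
    return True

# A NOVEL publication shows its NOVEL title; an OMNIBUS shows the novels it contains.
print(shown_in_contents("NOVEL", "NOVEL"))          # True
print(shown_in_contents("ANTHOLOGY", "ANTHOLOGY"))  # False
print(shown_in_contents("OMNIBUS", "NOVEL"))        # True
```

This is why a misclassified NOVEL is risky: its NOVEL title is exposed as ordinary content and can be overwritten.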
So valid workarounds are:
  • When the publication is a novel, and one wants to change it to something else, first change the publication and title types, and then add the content.
  • When the publication is a novel, make sure that the associated title record is preserved, always adding additional content, and not replacing it.
  • Go ahead and blow away the title record by replacing it with content, and just keep in mind that the record can be fixed by simply adding the title record to the content later (although it will show up as a stray until it's fixed).
As far as a "fix" goes, I think the editing app is doing the correct thing by showing the NOVEL title as content in a NOVEL publication (otherwise how can we replicate a NOVEL table of contents?). So any fixups will have to be done when the data is integrated. I'll have to think about how that might be accomplished, as I don't like the tools adding in titles under me. Perhaps the wisest path would be to detect the error (no valid bibliographic title record associated with the publication) and bail out - leaving it up to the user to fix it. Alvonruff 07:27, 23 Jul 2006 (CDT)
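The "detect the error and bail out" option might look roughly like the check below: before integrating a submission, confirm that the publication still has a title record of its own bibliographic type, and refuse otherwise. The exception and function names are hypothetical, and mapping MAGAZINE to an EDITOR title record is an assumption based on the magazine case mentioned above.

```python
class MissingTitleRecord(Exception):
    """Raised when a publication has no title record of its own type."""

# Assumption: a magazine's bibliographic title record is its EDITOR record;
# every other publication type expects a title record of the same type.
EXPECTED_TITLE = {"MAGAZINE": "EDITOR"}

def check_bibliographic_title(pub_type, content_types):
    """Bail out if none of the linked title types matches the publication.

    content_types lists the types of all title records linked to the pub.
    """
    expected = EXPECTED_TITLE.get(pub_type, pub_type)
    if expected not in content_types:
        raise MissingTitleRecord(
            f"{pub_type} publication has no {expected} title record; "
            "add one to the contents before resubmitting"
        )

# The Tall, Dark, and Dangerous case: only SHORTFICTION records remain.
try:
    check_bibliographic_title("ANTHOLOGY", ["SHORTFICTION"] * 3)
except MissingTitleRecord as err:
    print(err)
```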
Interesting. Let me sleep on it. :-) Ahasuerus 20:04, 23 Jul 2006 (CDT)

Dissembler Gaps?

It would appear that Dissembler missed this book as well as about half the entries in this series. Any idea what may have caused the gaps? Ahasuerus 17:16, 21 Jul 2006 (CDT)

And while I am thinking about the conundrum above, any idea why Dissembler picked up this mundane book? Ahasuerus 20:19, 25 Jul 2006 (CDT)
The ways of Dissembler are mysterious. Actually, there will probably be some gaps as I haven't been running Dissembler full out for a few months now, just using it to fill in a specific author's bibliography. I'm going to significantly revamp Dissembler into a full standalone tool that integrates data into a separate database, and utilizes a name grammar and better rules to reduce errors - and the resulting database will then generate submissions into the ISFDB. It will still be possible to pick up strays if the original source has miscategorized a book. We should be able to reduce the number of missing titles by widening the number of sources. Alvonruff 20:32, 24 Aug 2006 (CDT)

Graphic novels -- the final solution

I was about to use my flamethrower on Wendy Pini's graphic novels when the sheer volume of publications and titles that I was about to delete made me pause. Are there any exceptions to the "no comics or graphic novels allowed" rule? What about graphic products that are at the very core of a given writer's work, e.g. Neil Gaiman's? Just checking before I zap something that may take a while to rebuild :-) Ahasuerus 22:28, 12 Aug 2006 (CDT)

NESFA Index - Should we try to link up or exchange data?

Many years ago I provided Al vonRuff with a dump of the NESFA Index as it was then.

The contribution was duly noted in the list of major contributors :)

It has been greatly expanded since then and we're working on a major update to the schema. It seems a waste not to try to coordinate in some way. Any interest? --Mark Olson 21:40, 19 Aug 2006 (CDT)

Al is usually very busy during July and August struggling with the infamous "real life" monster, but I am sure he will respond when he comes up for air. For now, a copy of the ISFDB database is available on the ISFDB Downloads page and anybody can grab it. You can derive the schema from the MySQL database itself or get a nice .png file here. There is also a somewhat out of date (?) English description at Database Schema.
I'll look at that as a starter.--Mark Olson 22:41, 30 Aug 2006 (CDT)
I still need to wrap up bug fixing the ISFDB editing apps and implement verification support, but I think we should try to start reconciling the two databases in about 2 months. Alvonruff 20:23, 24 Aug 2006 (CDT)
Oh, and while we are at it, is there a reason why the NESFA list of recursive SF books doesn't mention Jerry and Sharon Ahern's The Golden Shield of IBF? Ahasuerus 19:06, 20 Aug 2006 (CDT)
Probably because Tony Lewis (who maintains that list) hasn't heard about it. I'll let him know.--Mark Olson 22:41, 30 Aug 2006 (CDT)

Order of display of contents of a collection

Are there any rules about the order in which the contents of a collection display? I.e. are they displayed in the order they are entered, or linked to the collection; or in order of unique id? If there's no rule, I won't worry about it; if there is one I wanted to make sure it was working right and I followed whatever the convention is. It would certainly be nice if they could be made to display in the order in which they appear in the work, but that's not important to add if the db doesn't support it now. Mike Christie 10:03, 24 May 2006 (CDT)

We're not imposing any order at present (we're getting them in the order that they're stored in the database). When I put up content editing (almost certainly next week), we will be able to add page numbers - if the record has page number information, it will be displayed in that order. Alvonruff 12:41, 24 May 2006 (CDT)
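One way the page-number ordering could work is sketched below: entries with a numeric page are sorted by that page, and entries without one keep their stored (database) order at the end. The function name and the (title, page) tuple layout are assumptions; treating non-numeric pages such as roman numerals as "unnumbered" is a simplification, not a stated ISFDB rule.

```python
def order_contents(contents):
    """Order (title, page) pairs for display (hypothetical helper).

    Numeric pages sort first, by page; anything else keeps database order
    at the end. sorted() is stable, so ties also keep database order.
    """
    def key(entry):
        page = entry[1]
        if page is not None and page.isdigit():
            return (0, int(page))
        return (1, 0)
    return sorted(contents, key=key)

stored = [("Afterword", "301"), ("Story B", "45"), ("Untitled", None), ("Story A", "9")]
print(order_contents(stored))
# [('Story A', '9'), ('Story B', '45'), ('Afterword', '301'), ('Untitled', None)]
```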

Prices

I'm beginning to think that the same ISBN on a book, and therefore the same publication, may have differing prices on it, especially if the book went through many printings over a long period of time. How can we deal with this? I'm pretty certain that many books (those Ace paperbacks not polite enough to include an ISBN, for instance) aren't going to include a printing date or printing number. The price field is for the cover price, right? But what if there are multiple cover prices? grendel|khan 14:07, 25 May 2006 (CDT)

Well, you have to keep in mind that ISBNs first appeared in Great Britain in the 1960s and didn't become a de facto standard on this side of the pond until the early 1970s, especially not in the world of genre fiction.
As far as prices go, true, publishers may not always slap a new ISBN on a second or third printing of the same edition of a book even though they may change the cover price. There is a fine line between an "edition" and a "printing" in the book world, but I think that for our purposes any visible changes (ISBN, publisher's code, cover art, cover price, month of publication, etc) should make it a new record. Ahasuerus 14:31, 25 May 2006 (CDT)
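The "any visible change means a new record" rule can be expressed as a field-by-field comparison. The Printing class, its field names (cover_artist standing in for "cover art"), and the sample values are all hypothetical; the point is only that two printings sharing an ISBN, or lacking one, still count as separate records once the cover price differs.

```python
from dataclasses import dataclass, replace
from typing import Optional

@dataclass(frozen=True)
class Printing:
    """Visible attributes of one printing (hypothetical field names)."""
    isbn: Optional[str]
    publisher_code: Optional[str]
    cover_artist: Optional[str]
    price: Optional[str]
    pub_month: Optional[str]  # e.g. "1962-05"

VISIBLE_FIELDS = ("isbn", "publisher_code", "cover_artist", "price", "pub_month")

def needs_new_record(a, b):
    """True if any visible attribute differs between two printings."""
    return any(getattr(a, f) != getattr(b, f) for f in VISIBLE_FIELDS)

# Made-up example: a paperback with no ISBN, reprinted at a higher price.
first = Printing(None, "F-123", "Artist X", "$0.40", "1962-05")
second = replace(first, price="$0.50")
print(needs_new_record(first, second))  # True
```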

Right place for a process discussion?

I've been thinking a little about the future process by which the ISFDB data will be entered, verified and recorded. What's the right place for a discussion of that process? I'm thinking of something like a sub-page of the ConOps page, or maybe a page entitled "How bibliographies become stable in the ISFDB". The questions are around how we know verified records stay verified; what users can verify; what constitutes verification; and what a stable bibliography should look like and how it should behave. Any thoughts? I could post it here on the community page but that's becoming a general talk board and I was thinking more of a guideline page, with an associated talk page for the discussion. Mike Christie 15:45, 26 May 2006 (CDT)

Just start a new page. There are some tidbits that are scattered about the wiki already on the topic: discussions on objective vs. subjective data, how objective data (like publications) can be verified, how subjective data (like series information) can be more problematic, the locking of publication records that have been verified, and how many people does it take to verify a record? If you want to link it with the ConOps, that would be appropriate as well. Alvonruff 15:51, 26 May 2006 (CDT)
OK, I've started a page and an associated talk page at Stabilizing Bibliographic Data. Mike Christie 21:09, 26 May 2006 (CDT)

Templates added

I have created templates for Clute/Grant, Clute/Nicholls, Reginald1, Reginald3 and Tuck. Just enclose these strings in braces, e.g. {{Clute/Grant}}, and it will create a pointer to their respective Wiki pages. Ahasuerus 10:41, 30 May 2006 (CDT)

Grendelkhan has added another template, {{t}}, which does for Titles what {{a}} does for Authors. Ahasuerus 08:38, 1 Jun 2006 (CDT)
Ah, right, I should announce those, because they're useful. Also, {{p}} can be used to link to publications by tag: HRSMKRUP19XX, for instance. grendel|khan 07:23, 2 Jun 2006 (CDT)
I have just added {{LOCIS}}, {{OCLC}} and {{Sigla}}. I am not sure how often the last one will be used since Z39.50 searches can be time consuming and the retrieved data can be messy. Probably more of a niche tool. Ahasuerus 13:56, 2 Jun 2006 (CDT)

Putting magazines into their own namespace

Speaking of magazines, a quick look at http://isfdb.tamu.edu/wiki/index.php/Special:Allpages suggests that we may want to have a separate namespace for them. How about, say, "Magazines"? Ahasuerus 13:52, 2 Jun 2006 (CDT)

Yeah. That's a good idea. Alvonruff 14:19, 2 Jun 2006 (CDT)
All magazines should be in their own namespace now. Ahasuerus 14:46, 5 Jun 2006 (CDT)

Spamblock template

Do we have a spamblock template yet? If not, how does the message that I posted 2 minutes ago sound? Ahasuerus 14:45, 11 Jul 2006 (CDT)

YADP (Yet Another Devious Plan)

Some weeks ago, Mike Christie wrote above:

First, what about having a beta test period? I don't know if the software supports giving only some users update privs, but I think it would be good to give about ten users (maybe recruited from rasfw, or somewhere similar) access, and then see how it goes for a week or two.

I am thinking that now may be a good time to get some r.a.sf.w folks involved. How about the following devious strategery: Post a (not too prolific) author's name and request plot summaries for inclusion in the ISFDB? (With the understanding that their contributions will be treated like other user contributions to the ISFDB for copyright purposes, of course). If it goes well, we could make it a regular feature/thread and build up interest as people see that their work is being incorporated into the ISFDB. Perhaps it could help identify potential contributors and even possibly help rescue r.a.sf.w from the flamewar/offtopic quagmire that it so often gets dragged into? Ahasuerus 17:23, 29 Aug 2006 (CDT)

I ended up trying a slightly different YADP -- see http://groups.google.com/group/rec.arts.sf.written/msg/a39acfe56a8255be for details :) Ahasuerus 20:07, 14 Sep 2006 (CDT)

ISFDB break

I'll be net-deprived for a few days as I start yet another cycle of wandering.

When I come back, I may try to write a little script to find Locus Index publications that have no matches in the ISFDB. Need to use the old abacus skills before what's left of them turns into rust... Ahasuerus 21:29, 1 Sep 2006 (CDT)
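A script like that could start as a set difference on a normalized match key. The sketch below assumes both sides are available as (title, author) pairs; the normalization (lowercase, punctuation stripped, leading English article dropped from titles) and the sample data are illustrative assumptions, and real matching would need to cope with pseudonyms and variant titles.

```python
import re

def _clean(s):
    """Lowercase, strip punctuation, collapse whitespace."""
    s = re.sub(r"[^\w\s]", "", s.lower())
    return re.sub(r"\s+", " ", s).strip()

def match_key(title, author):
    """Crude match key; the leading article is dropped from titles only."""
    return (re.sub(r"^(the|a|an) ", "", _clean(title)), _clean(author))

def unmatched(locus_pubs, isfdb_pubs):
    """Return Locus entries with no ISFDB entry sharing the same key."""
    isfdb_keys = {match_key(t, a) for t, a in isfdb_pubs}
    return [(t, a) for t, a in locus_pubs if match_key(t, a) not in isfdb_keys]

# Made-up sample data.
locus = [("The Forever War", "Joe Haldeman"), ("Some Unlisted Book", "Jane Doe")]
isfdb = [("Forever War", "Joe Haldeman")]
print(unmatched(locus, isfdb))  # [('Some Unlisted Book', 'Jane Doe')]
```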

I am back in business and have just updated the MySQL instructions page. I have also uploaded a basic "look for bad suffixes in Author names" script and the resulting hit list to a newly created Project page. Ahasuerus 23:21, 10 Sep 2006 (CDT)
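For anyone curious what a suffix check of this kind involves, here is a guess at the general idea (not the uploaded script): flag author names that carry a Jr/Sr suffix in anything other than one canonical form. The canonical form ", Jr." / ", Sr." is an assumption chosen for illustration, as are the sample names.

```python
import re

# Assumed canonical form: comma, space, capitalized suffix, period.
CANONICAL = re.compile(r", (Jr|Sr)\.$")
# Any trailing Jr/Sr variant, separated from the name by spaces and/or a comma.
ANY_SUFFIX = re.compile(r"[\s,]+(jr|sr)\.?$", re.IGNORECASE)

def bad_suffix_names(names):
    """Return names whose Jr/Sr suffix is not in the canonical form."""
    return [n for n in names if ANY_SUFFIX.search(n) and not CANONICAL.search(n)]

names = ["John Smith, Jr.", "John Smith Jr", "Jane Roe", "Sam Doe, sr."]
print(bad_suffix_names(names))  # ['John Smith Jr', 'Sam Doe, sr.']
```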