Talk:Development

See Talk:Development/Archive for archived discussions.

ISFDB Discussion Pages and Noticeboards
Before posting to this page, consider whether one of the other discussion pages or noticeboards might suit your needs better. If you're looking for help remembering a book title, check out the resources in our FAQ. Please also see our Help pages.

Help desk: questions about doing a specific task, or about how to correct information when the solution is not immediately obvious.
Research Assistance: help with bibliographic projects.
Rules and standards: discussions about the rules and standards, as well as questions about interpretation and application of those rules.
Community Portal: general discussion about anything not covered by the more specialized noticeboards.
Moderator noticeboard: get the attention of moderators regarding submission questions.

Roadmap: for the original discussion of Roadmap 2017 see this archived section. For the current implementation status, see What's New#Roadmap 2017.

Archive Quick Links (archives of old discussions from the Talk:Development page): 1 · 2

Blank author pages created by reviews

There are numerous pages like this "Dr. Clifford Wilson"[1] that appear to be empty but are created by "reviews"[2]. Is there some way of having the review displayed on the page so it doesn't appear blank? I know why they exist, but I'm sure most users don't. Thanks. Kraang 01:58, 1 June 2010 (UTC)

It's certainly possible, but it is not trivial since the software that handles the Summary Page logic is very convoluted and in places quite inefficient. After fixing some bugs in its guts earlier today, I am thinking that we may need to rewrite it almost from scratch, especially if we want to be able to continue adding more features without losing our minds. Easier said than done, of course, so for now I have created Feature Request 3009750. Ahasuerus 04:07, 1 June 2010 (UTC)
I too hate blank author pages, which is why we have "Reviewed Author" as an advanced search option now. (I don't know how people use the results: create the linked title or convert the review to an essay?) But there are authors that seem to resist deletion for other reasons I don't understand. Yet. BLongley 23:07, 1 June 2010 (UTC)

Putting a local copy at isfdb.local, not localhost

In order to work around a "feature" of Firefox that refuses cookies coming from localhost, I want to put my local isfdb copy on an address like isfdb.local -- but I can't figure out how to configure Apache (& whatever else I need to configure) to do that. Pointers? JesseW 04:12, 22 November 2010 (UTC)

And, answered. You need to put an entry in /etc/hosts -- i.e. "127.0.0.1 isfdb.local" -- and use that hostname in the common/localdefs.py file. It may or may not be necessary to also have a "ServerName isfdb.local" line in your Apache configuration. JesseW 04:26, 22 November 2010 (UTC)
You could also use IP multihoming by choosing a different IP, since IPv4 allocates an entire class A network for loopback. You could use, say, 127.127.127.127 or anything else in the 127 range. The most straightforward would be to use "127.0.0.2 isfdb.local". Uzume 05:30, 17 February 2011 (UTC)
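
Putting the two replies together, a minimal sketch of the configuration involved (the 127.0.0.2 address is just one choice from the loopback range suggested above):

    # /etc/hosts -- map the chosen hostname to a loopback address
    127.0.0.2    isfdb.local

    # Apache configuration -- may or may not be needed, as noted above
    ServerName isfdb.local

The same hostname is then used in common/localdefs.py, per JesseW's note above.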

Unicode escaping in HTML forms

I have been working on a couple of related bugs with HTML escaping lately. The first bug is fairly straightforward: in many cases, we are not escaping field values in HTML forms, so any value with embedded quotes or angle brackets will cause problems. The problem was first reported as a bug affecting publishers, but it's more pervasive than that.

The good news is that there is a simple solution to this problem: use XMLescape, which is already in effect for Title fields. The bad news is that this solution has an unfortunate side effect. Our MySQL database stores Unicode characters using XML encoding, e.g. "&#1055;" is used to represent the Cyrillic "П", and XMLescape escapes the leading ampersand character whenever Unicode is used. The result is that any value which includes Unicode characters will be overcoded -- see this Title Edit page.

I have a tentative solution that addresses both issues, but it is not very elegant and I would like to run it by other developers to make sure that I am not missing something obvious. Here is what I have done on my development server so far:

  1. Created a new function, HTMLescape. All it does is call XMLescape with a new optional parameter, "htmlencoding". This step is needed since we don't want to change the default behavior of XMLescape, which is also used in a number of other places.
  2. Modified XMLescape so that if "htmlencoding" is passed in, the ampersand character is not escaped when it's part of the &#NNN; or &#NNNN; pattern (what about &#xHH;?)
  3. Modified all HTML forms to use HTMLescape instead of raw values or XMLescape. It was relatively easy to do since printing is typically centralized in one place.

This approach seems to work, but it would be great if we could find an easier way to solve the problem. Ahasuerus 02:13, 4 January 2011 (UTC)
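
A rough sketch of the behavior described above (illustrative only; the actual change wraps XMLescape with a new "htmlencoding" parameter rather than using a standalone function like this):

    import re

    # Matches decimal numeric character references such as &#1055; or &#700;
    NCR_PATTERN = re.compile(r'&#[0-9]+;')

    def HTMLescape(value):
        # Escape a value for use inside an HTML form field, but leave existing
        # decimal NCRs (&#NNN;) intact so stored Unicode is not escaped twice.
        placeholder = '\x00'
        ncrs = []
        def stash(match):
            ncrs.append(match.group(0))
            return placeholder
        value = NCR_PATTERN.sub(stash, value)
        value = value.replace('&', '&amp;').replace('<', '&lt;')
        value = value.replace('>', '&gt;').replace('"', '&quot;')
        for ncr in ncrs:
            value = value.replace(placeholder, ncr, 1)
        return value

For example, HTMLescape('Tom & Jerry &#1055;') returns 'Tom &amp; Jerry &#1055;'. The hexadecimal form (&#xHH;) raised in step 2 could be folded in by widening the pattern accordingly.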

My only advice is do what seems to work for now. There are a variety of escape/unescape routines that cover strange corner cases, and at this point even I don't recall why there are so many, or which should be used in which case. Some of this comes from the HTML/XML world we live in, but the current approach is not elegant. Alvonruff 00:44, 5 January 2011 (UTC)
The solution is to serve the web pages and store data in the DB in a Unicode encoding like UTF-8. You can then still HTML-escape ampersand, less than, greater than, and double quote where appropriate. Storing data as XML/HTML Unicode entities for values outside latin1 (because the DB and the web pages are served as such) is not a good solution: the DB data will only be useful in applications that know about such encoding (mostly web browsers), and it expands Unicode characters by a large factor. I have considered entering some Japanese translations of some books by a well-known SF author, but have held off pending better Unicode support and the ability to credit translators (and audio books with voice credits, other derivative work support, etc.). Uzume 21:59, 3 February 2011 (UTC)
I think everyone agrees that a transition to UTF-8 is desirable in principle, but, based on others' reported experiences, it can be easier said than done. Personally, I don't know enough about MySQL to chance it yet. Perhaps a more experienced contributor will step up to the plate and make the necessary changes at some point. Ahasuerus 06:09, 4 February 2011 (UTC)
I might be able to work on that, but it won't happen overnight. We shall see. Right now I am still waiting to get the Python CGI code. Uzume 18:36, 4 February 2011 (UTC)

"Public" URLs

Is there a definition of which URLs are considered publicly linkable and which are not? If not, I would like to solicit feedback to create such a list. I ask because, as a code developer, one can change the URLs used to access the site. In some cases this is not likely to be an issue -- for example, changing links within the edit script hierarchy, so long as all the scripts are updated to use them, because such URLs are expected to be accessed only by people using the web site, not linked to directly from other sites/applications. On the other hand, if I just went and changed the URLs for accessing pubs and titles to something else, I can imagine screams that such a code change introduced a "bug" because external links would be broken. Help:Linking templates and Category:Linking templates provide some hints as to what sorts of URLs are considered external/public. Here is my initial list of what I believe should not be changed without serious consideration for backwards compatibility (i.e., it is a bug to change these without discussion and some implemented transition methodology):

URL -- Note
/cgi-bin/index.cgi -- Home Page
/cgi-bin/pl.cgi?<pub_id> -- Pub
/cgi-bin/pl.cgi?<pub_tag> -- should be considered deprecated in light of Bug 3153982 "Change pub links from tags to IDs"
/cgi-bin/title.cgi?<title_id> -- Title
/cgi-bin/ea.cgi?<author_id> -- Author
/cgi-bin/ea.cgi?<author_canonical> -- this is questionable, but I believe many things still use this
/cgi-bin/publisher.cgi?<publisher_id> -- Publisher
/cgi-bin/pe.cgi?<series_id> -- Series
/cgi-bin/seriesgrid.cgi?<series_id> -- Series Grid for Magazines
/cgi-bin/pubseries.cgi?<pub_series_id> -- Pub Series
/cgi-bin/rest/getpub.cgi?<pub_isbn> -- Pub API
/cgi-bin/rest/submission.cgi -- Submission API

And here are some that I am unsure of:

URL -- Note
/cgi-bin/pl.cgi?<pub_id>+<concise> -- apparently supported, but not sure if anyone actually links to this
/cgi-bin/pl.cgi?<pub_tag>+<concise> -- should be considered deprecated in light of Bug 3153982 "Change pub links from tags to IDs"
/cgi-bin/eaw.cgi?<author_id> -- Awards
/cgi-bin/eaw.cgi?<author_canonical> -- apparently supported, but not sure if anyone actually links to this
/cgi-bin/ae.cgi?<author_id> -- Alpha
/cgi-bin/ae.cgi?<author_canonical> -- apparently supported, but not sure if anyone actually links to this
/cgi-bin/ch.cgi?<author_id> -- Chrono
/cgi-bin/ch.cgi?<author_canonical> -- apparently supported, but not sure if anyone actually links to this
/cgi-bin/seriesgrid.cgi?<series_id>+<displayOrder> -- apparently supported, but not sure if anyone actually links to this
/cgi-bin/ay.cgi?<award_ttype><award_year> -- Award List

I know there were some URL issues when moving from tamu.edu, but I was not around then to know how that went. Are there others that should be included or that one should be concerned about? Thanks. Uzume 18:09, 4 March 2011 (UTC)

There are only 4 types of deep-linking officially supported according to this page, which is well out of date but should be checked and updated. BLongley 18:26, 4 March 2011 (UTC)
Also, the Web API documentation does confirm getpub.cgi as supported - I suspect, but obviously can't confirm, that Fixer does use this to prevent some duplicate submissions. We might want to add some more features to the Web API - I know I experimented with the two options for "Data Thief" and found them a little lacking. BLongley 18:37, 4 March 2011 (UTC)
That's right, when Fixer builds a submission, he queries getpub.cgi to see if that ISBN is already in the live database. This is in addition to checking his local copy of the database, which can be up to 7-8 days behind the live version. Ahasuerus 19:10, 4 March 2011 (UTC)
OK, this is great feedback. I moved the getpub API up to the known unsafe list; that said, there are not that many bots, and they are all run by developers, so while they have a certain amount of "unsafeness" they are not as big an issue as true "deep links" like the ones from Wikipedia articles, etc. Comparing the FAQ link BLongley quotes above to the list I created above, I added publisher.cgi, pubseries.cgi, and seriesgrid.cgi based on Template:Pubr, Template:PubSeries, and Template:IssueGrid. Should these be added to the FAQ, or are they a gray area for now? And of course the two API URLs have already been discussed. I am planning on killing pl.cgi?<pub_tag>, but that is a work in progress and it will have to stay around for a while for transition reasons (though I believe I have some outstanding code that will help reduce the number of new links people make based on pub_tag). Uzume 22:12, 4 March 2011 (UTC)
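
For reference, the duplicate check described above amounts to a single call to the Pub API listed in the table. A minimal Python 2 sketch (the ISBN is an arbitrary example; the exact XML schema of the response is documented on the Web API page rather than assumed here):

    import urllib2

    isbn = '9780765316882'  # arbitrary example ISBN
    url = 'http://www.isfdb.org/cgi-bin/rest/getpub.cgi?' + isbn
    xml = urllib2.urlopen(url).read()
    # Inspect the returned XML (per the Web API documentation) to decide
    # whether a publication record already exists for this ISBN.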

Status update

I am slowly working through the list of submitted fixes. I haven't found anything troublesome so far; half a dozen scripts have been deployed on the live server. Sorry about the slow pace; allergies make everything foggy, and I want to test everything thoroughly before deployment. Hopefully, I will test/deploy the rest of the scripts early in the week. Ahasuerus 05:33, 7 March 2011 (UTC)

I am sorry about your allergies. Uzume 14:28, 9 March 2011 (UTC)

Bug 3153982 status

The change that started the Great Migration from tags to pub IDs has been tested and installed. Next, we'll probably want to change the tags that are displayed on the "Bibliographic Comments" line to pub IDs, e.g. this pub will have "(STRWRSFTFT2011)" replaced with "(pub ID 338434)". Ahasuerus 04:36, 9 March 2011 (UTC)

Yes, I have already laid some groundwork for that hurdle, but renaming all the wiki articles from Publication:<pub_tag> to Publication:<pub_id> is going to take me a while, and I want to do that before I change the code; otherwise existing wiki comments for pubs will be broken for a while (which I want to avoid). To minimize the impact on the wiki comments, I shall likely do a small patch for just that change, which can be quickly tested and installed once I get all the wiki articles moved. That will hopefully keep the window between the renaming and the code install small, so that few new wiki comments are made in <pub_tag> space in the meantime. After that gets done, I might throw all the Publication:<pub_tag> redirects into a category so that a wiki admin can delete them. Uzume 14:24, 9 March 2011 (UTC)
I also want to implement a transition mechanism in pl.cgi that does something special with incoming links that use <pub_tag>s. I haven't totally decided on, or tested, what that should be yet, but something along the lines of a browser redirect with a warning message telling people to change the source of the inbound link, etc. Anyway, thanks for the patch. That clears the way for the next step (ripping out some <pub_tag> functionality: removing it from search results, actually disabling the clone functionality that has now been undocumented, etc.) and will go a long way towards stemming the tide of new links using <pub_tag>. Uzume 14:16, 9 March 2011 (UTC)
Since I have undocumented the use of <pub_tag>s for cloning and will likely rip out that ability soon, can you assign me Bug 3050469 too, since it will be rendered moot when I am done (and I shall try to make sure the code handles any invalid values gracefully)? Thanks. Uzume 14:56, 9 March 2011 (UTC)
I hope that not all functionality is going to go immediately -- for instance, external sites that have linked via pub-tag will want a quick way to find the ID to replace it with. And remember that Cover Image Uploads are using Pub Tags to give a hopefully unique (but not guaranteed, see Bug 2795822) name. If anybody does try to reunite the Cover Image backups with the Database backups, a Tag-to-ID conversion will still be needed. BLongley 17:05, 9 March 2011 (UTC)
Yes, I am well aware of the many implications, and that is why it won't happen overnight by any stretch of the imagination; but I can implement things that discourage and warn about such usage in preparation for a time when we can eventually do away with it. As for the images, I plan to implement uploading by <pub_id> as the default name at some point (there are some details to be worked out before any code is actually changed). Images can technically be uploaded with any name even now, and they are linked to from the pub records via a complete URL, so we have that mapping until such time as the URL is replaced, at which point it is moot anyway. I see no reason to enforce any image file naming (there isn't any enforcement currently either, just a default name provided by the link in the pub record). If you are uploading images with <pub_tag> naming and not submitting them as cover art updates to the pub records, then I do not have a solution for retaining such a mapping (but do we want to support that anyway?). Uzume 17:52, 9 March 2011 (UTC)

While we're waiting for Ahasuerus to rise again...

I utilised the latest backup to provide these previews of four of the next set of Data Cleanup scripts. (Part Three. The fifth only provided one result, so I fixed that myself.) ISFDB:Series of Variants, ISFDB:Variants of Nonexistent Titles, ISFDB:Stray Interviews and Reviews, and the BIG one: ISFDB:Stray Authors2. The last looks as if I've coded something wrong and the list should be shorter, but at the moment I can't think why, so any feedback on that in particular would be good. BLongley 23:34, 23 April 2011 (UTC)

An interesting use of our data

This might be of interest to people. BLongley 19:43, 14 July 2011 (UTC)

Link no longer works. Gives a 404 error.--Astromath 14:58, 25 December 2012 (UTC)

Automatic notification of verified pubs.

I'm not sure how doable this is, but my suggestion is that when a verified pub is edited, all verifiers of the pub are automatically notified on their discussion pages. The only problem I foresee is that some verifiers wish to be notified on special pages they've set up for that purpose.--Astromath 15:15, 25 December 2012 (UTC)

Yes, it should be doable, although I'll have to check the Wiki software to see how easy it will be to implement. I have created FR 3598471 for this feature request.
There is also a larger issue here -- we want to maintain a complete history of all edits in the database so that we could tell that, e.g., the value of field A was changed from X to Y by editor N on 2012-12-25 (FR 2800816). This information is currently captured (but not made readily available to our users) for author edits, but not for any other types of edits. Ahasuerus 21:37, 25 December 2012 (UTC)

That being said, if this does come to pass, then there will definitely be a need for a preview page prior to saving the edits, so that edits which the editor ends up cancelling for some reason don't clutter up the verifiers' discussion pages.--Astromath 15:15, 25 December 2012 (UTC)

I would expect that the verifiers' Talk page will not be updated until after the submission has been approved. Ahasuerus 21:37, 25 December 2012 (UTC)

LCCN & OCLC fields

With the prevalence of LCCN and OCLC numbers, wouldn't having separate fields for them make sense? Then you wouldn't need to go through the hassle of HTML-coding the links in the notes field.--Astromath 01:38, 1 January 2013 (UTC)

Indeed, it would be desirable. We have FR 3127708, which reads:
  • Add support for external identifiers at the Publication and Title levels. Publication identifiers can include LCCNs, Worldcat IDs, British Library Ids, Goodreads IDs, etc. Title identifiers can include "Work" identifiers used by LibraryThing, Goodreads and other social cataloging Web sites.
Ahasuerus 02:16, 1 January 2013 (UTC)

Nightly optimizations

I thought I would move the technical discussions from the community portal. We should be able to optimize Unicode SQL searches used in reports 65–70 and 73–78 and from nightly/nightly_update.py 1.209 by updating common/library.py 1.139 line 1325 with:

1325  def badUnicodePatternMatch(field_name):
1326          # Reduce the number of Unicode patterns to search for substantially by finding all "combining diacritic" combinations
1327          #   not just the ones we know how to replace
1328          # All of the keys are either a single numeric character reference (NCR) or a single character followed by single NCR
1329          # We only care about the single NCR so we remove the lead character of keys that do not start with an NCR "&#" prefix
1330          # And then we remove duplicate NCRs by pushing the list into a (frozen) set
1331          ncrs = frozenset(key if key.startswith("&#") else key[1:] for key in unicode_translation())
1332          patterns = " or ".join("%s like binary '%%%s%%'" % (field_name, ncr) for ncr in ncrs)
1333          # Optimize by finding all NCR prefixes (and throwing away everything else) first
1334          return "%s like binary '%%&#%%' and ( %s )" % (field_name, patterns)
1335
1336  def suspectUnicodePatternMatch(field_name):
1337          ncrs = frozenset(['&#700;', '&#699;'])
1338          patterns = " or ".join("%s like binary '%%%s%%'" % (field_name, ncr) for ncr in ncrs)
1339          # Optimize by finding all NCR prefixes (and throwing away everything else) first
1340          return "%s like binary '%%&#%%' and ( %s )" % (field_name, patterns)

I wonder if these SQL generating functions are used in the application. If they are not, perhaps they should be moved into the nightly code. Now we just need to look at other things (like nightly/nightly_os_files.py 1.4). Uzume 14:43, 19 April 2017 (UTC)
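
For illustration, a rough idea of the WHERE clause these helpers generate (the field name below is just an example, and the actual list of NCR patterns depends on the contents of unicode_translation()):

    # Example field name; the real callers pass the column being checked
    print(badUnicodePatternMatch("title_title"))
    # -> title_title like binary '%&#%' and ( title_title like binary '%&#769;%' or ... )

The leading like binary '%&#%' test is the proposed optimization: rows without any numeric character reference are filtered out before the long list of per-NCR patterns is evaluated.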

They are also used in edit/cleanup_report.py, which is why they reside in common/library.py.
Re: the additional functionality, we may want to create new cleanup reports. The current "combining diacritics" reports look for strings that should have been converted at input time. The new reports would look for combining diacritics which are not on the "translation" list and may need to be added to it. Ahasuerus 16:32, 19 April 2017 (UTC)
If you prefer the original behavior we can implement this for now:
1325  def badUnicodePatternMatch(field_name):
1326          patterns = " or ".join("%s like binary '%%%s%%'" % (field_name, item) for item in frozenset(unicode_translation()))
1327          # Optimize by finding all NCR prefixes (and throwing away everything else) first
1328          return "%s like binary '%%&#%%' and ( %s )" % (field_name, patterns)
1329
1330  def suspectUnicodePatternMatch(field_name):
1331          patterns = " or ".join("%s like binary '%%%s%%'" % (field_name, item) for item in frozenset(['&#700;', '&#699;']))
1332          # Optimize by finding all NCR prefixes (and throwing away everything else) first
1333          return "%s like binary '%%&#%%' and ( %s )" % (field_name, patterns)
It will still be a significant improvement, if somewhat less of one. Uzume 17:06, 19 April 2017 (UTC)
Thanks. Your code is functionally identical to the snippet that Marty sent this morning and which I incorporated a few minutes ago. I haven't touched the "suspect" reports yet. Ahasuerus 18:20, 19 April 2017 (UTC)
Ah, and you forked the code (nightly/nightly_update.py 1.211 to 1.212 diff). The advantage of fixing it in the original place is that it also improves edit/cleanup_report.py (as you mentioned, it is also used there). Uzume 20:55, 19 April 2017 (UTC)

Nightly cleanup reports - 2017-04-19 snapshot

Here is where we stand as of 2017-04-19. Only reports that take more than 2 seconds to compile are included:

Report 1 took 9.92 seconds to compile
Report 2 took 9.28 seconds to compile
Report 3 took 8.60 seconds to compile
Report 8 took 3.59 seconds to compile
Report 14 took 9.45 seconds to compile
Report 16 took 16.08 seconds to compile
Report 20 took 3.40 seconds to compile
Report 32 took 4.41 seconds to compile
Report 33 took 22.71 seconds to compile
Report 34 took 10.95 seconds to compile
Report 38 took 21.75 seconds to compile
Report 40 took 2.04 seconds to compile
Report 42 took 3.10 seconds to compile
Report 45 took 3.15 seconds to compile
Report 47 took 26.83 seconds to compile
Report 48 took 3.23 seconds to compile
Report 52 took 34.65 seconds to compile
Report 54 took 8.52 seconds to compile
Report 63 took 3.37 seconds to compile
Report 79 took 3.21 seconds to compile
Report 80 took 20.00 seconds to compile
Report 87 took 5.16 seconds to compile
Report 88 took 3.67 seconds to compile
Report 92 took 3.18 seconds to compile
Report 93 took 11.83 seconds to compile
Report 107 took 4.76 seconds to compile
Report 111 took 14.72 seconds to compile
Report 127 took 6.41 seconds to compile
Report 137 took 7.33 seconds to compile
Report 161 took 5.19 seconds to compile
Report 167 took 29.25 seconds to compile
Report 168 took 3.11 seconds to compile
Report 191 took 4.77 seconds to compile
Report 193 took 23.55 seconds to compile
Report 196 took 3.44 seconds to compile
Report 197 took 3.45 seconds to compile
Report 200 took 3.90 seconds to compile
Report 204 took 6.59 seconds to compile

Ahasuerus 18:24, 19 April 2017 (UTC)

97 and 99 are gone after adding USE INDEX. Ahasuerus 16:47, 20 April 2017 (UTC)
Do we have times on the complete nightly run (all the reports and os files, etc.)? Uzume 20:36, 20 April 2017 (UTC)
Not yet. I am working on Fixer's monthly haul at the moment. 4,000 ISBNs to go... Ahasuerus 20:53, 20 April 2017 (UTC)
Ouch! That just underscores the need to farm out the manual editing and submission process to ISFDB editors. Good luck (with this month's anyway). Uzume 01:20, 21 April 2017 (UTC)
151 has been zapped. Ahasuerus 03:44, 21 April 2017 (UTC)

WSGI

Moved from Ahasuerus's Talk page.

BTW, I noticed Schema:authors says we are using author_note over note_id. Would you care to explain the reason/history on that? Thanks, Uzume 18:54, 17 April 2017 (UTC)

The reason why all notes (and title synopses) were originally put in the "notes" table had to do with the origins of the ISFDB project. ISFDB 1.0 as created in 1995 was a hand-crafted C-based database. When the software was rewritten in 2004-2006 using Python and MySQL, some implementation details were carried over even though they were no longer needed. For example, the publisher merge code increments and keeps track of "MaxRecords" instead of using natively available Python functionality. Over the last 8 years I have rewritten much of the code, but some vestiges remain. Ahasuerus 19:24, 17 April 2017 (UTC)
That type of global code only works because we are using CGI and the entire application state dies at the end of every HTTP transaction. I do not mind the application being deployed as CGI, but one thing I wanted to do was to be able to deploy it in other ways too. Python has a specified API for that called WSGI, and I would like to move towards using it. That said, I wonder if it would be easier to rewrite the application that way, starting with just record display and working our way up to full edit and mod capability. It could be deployed side-by-side (keeping the same DB back-end) in a beta sort of way until it was very stable, and we could then retire the old one. Uzume 01:24, 19 April 2017 (UTC)
I remember you mentioning WSGI at one point, but I am not really familiar with the technology. What kind of ROI are we looking at here (both on the R side and on the I side)? Ahasuerus 01:48, 19 April 2017 (UTC)
Well, it makes the application considerably more portable: if it were WSGI we could deploy it in non-CGI settings. CGI has poor performance compared to other deployment technologies because it has to create a new OS process for every HTTP access. We also have security issues with the way we have used CGI, because we have many CGI scripts (basically one per URL, which makes for a large attack surface to maintain, often with redundant security code) and the few libraries are copied around into CGI space (they do not need to be there at all; they just need to be placed in one of the paths that the Python interpreter searches). That is the R. The I side needs some assessment. Python 2.5 was the first version to adopt WSGI via PEP 333 and includes wsgiref, a reference library, so we have that (even though later versions might be more refined). We can convert the application into a WSGI Python library module and have CGI stubs that call into it to maintain CGI-based deployment capability. There will likely be some other issues, though, such as global code that depends on the state of the application being recreated upon each HTTP access (e.g., I believe you pointed this out in the publisher merge code above). Uzume 05:12, 19 April 2017 (UTC)
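
A bare-bones Python 2 sketch of the shape being described (illustrative only: the module and callable names are made up, and a real conversion of the ISFDB code would be far more involved):

    # isfdb_wsgi.py (hypothetical module): the application logic lives in a
    # WSGI callable instead of a collection of top-level CGI scripts.
    def application(environ, start_response):
        start_response('200 OK', [('Content-Type', 'text/html; charset=iso-8859-1')])
        return ['<html><body>record display would go here</body></html>']

    # A thin CGI stub keeps the existing deployment working, while the same
    # callable could also be served by mod_wsgi, a standalone WSGI server, etc.
    if __name__ == '__main__':
        from wsgiref.handlers import CGIHandler
        CGIHandler().run(application)
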
Actually, the reference to the publisher merge code was just an example of our original Python code written the way you would write C code, which makes it hard to read and fragile.
As far as the benefits of WSGI go, they sound nice. However, there are many different ways to improve/reorganize the way the application is structured. Before we undertake a project of that magnitude, we'll need to compare alternative approaches and decide which (if any) are worth the effort. For now, especially given the number of features that are badly needed, my approach is "if it ain't broke, don't fix it." Ahasuerus 18:05, 20 April 2017 (UTC)
I do understand. But part of the issue is that the feature set is hard to add to and maintain because of the current (lack of) infrastructure. A cleanup could expedite development and thus possibly remedy the badly needed parts as well. A two-pronged approach could be useful: keep working on the current stuff and perhaps, on the side, also develop a possible replacement/uplift. Uzume 20:29, 20 April 2017 (UTC)
There comes a time in the life cycle of any system -- be it a module, an application, a car or a water heater -- when the maintenance costs rise to the point where it needs to be overhauled or replaced. It happened with ISFDB 1.0 in the early 2000s when it was replaced with ISFDB 2.0. It has happened with a number of ISFDB 2.0 modules, most recently the Advanced Search module, which I overhauled a few days ago. Eventually it will happen to ISFDB 2.0 as a whole. However, I don't think we are anywhere close to that point. I won't start spending any bandwidth on it until we are. Ahasuerus 02:06, 21 April 2017 (UTC)
I was not saying you should yet; however, it is often hard to see the optimal point until something comes along that underscores the issues. What I was suggesting is that perhaps someone else (perhaps me, should I have the time) take point on beginning such work, and we can work together, bouncing ideas off one another, to either make that work better and/or backport some of the ideas here until the world gets better one way or another :) Uzume 04:30, 21 April 2017 (UTC)
Alas, "bouncing ideas" = "bandwidth". Ahasuerus 14:32, 24 April 2017 (UTC)

Another possible Fixer future

As a totally wild thought, you could build your own web application/website that is an interface to Fixer, allowing users (you could possibly reuse ISFDB user credentials, or not, as you see fit) to look at and fix the collected Fixer data while submitting entries to ISFDB. In terms of the approval and editing part, it would provide an interface similar to the one you use, but web-based. In terms of other items (like code and direct DB changes, etc.), it would of course be more limited for security reasons. Anyway, it would allow people other than yourself to work the Fixer queues and would have the advantages of both automated data collection and manual editing (and of course manual moderation at ISFDB). I am not sure of everything Fixer does (I haven't looked at its code much)

Fixer's code is not public. It only runs on the development server and cannot be deployed to the production or any other server for various reasons. Ahasuerus 00:18, 21 April 2017 (UTC)
I assumed as much. I was just suggesting a Fixer web interface application somewhere (even if the development server is not publicly accessible). The data could be pushed to a publicly accessible place (perhaps even the main ISFDB server). The point is to separate the process of Fixer data collection from ISFDB data submission (and to farm the latter out to ISFDB editors somehow). Uzume 01:16, 21 April 2017 (UTC)
I think there are five main steps in the process:
  1. Data collection
  2. Preliminary data cleanup
  3. Final data cleanup
  4. Submission creation
  5. Submission approval
Steps 1 and 2 are performed by Fixer. Steps 3 and 4 have been performed by me up until now, but Annie and I have been experimenting with User:Fixer/Public, a new public process. Once she comes back from Europe, we can finalize the process, at which point more editors may jump in. Hopefully. Ahasuerus 01:51, 21 April 2017 (UTC)
Right. I somehow doubt that using a wiki page as a queue is an optimal process; however, it might be better in light of reducing your load. Uzume 04:22, 21 April 2017 (UTC)

but I know one of its main jobs is new pub submissions. It would be possible to add a method to prepopulate fields in edit/newpub and edit/addpub (we already do this for the edit/newpub pub type with the construct "edit/newpub.cgi?Novel", but more could be added), and those features could be used by users of the Fixer interface application to push Fixer's collected and templated data into ISFDB's newpub and addpub forms for manual editing and submission (vs. only you doing the manual editing work and Fixer directly queuing submissions via the rest/submission web API). For example, the proposed new Fixer web interface could create links like http://www.isfdb.org/cgi-bin/edit/newpub.cgi?pub_ctype=Novel&pub_year=2017-04-10&pub_publisher=What%20Books%20Press&.... If the arguments became too long, you could use POST instead of a GET query string. Other possible functions of Fixer could potentially be handled similarly (e.g., edit/editpub could prepopulate from the database based on current data and then apply Fixer-provided form data on top, allowing for proposed edits, etc.). The moderator note could be prepopulated with Fixer data, letting the moderator know that the origin of at least some of the data was Fixer. Users are good at editing ISFDB, but it is hard to scour the world of publishers continuously to find all the latest publication data reliably; that is one of Fixer's main strengths. What I am proposing lets you have both: it would leave you free to work on the ISFDB code and the Fixer code (to keep it in top shape collecting data) while having others use the data Fixer collects to create ISFDB submissions. Uzume 23:58, 20 April 2017 (UTC)
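
A quick Python 2 sketch of the link construction proposed above (the parameter names are copied from the example URL; whether edit/newpub.cgi would accept them this way is part of the proposal, not current behavior):

    import urllib

    # Hypothetical Fixer-collected data for one publication
    params = [
        ('pub_ctype', 'Novel'),
        ('pub_year', '2017-04-10'),
        ('pub_publisher', 'What Books Press'),
    ]
    link = 'http://www.isfdb.org/cgi-bin/edit/newpub.cgi?' + urllib.urlencode(params)
    # The Fixer web interface would show 'link' to an editor, who lands on a
    # prepopulated New Publication form, reviews it, and submits it normally.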

Nightly reports - live data

Here is the data from the production server as of 2017-04-24. The total elapsed time was 11 minutes. The threshold was 2 seconds:

SVG files (10 reports) took 53.32 seconds to compile
Summary stats took 2.23 seconds to compile
Contributor stats took 34.31 seconds to compile
Verifier stats took 10.41 seconds to compile
Moderator stats took 3.84 seconds to compile
Authors by debut date took 33.73 seconds to compile
1 took 23.85 seconds to compile
2 took 17.10 seconds to compile
3 took 16.83 seconds to compile
8 took 3.04 seconds to compile
14 took 2.10 seconds to compile
15 took 2.95 seconds to compile
20 took 2.83 seconds to compile
32 took 7.84 seconds to compile
33 took 30.65 seconds to compile
34 took 11.85 seconds to compile
38 took 2.44 seconds to compile
40 took 2.93 seconds to compile
41 took 2.41 seconds to compile
42 took 3.00 seconds to compile
45 took 4.15 seconds to compile
47 took 35.82 seconds to compile
48 took 3.79 seconds to compile
49 took 2.00 seconds to compile
52 took 27.75 seconds to compile
54 took 8.67 seconds to compile
58 took 2.04 seconds to compile
59 took 2.38 seconds to compile
60 took 2.85 seconds to compile
61 took 2.04 seconds to compile
63 took 2.79 seconds to compile
80 took 20.97 seconds to compile
87 took 4.43 seconds to compile
88 took 5.88 seconds to compile
92 took 2.05 seconds to compile
93 took 13.67 seconds to compile
94 took 2.66 seconds to compile
95 took 2.50 seconds to compile
107 took 2.37 seconds to compile
111 took 7.90 seconds to compile
127 took 3.95 seconds to compile
137 took 4.95 seconds to compile
143 took 2.10 seconds to compile
151 took 65.12 seconds to compile (already optimized)
167 took 2.96 seconds to compile
168 took 8.80 seconds to compile
169 took 2.19 seconds to compile
177 took 2.01 seconds to compile
182 took 2.69 seconds to compile
188 took 17.88 seconds to compile
191 took 3.63 seconds to compile
193 took 19.05 seconds to compile
196 took 2.81 seconds to compile
197 took 2.83 seconds to compile
204 took 3.41 seconds to compile


Code style: tabs vs spaces

Development#Code_Format asserts that "The code appears to use 'TAB' instead of 'SPACE SPACE SPACE SPACE' to indent the code." However, it seems that some files use spaces, e.g. biblio/seriesgrid.py, to name a file I picked by chance (which may well be atypical), although the SVN history indicates it has been around since at least 2017.

Also, if tabs are indeed the official style for indentation, what is the preferred value for tab stops, 4 chars, 8 chars or something else? (The aforementioned seriesgrid.py uses 8 space indentation, FWIW.)

I'm absolutely not trying to start a tabs vs spaces war, just seeking clarification. Whilst I definitely have my personal preference (spaces), I concur with the linked text that mixing and matching is the worst of all worlds.

(I should probably have asked this question before I spent an hour fighting my emacs setup to get it to automatically select tabs or spaces depending on what project directory the file being edited was in...) ErsatzCulture 18:28, 2 October 2019 (EDT)

It pains me to say it, but there is no real standard. Older modules tended to use tabs, newer modules tend to use 8 spaces. Some, like biblio/pe.py, use both, which, as you said, is the worst of both worlds. A few modules use 4 spaces. For what it's worth, my IDLE editor is set to use 8 spaces.
As a point of reference, back when the code was first made public (ca. 2008) we used CVS. The code was migrated to SVN in 2017 and that's what we have been using for the last 2 years. Pre-2017 history remains in a read-only CVS repository. I haven't touched CVS since shortly after the migration, so hopefully we won't need it ever again. Ahasuerus 18:56, 2 October 2019 (EDT)