Development/HTTPS

From ISFDB

Application Software

  • The Python code has been updated to work under either HTTP or HTTPS depending on the value of a configuration setting.
  • URLs of all externally hosted HTTPS-compliant cover scans have been upgraded to use HTTPS.
  • The ISFDB database still has thousands of HTTP URLs of third party Web pages which need to be changed to HTTPS because their hosting sites have upgraded to HTTPS since the time the links were originally created. Each site needs to be evaluated independently to make sure that the structure of old, HTTP, links doesn't change when upgrading to HTTPS. At this time the URLs of the most popular HTTPS-compliant third party Web pages have been upgraded, but less popular Web pages still need to be reviewed and upgraded manually.
  • All URLs pointing to "www.isfdb.org" are now dynamically reformatted to use the current values of HTFAKE and WIKILOC and the current HTTP/HTTPS configuration settings. This enhancement should facilitate easier re-hosting of the ISFDB database using a different domain name under either HTTP or HTTPS. We still want to add Notes templates for publication IDs and title IDs to simplify data entry.
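
The dynamic reformatting described in the last bullet could look roughly like the sketch below. It is a minimal illustration only: the function name `localize_isfdb_url`, the `PROTOCOL` variable and the sample host values are hypothetical and are not taken from the ISFDB code base. Only the names HTFAKE and WIKILOC come from the text above, and their exact format here is an assumption.

```python
import re

# Hypothetical configuration values; a real installation would read these
# from its local settings rather than hard-coding them.
PROTOCOL = "https"                      # "http" or "https"
HTFAKE = "isfdb.example.org/cgi-bin"    # assumed: host and path of the CGI scripts
WIKILOC = "isfdb.example.org/wiki"      # assumed: host and path of the Wiki

# Match absolute links to the canonical host under either protocol.
_CANONICAL = re.compile(r"https?://www\.isfdb\.org/(cgi-bin|wiki)(?=/|$)")


def localize_isfdb_url(text):
    """Rewrite canonical www.isfdb.org links for the configured host and protocol."""
    def replace(match):
        target = HTFAKE if match.group(1) == "cgi-bin" else WIKILOC
        return "%s://%s" % (PROTOCOL, target)
    return _CANONICAL.sub(replace, text)
```

Links to other sites, and www.isfdb.org paths outside the two known prefixes, are left untouched by this sketch.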

System Software

  • All modern Web browsers require that HTTPS connections be made using versions 1.2 or 1.3 of the TLS protocol. Versions 1.0 and 1.1 are no longer directly supported as of 2021 -- see this Mozilla article for details.
  • The live server is currently running Apache 2.2.8, which doesn't support TLS 1.2/1.3. The first version of Apache that supports 1.2 is 2.2.23. We probably want to upgrade all the way to 2.2.34, the final version of the 2.2 tree.
  • 2.2.23 and later apparently require OpenSSL 1.0.1 or later to run TLS 1.2 -- see "Changes with Apache 2.2.23" on the previously linked Apache Web page
  • The currently used version of Python, 2.5.4, doesn't allow HTTPS connections over TLS 1.2. This isn't a problem when serving ISFDB pages to Web browsers, but it prevents the nightly job from connecting to the SFE Web site -- see FR 1451 for details. We need to upgrade Python to 2.7.18, the last release in the Python 2 tree, to restore this functionality. Comprehensive testing of all ISFDB options under Python 2.7.18 has confirmed that the ISFDB software is fully compatible with Python 2.7.18.
  • It's currently unknown if the current version of the Linux Kernel, 2.6.32-042stab128.2, will need to be upgraded to support any of the upgrades listed above
  • A MediaWiki/PHP upgrade doesn't appear to be required to support the HTTPS migration and can be handled as a separate project. However, we need to test on the development server, which currently doesn't have MediaWiki/PHP installed. If we determine that a MediaWiki upgrade is required, we will need to address the following two dependencies:
    • MediaWiki 1.30 and higher requires MySQL 5.5.8 or higher, so we'll need to upgrade MySQL first. The development server has been running MySQL 5.5.17 for a number of years and there have been no issues.
    • Due to disk space constraints, we run the Python script wikitrim.py, which deletes old revisions of Wiki pages, on a regular (roughly annual) basis. The script won't work with MediaWiki 1.36 because rev_text_id in mw_revision is no longer used.
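
Some of the version requirements listed above can be checked from Python itself. The following is a minimal sketch, assuming Python 2.7 or 3 (the `ssl` module it relies on does not exist in Python 2.5). The function name `tls_support_report` is hypothetical.

```python
import ssl


def tls_support_report():
    """Summarize the TLS capabilities of the running interpreter."""
    return {
        # Version string of the OpenSSL library Python was linked against.
        "openssl": ssl.OPENSSL_VERSION,
        # OpenSSL 1.0.1 is the first release with TLS 1.2 support.
        "openssl_has_tls12": ssl.OPENSSL_VERSION_INFO[:3] >= (1, 0, 1),
        # The constant is absent on builds that predate TLS 1.2 support
        # (for example, Python 2.7 releases before 2.7.9).
        "python_has_tls12": hasattr(ssl, "PROTOCOL_TLSv1_2"),
    }
```

Running this under each candidate interpreter on the server gives a quick indication of whether outgoing HTTPS connections, such as the nightly job's, have a chance of working before any code is changed.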

System Software at isfdb2.org

Update - We're running a staging server at https://isfdb2.org, with the following versions:

  • Linux: 4.18.0-240.15.1.el8_3.x86_64 x86_64
  • Apache: Apache/2.4.37 (AlmaLinux)
  • MySQL: 8.0.26
  • Python: 2.7.18
  • PHP: 7.4.19
  • MediaWiki: 1.35.6

This system is running AlmaLinux (a community-built, RHEL-compatible distribution), so updating components to later revisions than these is a bit complicated, although there is a convoluted process for moving to PHP 8.0 if necessary. Otherwise, all preliminary tests are working well. Next project: MediaWiki.

A note about current ISFDB versions: Our hosting provider (FutureHosting) sold itself to Nexcess, and numerous transitions are underway. As such, I don't currently have a method for moving to a later version of Linux, although Nexcess has communicated that they would set up a test server sometime in the future. isfdb2.org is hosted at NixiHost, with 8GB RAM and 125GB SSD storage. NixiHost has a much better reputation than Nexcess, so there exists a timeline where we get everything running at isfdb2.org and then simply switch the domain name over when ready.

Update: isfdb2 is now running HTTPS. Since it is a full copy (except for images), feel free to test. Alvonruff 21:23, 12 May 2022 (EDT)

Discussion

Modern browsers like Firefox, Chrome and Edge try to access the HTTPS port of a site first. User:Ahasuerus has done a lot of work to support an HTTPS implementation in the Python code. Thank you for your effort. My own trunk of ISFDB runs this version.

It may seem that only the HTTPS configuration itself is missing on the machine, but that's not the case.

I would like to point out further problems that could be barriers:

  • Current browsers accept only TLS versions above 1.1: "Mozilla, Google, Apple and Microsoft have committed to disabling TLS 1.0 and TLS 1.1 as default options for secure connections" (hacks.mozilla.org).
  • nmap reports Linux kernel version 3.2, which is quite old (around 2012, I assume). I don't know the actual setup of www.isfdb.org, but I doubt that this OS supports any TLS version newer than 1.0.
  • ISFDB uses Python 2.5 (correct me if this isn't the case). I currently run Python 2.7.18 without any problems. Python 2.7 (first released in 2010) reached end of life in January 2020. ActiveState has assessed dozens of critical and high severity Python vulnerabilities impacting Python 2 to date.
  • Future OS releases might drop support for Python 2; Python 3 (released 2008) is NOT backwards-compatible.
  • The current MediaWiki release is 1.37; www.isfdb.org currently runs 1.12.0rc1.
  • The MySQL database is at version 5.0, while MySQL 8 is already out. I'm using MariaDB 10.3.24, which is compatible with MySQL 5.5.
  • I haven't looked at the HTTP server, but I assume that it is running an old patch level. MediaWiki >= 1.36 requires the internationalization extension in Apache, according to the release notes.
  • I don't know whether the current Let's Encrypt client will work on the system.

So actually introducing HTTPS service would require (IMHO) the following steps:

  • upgrading the OS, including the Web server, to a current version
  • upgrading Python to at least 2.7
  • upgrading MediaWiki to 1.3x
  • activating Let's Encrypt
  • working on an upgrade path to Python 3 (currently 3.10)

I may be missing a lot of other dependencies.

But there is also light at the end of the tunnel: current SVN revision 817 runs without any problems (I haven't looked into the fancy corners of all ISFDB features) with Python 2.7.18 on MariaDB 10.3, the latest Apache 2.4.51 and MediaWiki 1.36 (with small modifications) on my more modern OpenIndiana.

Any suggestions, ideas, what could be done on these problems? Any help needed?

Enough for today and get well soon! elsbernd 06:47, 28 November 2021 (EST)


Many thanks for the overview! I have been sicker than usual the last few weeks and I am still recovering, so I may have to respond in chunks. In no particular order:
  • I am aware of two potential issues with upgrading our MediaWiki software from 1.12.0rc1 to 1.3x:
    • The first one is the fact that MediaWiki 1.30 and higher requires MySQL 5.5.8 or higher, so we'll need to upgrade the database first. Luckily, it shouldn't be an issue since I have been running MySQL 5.5.17 on the development server with no issues. There has been only one case when a query that worked under 5.5 didn't work under 5.0 and it was easily fixed.
    • The second one is the fact that we don't have much disk space on the live server. I am forced to run the Python script wikitrim.py on a regular (roughly annual) basis. It deletes old revisions of Wiki pages, which frees up enough space and lets us continue to operate. We'll need to test this script under MediaWiki 1.30 to make sure that it doesn't mess up the tables. Alternatively, we'll need to get more disk space. Ahasuerus 18:19, 27 November 2021 (EST)
  • The TLS issue is complicated. We currently use an older version of OpenSSL, which doesn't support TLS 1.2. Moreover, Python 2.5.4 doesn't support outgoing connections over TLS 1.2, which is what prompted FR 1451 "Change the ISFDB software to work with the new SFE Web site". The last line in that FR's Description field reads "I have sent Al a message to see if he may be able to update our OpenSSL". I'll need to ping Al to see if he has had any luck upgrading OpenSSL. It may or may not require upgrading the Linux Kernel, which is also very old and needs to be upgraded anyway. Ditto our Apache server. Ahasuerus 18:38, 27 November 2021 (EST)
  • Upgrading Python from 2.5.4 to 2.7.18 should be relatively simple. Just change the configuration file to use the 2.7.18 executable and re-run all of the use cases, including the background jobs. Upgrading to version 3+ is likely to be a chore, but we'd have to run the automated upgrade tool first to see how much manual work will be required. Ahasuerus 18:49, 27 November 2021 (EST)
  • There is still some Python work that needs to be done before the software is fully ready to work under HTTPS. If you look at function ISFDBTemplates in common/library, you'll notice that many third party URLs start with 'http:' even though their respective sites have migrated to HTTPS, e.g. Deutsche Nationalbibliothek. We also have 114,719 HTTP links to third party Web sites and 411,344 HTTP links to third party-hosted images. Many of these linked third parties support HTTPS, e.g. we have 210,426 links to Amazon images. Ditto Wikipedia links, OCLC links, etc. Clearly, they need to be converted to HTTPS, preferably programmatically. If we don't do it as part of the HTTPS migration, many of our pages will be considered "mixed content" by modern browsers, which can cause problems. Ahasuerus 19:04, 27 November 2021 (EST)
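
The programmatic conversion mentioned in the last bullet could be driven by an allowlist of hosts that have been reviewed and found to keep the same URL structure under HTTPS. The sketch below is illustrative only: `upgrade_to_https`, `HTTPS_READY_HOSTS` and the host names in it are hypothetical, and a real run would also need to read and write the relevant database columns.

```python
try:
    from urllib.parse import urlsplit, urlunsplit   # Python 3
except ImportError:
    from urlparse import urlsplit, urlunsplit       # Python 2.7

# Hypothetical allowlist: hosts that were checked by hand and serve the
# same paths over HTTPS as they did over HTTP.
HTTPS_READY_HOSTS = {"en.wikipedia.org", "www.worldcat.org"}


def upgrade_to_https(url):
    """Return url with its scheme switched to https if the host is allowlisted."""
    parts = urlsplit(url)
    if parts.scheme == "http" and parts.hostname in HTTPS_READY_HOSTS:
        # Keep netloc, path, query and fragment exactly as they were.
        return urlunsplit(("https",) + tuple(parts[1:]))
    return url
```

Keeping the allowlist explicit means a site is only converted after someone has confirmed that its old HTTP links still resolve under HTTPS.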
Based on the list above, here is my tentative plan:
  • Things that I can do on my end:
    • Finish updating the Python code and auto-convert the 600,000+ HTTP links in the ISFDB database.
    • Test the ISFDB software under Python 2.7.18 and fix anything that may be broken.
    • Possibly install the MediaWiki software on my development server and test the upgrade path. It may take a while since I am not familiar with PHP/MediaWiki beyond tweaking our configuration files some years ago.
  • Ask Al to stop by and read this discussion. I lack the knowledge and the direct connection to the hosting company to do much about the Linux Kernel, the OpenSSL version, the Apache version or any other OS-level issues. I don't know why our versions are so far behind -- there may be dependencies and backward incompatibilities that I am not aware of. It's also possible that other hosting options may serve our needs better in 2022 and beyond. Ahasuerus 19:15, 27 November 2021 (EST)
Here are some other things to consider when upgrading MediaWiki. Currently our Python web application is fractured into a pile of separate CGI scripts. At some point these should probably be refactored into a single PEP 333 WSGI app that maps URLs by parsing PATH_INFO and does different things based on the request. That would still allow us to host the app via CGI, but would also allow it to be hosted in other ways, such as via a FastCGI application server, and would help with maintenance and security of the code; but I digress. The app's authentication mechanism currently depends on its ability to directly access MediaWiki's stored credentials in the same database. At some point we should probably adopt a different authentication scheme for the web app. I recommend something like OAuth via mw:Extension:OAuth (Wikimedia uses this heavily on its wikis). This would allow the app to still depend on our MediaWiki for login credentials while decoupling the two, so that they can be developed and upgraded independently. I am not sure, but I believe the app may also have a few other direct dependencies on the local MediaWiki installation, such as the external note system for authors, publications, etc. (mostly in how the app detects and links to them in our wiki). Such a decoupling would also allow them to run separately, possibly even on different server instances. It should also be noted that our MediaWiki installation (besides running an old release candidate version) is using two MW extensions that are also quite dated: I am not sure about our copy of mw:Extension:ConfirmEdit, but Special:Version definitely shows that our mw:Extension:SyntaxHighlight still depends on GeSHi, when I know it switched to Pygments some time ago.
Anything the Python app does that directly depends on MediaWiki tables should seriously be looked at and likely decoupled when considering upgrading MediaWiki as such upgrades could seriously alter its tables and break the Python app's dependencies on such. —Uzume 21:14, 10 December 2021 (EST)
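
To make the single-WSGI-app idea above more concrete, here is a minimal sketch of a WSGI callable that dispatches on PATH_INFO. The route paths and handler functions are placeholders invented for illustration and do not correspond to actual ISFDB scripts.

```python
def title_page(environ):
    # Placeholder handler; a real one would query the database.
    return "title page"


def author_page(environ):
    # Placeholder handler.
    return "author page"


# Hypothetical URL map from request path to handler.
ROUTES = {
    "/title": title_page,
    "/author": author_page,
}


def application(environ, start_response):
    """WSGI entry point that picks a handler based on PATH_INFO."""
    handler = ROUTES.get(environ.get("PATH_INFO", ""))
    if handler is None:
        status, body = "404 Not Found", "not found"
    else:
        status, body = "200 OK", handler(environ)
    payload = body.encode("utf-8")
    start_response(status, [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(payload))),
    ])
    return [payload]
```

The same callable can still be served through plain CGI with `wsgiref.handlers.CGIHandler().run(application)` from the standard library, which is what makes this kind of refactoring compatible with the current hosting model.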

There is no need to act precipitately; please get well soon; Only some short notes:
  • The wikitrim.py stored in scripts/ won't work with MediaWiki 1.36, because rev_text_id in mw_revision isn't used any more.
  • Left a note for Al von Ruff. Maybe other users who have discussed these issues should be invited too?
  • Maybe someone could have a look at my MySQL scripts in isfdb-20211008.zip, which update all references in the (MediaWiki) database from http://www.isfdb.org to https://www.isfdb.org, as well as references to other HTTPS-capable sites, like amazon, dns, ..., which I identified during my own conversion of the database.
  • Also, upgrading to a newer OS/services/Python... won't go smoothly, even with intensive testing and good wishes. But everyone will do their best. :-)
  • More space, more money. I don't know anything about the costs of running the ISFDB site. Is there any possibility or need to contribute?
--elsbernd 06:47, 28 November 2021 (EST)
Thanks for looking into wikitrim.py. We should probably start using native MediaWiki tools to save space. Manual:Reduce size of the database and Manual:deleteOldRevisions.php would be a place to start.
Also, I have checked our Apache version. We are currently running 2.2.8 and TLS 1.2 requires 2.2.23. It also needs OpenSSL 1.0.1 or later. We should probably create a list of identified dependencies. Ahasuerus 15:13, 28 November 2021 (EST)
I have created a preliminary list of dependencies -- see the top section of this Web page. Ahasuerus 16:52, 28 November 2021 (EST)

Merry Christmas and a happier new year; sorry for the delay, I took vacation for some weeks.
Every time I take a look at the current state of HTTPS, I find more implications to resolve. So I'll use this note to separate out some tasks that could be tackled in parallel (IMHO):
  • Python HTTPS code: This seems to be pretty much solved. The code itself stays at the current development version, 2.5, but using 2.7.18 (the latest) should be no problem. That's one move that could be made in the near future.
    • Python 2.7 is EOL; moving to 3.x (3.10) is more challenging (see https://sebastianraschka.com/Articles/2014_python_2_3_key_diff.html). It can only be tackled once 2.7.18 is in place.
    • Database: Moving from MySQL to MariaDB 10.3.24 (the version on my server) can be evaluated in parallel with the move to Python 2.7.18.
    • If I remember correctly, some users are running newer versions of MariaDB/MySQL without problems. Moving from MySQL to MariaDB is the direction favored by the Linux community.
  • OS: OpenSSL > 1.1.1 is needed for newer browsers to stop showing warnings.
    • Since the current OS doesn't seem to support newer SSL libraries (I can't prove that, but it should be obvious), an upgrade is needed. Additionally, the OS isn't supported any more (I can't prove that either), so security issues are possible.
    • MediaWiki: MediaWiki has changed substantially in the last few years. With small changes in the Python code, the newest version could be integrated. But I think this has to be tested more, and it can perhaps only be done once the OS is upgraded.
    • I don't know whether the current MediaWiki will run on the newest Apache/OS, but I expect no problems. Nevertheless, the newest MediaWiki will need a newer Apache version, which comes with a newer OS anyway. So first the OS, then MediaWiki.
So two major tasks have to be done: the OS upgrade and, in two steps, the Python upgrade, which is more challenging. --elsbernd 13:23, 31 December 2021 (EST)

Just a quick note that I have been following along, but have been working on upgrading my infrastructure, including my 11-year-old Linux system. Given the current supply-chain issues, it took a while to receive the system (which was new enough that the Linux WiFi drivers didn't work). I have the latest Ubuntu 20.04 installed. Given the large number of dependencies listed above, I am trying to get the ISFDB working with the latest MySQL, Apache, and Python 2.7, which almost works. A number of issues were discovered in moving to the latest versions, the thorniest being what to do about the MySQLdb layer, as it was abandoned 10 years ago and the most suitable alternative requires a move to Python 3. I am currently documenting everything on my User page, if you want to see the current status. I see three major projects here (each with many subprojects):
  • The move from HTTP to HTTPS. You guys have been covering this well, so I'll stay out of that. This project, however, is generating numerous requirements on the LAMP stack, necessitating project 2:
  • The move of the ISFDB to a modern LAMP stack. This is likely a three-phase project, and I'll work on this for a while. The first phase should theoretically enable HTTPS, but we would still be on Python 2.7, so subsequent phases would be required.
  • The move of MediaWiki to the latest version. I'm inclined to make a larger separation between MediaWiki and the ISFDB. The ISFDB has been using the login/password system of MediaWiki (because it was easy at the time), but that creates artificial dependencies when trying to move one system and not the other. So I'll look into a standalone user creation/login system. Then we can completely update the ISFDB, and make the MediaWiki project a separate project for later.
Alvonruff 07:08, 4 March 2022 (EST)
Thanks for working on the LAMP stack! A few thoughts:
  • I'll take a look at the SQLloadNextSubmission error on my development server. That section is a relatively recent addition to the code.
  • Japanese, Russian, etc characters are always displayed correctly even if the server sends data as UTF-8. The reason is that we store non-Latin-1 characters using HTML encoding -- see this FAQ section for more details. At some point we will want to move to UTF-8 and convert the HTML-encoded characters already in the database, but that will require extensive testing.
  • Python 2.7. A number of editors have run ISFDB locally using Python 2.7 and reported no issues. We probably need to do more formal testing to make sure that nothing is broken. One area where 2.7 may help is the nightly reconciliation with The Encyclopedia of Science Fiction, which is currently unavailable due to their move to HTTPS in October 2021. I think it would solve the problem even without the OpenSSL upgrade (which we need for other reasons), but I don't recall the details.
  • Python 3.0. I don't think anyone has tried it yet, but I expect that it will require a fair amount of work.
  • Upgrading MediaWiki is certainly a good idea and will help with security. The only compatibility issue that I am aware of is "wikitrim" mentioned above.
  • Decoupling the database from the Wiki. We are currently using md5 hashes, which are supposed to be "cryptographically broken", so we presumably need to move to something more secure.
  • Given the number of dependencies listed above, I think a multi-step approach would be our safest bet.
Ahasuerus 09:27, 5 March 2022 (EST)
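
As an illustration of the eventual UTF-8 conversion mentioned above, the sketch below turns HTML numeric character references back into the characters they stand for. It assumes the stored form is a decimal reference such as `&#1040;`; that assumption, and the function name `decode_numeric_refs`, are not confirmed by the discussion and would need to be checked against the actual data.

```python
import re

try:
    unichr          # Python 2
except NameError:
    unichr = chr    # Python 3

# Decimal numeric character references, e.g. &#1040;
_NUMERIC_REF = re.compile(r"&#(\d+);")


def decode_numeric_refs(text):
    """Replace decimal references such as &#1040; with the character itself."""
    return _NUMERIC_REF.sub(lambda m: unichr(int(m.group(1))), text)
```

Named entities such as `&amp;` are deliberately left alone here, since they may be legitimate markup rather than encoded non-Latin-1 text.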
Only a few notes, which have been documented already, but I'd like to recapitulate them briefly.
  • Upgrading to the newest MediaWiki involves two things: a) using the "new" PBKDF2 hash system in submitlogin.py, which is easy to implement (I've documented the code on SourceForge); b) updating the script for manually adding a new user to MediaWiki. There are some layout changes in the HTML pages, but that should be no big issue.
  • Upgrading the OS to the newest LTS release, including OpenSSL, Apache, MySQL and MediaWiki, and running the current system on it is the first step. And it's not too complicated. I'm running the system on a Solaris host. Nevertheless, I can't test all those hundreds of ISFDB features.
  • Changing to Python 3.9... is the real work. Moving to UTF-8 is tricky. Moving away from MediaWiki somehow could also be a later project.
Well, I'm currently learning Python, so my knowledge of ISFDB is limited :-) So be merciful if I'm a little bit green.
elsbernd 05:47, 12 March 2022 (EST)
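
For readers unfamiliar with the move away from md5 discussed above, here is a minimal sketch of salted PBKDF2 hashing and verification using only the Python standard library (`hashlib.pbkdf2_hmac` requires Python 2.7.8+ or 3.4+). The storage format, the iteration count and the function names are illustrative assumptions; this is not MediaWiki's on-disk format and not the code in submitlogin.py.

```python
import base64
import hashlib
import hmac
import os

ITERATIONS = 30000   # illustrative work factor, not a recommendation


def hash_password(password, salt=None):
    """Return 'iterations:salt:digest' with salt and digest base64-encoded."""
    salt = os.urandom(16) if salt is None else salt
    digest = hashlib.pbkdf2_hmac("sha512", password.encode("utf-8"), salt, ITERATIONS)
    return "%d:%s:%s" % (
        ITERATIONS,
        base64.b64encode(salt).decode("ascii"),
        base64.b64encode(digest).decode("ascii"),
    )


def verify_password(password, stored):
    """Recompute the digest with the stored salt and compare in constant time."""
    iterations, salt_b64, digest_b64 = stored.split(":")
    salt = base64.b64decode(salt_b64)
    expected = base64.b64decode(digest_b64)
    candidate = hashlib.pbkdf2_hmac("sha512", password.encode("utf-8"), salt, int(iterations))
    return hmac.compare_digest(candidate, expected)
```

Storing the iteration count next to the salt allows the work factor to be raised later without invalidating existing hashes.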
Regarding a way to share authorization/login between the database and wiki, perhaps something like CentralAuth could be used? It's a Mediawiki extension, but it provides a way for multiple projects to use the same account database. It also allows for merging accounts, if that were ever needed. ···日本穣 · 投稿 · Talk to Nihonjoe 15:37, 17 May 2022 (EDT)
It also has a way to combine edit counts across projects, and track activity across projects, so it could give a more accurate representation of when a particular editor last did something without having to check both the wiki and the database. ···日本穣 · 投稿 · Talk to Nihonjoe 15:39, 17 May 2022 (EDT)
I am very happy to see the upgrade to MediaWiki; however, I am not sure about continuing to directly use MediaWiki for authorization/login. CentralAuth is not the way to go, but something like OAuth is probably a much better idea, and it is what Wikimedia uses for its Toolforge tools, so it is well supported. BTW, now that we have a newer MediaWiki and HTTPS, can we get some other updates, such as an update to the Interwiki map, and consider some extensions such as Scribunto? Thanks —Uzume (talk) 01:17, 7 September 2022 (EDT)