Fresh news aggregated with Plone

Here is my own kind of HOW-TO for setting up a portal that aggregates
news from remote web sites (either through RDF/RSS syndication or
through “screen scraping” of search engines). Want your own news
aggregator and portal, eh?
I will use the following products: Plone + ZEO + CMFNewsFeed +
CMFWebAgent. Personally, I installed them on a Windows 2000 platform,
and I have to say that the installation process is rather long and
tricky… I would welcome a Plone distribution that included the right
version of ZEO and configured Plone as a ZEO client. It should also
include the XML library (PyXML) needed by CMFNewsFeed.

  1. Download everything you will need
    1. Download Plone 1.0.1 from http://prdownloads.sourceforge.net/plone/Plone-1.0.1.exe?download
    2. Download the latest CVS version of ZEO from the ZODB3 3.1
      branch at
      http://cvs.zope.org/ZEO/ZEO/ZEO.tar.gz?tarball=1&only_with_tag=ZODB3-3_1-branch (because ZEO 2.0 cannot run with Zope 2.6.x, which is included
      in Plone 1.0.1)
    3. Download CMFNewsFeed 1.1 and CMFWebAgent 1.0 from
      http://sourceforge.net/projects/collective
    4. Download the PyXML library, version 0.8.1, as a tar.gz file from
      http://sourceforge.net/project/showfiles.php?group_id=6473 (note that the 0.8.1 .exe installers are each built for one specific Python version, either 2.1 or 2.2)
    5. Download my plone_conf.zip file, which contains a few config
      files I gathered, mostly from the CMFNewsFeed distributions, I believe.
  2. Install, unzip and move everything to the right place
    1. Install Plone to C:\Plone (do not ask Plone to start
      automatically and do not start it manually either)
    2. Add C:\Plone\Python to your PATH environment variable if the
      Plone installer did not do it
    3. Unzip ZEO to C:\Plone\ZEO
    4. Unzip CMFNewsFeed to C:\Plone\CMFNewsFeed-r1_1, then rename it
      to C:\Plone\CMFNewsFeed for convenience
    5. Unzip CMFWebAgent to C:\Plone\CMFWebAgent-r0_1, then rename it
      to C:\Plone\CMFWebAgent for convenience
    6. Read C:\Plone\ZEO\docs\ZopeREADME.txt
    7. Move C:\Plone\ZEO\ZEO to C:\Plone\Zope\lib\python\ZEO
    8. Unzip PyXML-0.8.1.tar.gz into C:\Plone\PyXML-0.8.1
    9. Read C:\Plone\PyXML-0.8.1\README
    10. From a command line, go to C:\Plone\PyXML-0.8.1 and run
      “python setup.py build”. You will run into some errors, but they do
      not matter much for our purpose
    11. Move all the files and directories in
      C:\Plone\PyXML-0.8.1\build\lib.win32-2.1\_xmlplus to
      C:\Plone\Python\lib\xml, replacing every existing file (I know this
      must be a very dirty way to install it, but I don’t know of a cleaner
      way, since I did not want to install a standalone Python
      distribution outside Plone)
    12. Unzip plone_conf.zip file to C:\
  3. Start up ZEO and Plone
    1. Execute C:\Plone\1.start_zeo.bat
    2. Wait a few seconds (or more…) and check
      C:\Plone\Data\var\ZEO_Server.log to see if ZEO properly started (you
      should see several lines explaining that ZEO created a StorageServer,
      and so on)
    3. Set Plone’s emergency user with the Windows “Plone
      controller”
    4. Execute C:\Plone\2.start_plone.bat
    5. Wait a few seconds (or more…) and check
      C:\Plone\Data\var\debug.log to see if Zope (Plone) properly
      started
    6. Point your browser to http://localhost, then to
      http://localhost:8080/manage, to check that Plone works properly and
      that you can log into the Plone management interface as your emergency
      user. It should work (well, it works for me…).
  4. Set up and start CMFNewsFeed as a ZEO client
    1. Go to http://localhost and register as a new user called
      “newsfeed”: this is the username CMFNewsFeed uses to retrieve
      content from the Net and post it into Plone.
    2. Log into http://localhost:8080/manage with your emergency user
      and give “newsfeed” the “Reviewer” role (go into Plone/acl_users, click
      on newsfeed and give it the Reviewer role). I suppose newsfeed should
      still keep its “Member” role.
    3. Open C:\Plone\Data\getnews.conf and set the member_name variable
      to ‘newsfeed’ (the default value is ‘rssfeeder’)
    4. Set up a new RSS source as follows: go to http://localhost,
      log in as ‘newsfeed’ and click on the “my folder” link (in the
      navigation bar). Then create a new folder: select Folder in the list
      box, click on the “add a new element” button, fill in the form
      (“my_slashdot_source” as id/name and “My Slashdot RSS source” as
      title) and validate. Then create a link in this new folder. It must be
      named ‘RDF’ (mandatory), it could be titled ‘the Slashdot RSS link’, and
      its URL points to the RSS file you want retrieved
      (http://slashdot.org/slashdot.rdf).
    5. Hack CMFNewsFeed to adapt it to Plone: open
      C:\Plone\CMFNewsFeed\CMFFeedApp.py and replace ‘Portal Folder’ with
      ‘Plone Folder’. Still in CMFFeedApp.py, find the line containing
      “_edit” and, just below it, comment out the “description=description,”
      line, then add a “new_link.description = description” line below the
      “new_link.title = title” line
    6. Open a command line, go to C:\Plone\Data and run
      “python C:\Plone\CMFNewsFeed\getnews.py” (or just run the third .bat file I prepared in my plone_conf.zip file)
    7. You should find your new news items under the “my_slashdot_source” folder. If they don’t show up (although the getnews.py command line reported that they were retrieved), it may be a ZEO cache issue. A quick and dirty fix is to restart Plone. Of course, you may have to fix your zope.conf file to avoid this kind of issue; for the moment I don’t know how, but I’ll try later.
    8. Schedule ‘cmd.exe /c “C:\Plone\Python\python C:\Plone\CMFNewsFeed\getnews.py”’ to run once a day, starting in C:\Plone\Data as in the manual run above (never run it more often than once every 30 minutes, or you may be banned by the news sources), so that your news is fresh every day. You may use a Windows version of cron to do this.
  5. Set up and start CMFWebAgent as a ZEO client
    1. OK. You are a big boy/girl now, so try to follow similar steps to get CMFWebAgent running. You may have to fix some of CMFWebAgent’s search engine scripts, since the engines’ web interfaces may have changed since CMFWebAgent (and this doc) were released. Dirty hacks in sight…
  6. Last but not least: please drop a comment here to tell me whether this works for you, how hard it was to set up, and so on… Or maybe you know of a better way to make these damned CMFstuffagents work!
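For step 2.2, if you are not sure whether the Plone installer added C:\Plone\Python to your PATH, a tiny check can tell you. This is only a sketch, written for a modern Python 3 interpreter (not the Python 2.1 bundled with Plone); the function name plone_python_on_path is my own invention. It splits the PATH on ‘;’ and compares entries case-insensitively, ignoring a trailing backslash, which is how Windows treats folder names.

```python
import os


def plone_python_on_path(path_value=None, target=r"C:\Plone\Python"):
    """Return True if `target` is one of the entries of a Windows-style PATH string."""
    if path_value is None:
        # Default to the PATH of the current process.
        path_value = os.environ.get("PATH", "")

    def norm(entry):
        # Ignore surrounding spaces/quotes, trailing backslashes and letter case:
        # Windows treats "C:\Plone\Python\" and "c:\plone\python" as the same folder.
        return entry.strip().strip('"').rstrip("\\").lower()

    wanted = norm(target)
    # Windows separates PATH entries with ';' (os.pathsep is not used on purpose,
    # so the check behaves the same if you try it on another OS).
    return any(norm(e) == wanted for e in path_value.split(";") if e.strip())


print("C:\\Plone\\Python on PATH:", plone_python_on_path())
```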
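Step 2.11 (laying PyXML's _xmlplus over Plone's xml package) can also be scripted instead of done by hand in Explorer. Here is a minimal sketch, again for a modern Python 3 rather than Plone's Python 2.1; overlay_tree is a hypothetical helper name. It copies rather than moves, so the PyXML build directory stays intact, and it overwrites files that already exist, just like the manual step.

```python
import os
import shutil


def overlay_tree(src, dst):
    """Copy every file under `src` into `dst`, overwriting existing files.

    Sub-directories are created as needed. Returns the number of files copied.
    """
    copied = 0
    for root, _dirs, files in os.walk(src):
        # Rebuild the same relative layout under dst ("." maps to dst itself).
        rel = os.path.relpath(root, src)
        target_dir = os.path.normpath(os.path.join(dst, rel))
        os.makedirs(target_dir, exist_ok=True)
        for name in files:
            shutil.copy2(os.path.join(root, name), os.path.join(target_dir, name))
            copied += 1
    return copied


# Usage with the paths from step 2.11:
# overlay_tree(r"C:\Plone\PyXML-0.8.1\build\lib.win32-2.1\_xmlplus",
#              r"C:\Plone\Python\lib\xml")
```

Back up C:\Plone\Python\lib\xml first if you want a way back; this overlay is just as irreversible as the manual one.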
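Steps 3.2 and 3.5 ask you to wait "a few seconds (or more…)" and then read a log file. A small polling helper saves some guesswork. This is a sketch for a modern Python 3, with a hypothetical name (wait_for_log); the marker strings are assumptions too: "StorageServer" matches what step 3.2 says ZEO logs, but I have no exact line for debug.log, so pick one you see when Zope starts correctly on your machine.

```python
import time


def wait_for_log(path, marker, timeout=60.0, poll=2.0):
    """Poll a log file until a line containing `marker` shows up.

    Returns the first matching line (without its line ending), or None once
    `timeout` seconds have passed. A missing file counts as "not started yet".
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # latin-1 decodes any byte, whatever the log's real encoding is.
            with open(path, encoding="latin-1") as log:
                for line in log:
                    if marker in line:
                        return line.rstrip("\r\n")
        except FileNotFoundError:
            pass  # the server has not created its log file yet
        if time.monotonic() >= deadline:
            return None
        time.sleep(poll)


# Usage for step 3.2:
# print(wait_for_log(r"C:\Plone\Data\var\ZEO_Server.log", "StorageServer"))
```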
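Before pointing the ‘RDF’ link of step 4.4 at a feed, it can help to see what such a file contains. The sketch below is not CMFNewsFeed's code; it is a standalone illustration for a modern Python 3 that lists the (title, link) pairs of an RSS 1.0 (RDF) document, assuming the feed uses the standard http://purl.org/rss/1.0/ namespace, as .rdf feeds like slashdot.rdf did. The feed may have moved or changed format since then.

```python
import xml.etree.ElementTree as ET

RSS1 = "http://purl.org/rss/1.0/"  # default namespace of RSS 1.0 (.rdf) feeds


def rdf_items(xml_text):
    """Return (title, link) pairs for every <item> of an RSS 1.0 (RDF) document.

    Accepts either str or bytes, as ElementTree.fromstring does.
    """
    root = ET.fromstring(xml_text)
    items = []
    # In RSS 1.0 the <item> elements are siblings of <channel> under <rdf:RDF>.
    for item in root.iter(f"{{{RSS1}}}item"):
        title = item.findtext(f"{{{RSS1}}}title", default="").strip()
        link = item.findtext(f"{{{RSS1}}}link", default="").strip()
        items.append((title, link))
    return items


# Usage against a live feed:
# from urllib.request import urlopen
# with urlopen("http://slashdot.org/slashdot.rdf") as resp:
#     for title, link in rdf_items(resp.read()):
#         print(title, link)
```

If this prints nothing, the feed is probably RSS 2.0 or Atom rather than RDF, which would also explain an empty my_slashdot_source folder.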
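The CMFFeedApp.py hack of step 4.5 can be applied as a text transformation, which is handy if you reinstall CMFNewsFeed. Treat this as a sketch under assumptions I have not checked against every CMFNewsFeed release: that ‘Portal Folder’ appears literally, that “description=description,” sits on its own line after the line containing “_edit”, and that the title is set on a line reading exactly “new_link.title = title”. The function name patch_cmffeedapp is hypothetical, and the code targets a modern Python 3.

```python
def patch_cmffeedapp(source):
    """Return the text of CMFFeedApp.py with the step 4.5 edits applied.

    Assumed layout (check it against your copy before trusting this):
    - the folder type is written literally as 'Portal Folder';
    - a 'description=description,' argument line follows the line containing '_edit';
    - the link title is set on a line reading 'new_link.title = title'.
    """
    # Do not add the description line twice if the file was already patched.
    already_patched = "new_link.description = description" in source
    out = []
    seen_edit = False
    commented = False
    for line in source.replace("Portal Folder", "Plone Folder").splitlines():
        stripped = line.strip()
        indent = line[: len(line) - len(line.lstrip())]
        if "_edit" in line:
            seen_edit = True
        if seen_edit and not commented and stripped == "description=description,":
            # Comment out the keyword argument passed to _edit(...).
            out.append(indent + "#" + stripped)
            commented = True
            continue
        out.append(line)
        if stripped == "new_link.title = title" and not already_patched:
            # Set the description as a plain attribute, right after the title.
            out.append(indent + "new_link.description = description")
    return "\n".join(out) + "\n"


# Usage (keep a backup of the original file first):
# path = r"C:\Plone\CMFNewsFeed\CMFFeedApp.py"
# with open(path) as f:
#     text = f.read()
# with open(path + ".orig", "w") as f:
#     f.write(text)
# with open(path, "w") as f:
#     f.write(patch_cmffeedapp(text))
```

Diff the result against the .orig copy before restarting anything: if your CMFFeedApp.py is laid out differently, the script silently does less than the manual edit.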
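For step 4.8, a small wrapper can enforce the "never more often than every 30 minutes" rule even if the scheduler misfires, and it always starts getnews.py from C:\Plone\Data as in the manual run. This is a hypothetical sketch for a modern Python 3: run_getnews, due and the getnews.last stamp file are my own names, and the wrapper simply launches Plone's own python.exe on getnews.py. It records every attempt, successful or not, so a failing run does not turn into repeated hits on the news sources.

```python
import os
import subprocess
import time

MIN_INTERVAL = 30 * 60  # seconds: never fetch more often than every 30 minutes


def due(last_run, now, min_interval=MIN_INTERVAL):
    """True if at least `min_interval` seconds passed since `last_run` (None = never ran)."""
    return last_run is None or now - last_run >= min_interval


def run_getnews(data_dir=r"C:\Plone\Data",
                script=r"C:\Plone\CMFNewsFeed\getnews.py",
                python=r"C:\Plone\Python\python.exe",
                stamp_name="getnews.last"):
    """Run getnews.py from `data_dir`, at most once per MIN_INTERVAL.

    Returns getnews.py's exit code, or None if the run was skipped.
    """
    stamp = os.path.join(data_dir, stamp_name)  # hypothetical marker file
    last = os.path.getmtime(stamp) if os.path.exists(stamp) else None
    now = time.time()
    if not due(last, now):
        print("Skipped: the last fetch was less than 30 minutes ago.")
        return None
    # Record the attempt first, so a crash still counts as a fetch.
    with open(stamp, "w") as f:
        f.write(str(now))
    result = subprocess.run([python, script], cwd=data_dir)
    return result.returncode
```

You would then schedule a script calling run_getnews() once a day with whatever Python 3 you have installed, instead of calling getnews.py directly.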
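When step 5.1 says you may have to fix CMFWebAgent's search engine scripts, the job usually boils down to pulling result links out of an HTML page whose markup has changed. The sketch below is not CMFWebAgent's code; it is a generic illustration for a modern Python 3 using the standard html.parser module. LinkCollector and scrape_links are hypothetical names, and the must_contain filter is just one crude way to tell result links from navigation and ads.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect (absolute href, anchor text) pairs from an HTML page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self._href = None  # href of the <a> we are currently inside, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page URL.
                self._href = urljoin(self.base_url, href)
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Collapse whitespace in the anchor text.
            self.links.append((self._href, " ".join("".join(self._text).split())))
            self._href = None


def scrape_links(html, base_url, must_contain=""):
    """Return the links whose URL contains `must_contain` (a crude result filter)."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    parser.close()
    return [(url, text) for (url, text) in parser.links if must_contain in url]
```

When an engine changes its layout, it is usually only the filter (here must_contain) that needs updating, which is the kind of dirty hack step 5.1 warns about.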
