Over the past month or so, I have been working on upgrading an Endeca system that currently processes about 150,000 records. The system runs on Endeca MDEX Engine v5.1.3; the latest version at the time of writing is 6.1.4A.
I decided to work on a VirtualBox virtual machine running Windows Server 2008 R2. I installed all the Endeca components (CAS 3.0.0, Developer Studio 6.1.0, Platform Services 6.1.0, etc.) on the virtual machine, then used “Deployment Template 3.2” to create a new instance of Endeca for the upgraded system.
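For context, the Deployment Template lays the new application out in a standard directory structure. From my installation it looked roughly like this (the application name is a placeholder, and your layout may differ slightly by version):

```
C:\Endeca\apps\MyApp\
  config\pipeline\   - the Developer Studio pipeline project
  config\script\     - AppConfig.xml and related scripts
  control\           - baseline_update.bat, initialize_services.bat, etc.
  data\              - indexes and incoming data
  logs\
```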
I amended the new AppConfig.xml to reference the same information as in 5.1.3. I didn’t overwrite the new AppConfig.xml, as I didn’t want to lose any version-specific settings (such as XQuery – see below).
I then copied across all the pipeline configuration from 5.1.3 and opened the pipeline in Developer Studio, which upgraded it to the new version.
I tried an initial baseline update, but it didn’t work straight away.
The points I discovered whilst implementing the new system are as follows:
- The “AppConfig.xml” now incorporates a script to load XQuery modules. I used a text-compare tool to compare the AppConfig.xml from 5.1.3 with the 6.1.4A version, and this was the most obvious difference.
- As the “Spider” component has been deprecated, I used the CAS Web Crawler to crawl the URL used for reading reports.
- I configured the Web Crawler to write the output to a Record Store instance, using the procedures defined in the “Endeca Content Acquisition System – Web Crawler Guide – Version 3.0.0 – May 2011” documentation.
- I ran a full crawl, again using the CAS documentation, and then amended the Baseline Update script to run the crawl:
- CALL \Endeca\CAS\3.0.0\bin\web-crawler.bat -d 1 -c c:\endeca\system\webcrawl -s c:\endeca\system\webcrawl\sites.lst
- “-d 1” refers to the depth of the crawl
- “c:\endeca\system\webcrawl” refers to the configuration folder (containing the recordstore-configuration.xml and site.xml files).
- “c:\endeca\system\webcrawl\sites.lst” contains a list of the site URLs to be crawled.
- I then added a “Record Adapter” to the pipeline, referring to the Record Store, as per the steps in the “Endeca Content Acquisition System – Quick Start Guide – Version 3.0.0 – May 2011” documentation.
- Make sure the Dimensions and Properties are set up correctly. It helps to have two instances of Developer Studio open, one with the original (5.1.3) pipeline and one with the new (6.1.4A) pipeline, so that each property and dimension can be checked side by side.
- Ensure the Property/Dimension Mappings are correct in the “Property Mapper” component.
- Ensure “Combine records” is ticked in the “Record Cache” component (this might depend on how the data is extracted).
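As an aside, the sites.lst file passed to the crawler above is simply a plain-text file listing the seed URLs, one per line. A placeholder example (the real URL is internal to our system):

```
http://reports.example.com/index.html
```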
I then successfully ran a baseline update and compared the results using the reference application.
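The ordering that finally worked for me was: run the crawl first, then the baseline update, and only if the crawl succeeded. A minimal POSIX-shell sketch of that control flow (the command names are placeholders; on the real Windows server the calls are the web-crawler.bat line shown above and the Deployment Template’s baseline_update.bat):

```shell
#!/bin/sh
# Sketch of the order of operations in the amended baseline update script.
# CRAWL_CMD and BASELINE_CMD are stand-ins for the real Windows batch calls.
CRAWL_CMD=${CRAWL_CMD:-"echo crawl"}
BASELINE_CMD=${BASELINE_CMD:-"echo baseline"}

run_update() {
    # Only run the baseline update if the crawl succeeded, so a failed
    # crawl never feeds a stale or empty Record Store into the index.
    if $CRAWL_CMD; then
        $BASELINE_CMD
    else
        echo "crawl failed; skipping baseline update" >&2
        return 1
    fi
}

run_update
```

Gating the baseline update on the crawl’s exit code saved me from indexing an empty Record Store when the crawl target was unreachable.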
I hope you find this useful. Please add any comments below if you need any further information.