Close

February 29, 2012

Exporting Magento Search Terms for Google Site Search Synonyms

Magento’s internal search engine is really really rough around the edges. Its very inflexible and sometimes tedious to populate and maintain.

Thankfully, there is a very low-cost option from a vendor that knows a little bit about search engines. Google Site Search! Its easy to install within a Magento installation (and replace the default search capabilities), and takes a few minutes to configure.

The entire search is integrated directly into your website, so visitors will not know the wiser of where the search is actually coming from.

Now, if you have heavily utilized the “search term” feature within Magento and wish to carry them over to Google Site search, things can get interesting. The first issue is that there is no simple way to export your previous list of Synonyms from Magento itself. You will need to export the list directly from the database into a flat file. (We highly recommend our favorite tool, HeidiSQL for the job.)

Here’s the quick SQL query we ran to get our search terms from Magento.

SELECT query_text,synonym_for FROM catalogsearch_query

Make sure that you export the data to a UTF-8 file type, to maintain any special encoding you may have introduced within your Magento search terms (like & or ™ etc).

Now, once you have the list, you are going to have to format it into the Google Site search XML format. The format is basic XML but it has some VERY VERY important boundaries that you absolutely need to account for.

1.) You can only have 500 variants of search terms throughout your entire file.
2.) For each Synonym that you define, you can ONLY list 10 variants underneath it.

If you violate ANY of those constraints and you try to upload your XML to Google, you are going to get the dreaded (and meaningless): “Invalid description file format”. This dreaded red text can mean that one of a million things has gone wrong with your XML file, but most likely it is related to your violation of their key Limits

When building your Magento CSV to Google Site Synonym file, do NOT try and hand-build your XML. Use a module or plugin, or you will end up spending ridiculous amounts of time trying to figure out why strings are not encoding properly and/or why the file encoding types are not working.

Below is a script we wrote in our favorite data integration language, Perl. It expects a CSV file that was formatted based on the SQL query listed above. It will run through the list and build out a valid Synonym XML list that Google Site Search will understand and accept. Note that it will also write out an error log detailing any boundary conditions that are violated (Variant numbers). If you have any errors in the “errors.txt” file you are going to have to correct them in the final xml file, “lafin.xml”.

We really think that Magento is a very powerful shopping cart system, probably one of the most powerful on the market today. With a few tweaks, plug-ins and a little love, you can make it a lot easier to use and maintain.

Don’t hesitate to contact us for any Magento Website Questions or Issues! We have a ton of experience with all aspects of the Magento system.

Run.pl – A simple script that parses Magento Search terms and converts them into a valid Google Site Search Synonym xml file.

use Text::CSV_XS;
use XML::Code;
$file='export.csv';
	my $Data;  my $keys;
	$cnt=0;
  my $csv = Text::CSV_XS->new ({
    binary    => 1,
    auto_diag => 1,
  });
 
  my $variant_count=0;
  my $warnings='';
  open(my $data, '<:encoding(utf8)', $file) or die "Could not open '$file'\n";
  while (my $fields = $csv->getline( $data )) {
			$variant=$fields->[0];
			$syn= $fields->[1];
			if($syn eq ''){next;}
			if($syn =~ /http\:/g){next;} #no http terms;
		$keys{$syn}.=$variant.",";			
	}
   # Creating top XML node.
    my $content = new XML::Code ('Synonyms');
   $content->version ('1.0');
   $content->encoding ('UTF-8');
	foreach $key (keys %keys){
		$found_vars=0;
		@vars = split(/\,/,$keys{$key});
		if(scalar(@vars)>10){
			$warnings.="TOO many variants found for the search term $key(".$keys{$key}.").  Ensure there are 10 only per search term. \n\n";
			#next;			
		}
	   # Adding child node.    
		$sub_content = new XML::Code ('Synonym');
		   $content->add_child ($sub_content);
	   # set the syn name;
		   $sub_content->{'term'}=$key;
	    foreach $var(@vars){
			$variant_count++;
			#GOOGLE RULE:  Die if we have more than 500 variants TOTAL.
			if($variant_count>500){
				$warnings.="TOO many GLOBAL $variant_count found for the search term $key(".$keys{$key}.").  Ensure you have only 500 or less.\n\n"
			}
		  #print "VARS:$var\t";
		  $found_vars=1;
		  #add variant nodes
		  my $var_content = new XML::Code ('Variant');
		  $sub_content->add_child ($var_content);
		  $var_content->set_text ($var);
		}
	}
 
#write out our master xml file for Google
open (OUT, ">lafin.xml");
binmode(OUT, ":utf8");
print OUT $content->code();
close OUT;    
 
#write out errors we encountered
open (OUT, ">errors.txt");
binmode(OUT, ":utf8");
print OUT $warnings;
close OUT;  
exit;