Bugs on Europa Analytics

Not Bugs
Many cases that look like bugs at first are not bugs.

Visits with no PageViews
There are sometimes figures for ‘visits’ when the corresponding row has 0 for ‘page views’.

Europa Analytics is based on the IFABC metrics, which are the de facto industry standard for web analytics.

According to these metrics, for a hit to be counted as a PageView it must return status code 200 (or at least one of the 20x family = success). This prevents counting, for example, redirections (status code 30x family), because in that case the user does not get any "real" content. However, in order to reconstruct (and then analyse) the whole path of the visitor through the site, hits with all types of status code are counted in the visit (its length etc.).

This may lead to results that look strange at first sight.

An example: you have decided to launch a new site http://important.myweb/ to replace http://personal.myweb/important.htm, and you have set a redirection on the page important.htm. A visitor then navigates from http://personal.myweb/index.htm (200 - OK), which contains a link to /important.htm, to http://personal.myweb/important.htm (301 - Moved Permanently) and is redirected to http://important.myweb/index.htm (200 - OK).

Now if you analyse the traffic for http://personal.myweb/important.htm, the PageViews will always be 0, while you will see a certain number of visits.
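The counting rule can be illustrated with a minimal sketch. It is not the Europa Analytics implementation; the function name, the tuple layout and the visit identifier "v1" are assumptions made for illustration only. Every hit contributes to the visit, but only hits with a 2xx status code count as a PageView.

```python
from collections import defaultdict

def summarise(hits):
    """hits: iterable of (visit_id, url, status_code) tuples (hypothetical layout)."""
    visits = defaultdict(set)      # url -> ids of the visits that touched it
    page_views = defaultdict(int)  # url -> number of hits with a 2xx status
    for visit_id, url, status in hits:
        visits[url].add(visit_id)      # every hit belongs to the visit path
        if 200 <= status < 300:        # only successful hits are PageViews
            page_views[url] += 1
    return {url: (len(ids), page_views[url]) for url, ids in visits.items()}

# The navigation from the example: index page, redirected page, new site.
hits = [
    ("v1", "http://personal.myweb/index.htm", 200),
    ("v1", "http://personal.myweb/important.htm", 301),
    ("v1", "http://important.myweb/index.htm", 200),
]
report = summarise(hits)
visits_count, page_view_count = report["http://personal.myweb/important.htm"]
```

For the redirected page this yields one visit and zero PageViews, which is the situation described above.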

There are plenty of other possible causes, and a simple bug cannot be excluded either.

Prompt changes not applied in customised reports
Some changes were applied to the semantics throughout various reports, but they were not automatically applied to customised reports, which now give an error message. To correct this and restore the right prompt:


 * go to the "Edit" tab for the selected customised report;
 * right-click on the textbox to edit the text;
 * check whether the prompts appearing in the text (grey boxes) are also in the list of "Prompt value". If not, select the prompt to be replaced, delete it and insert the right prompt from the "Prompt value" list instead. Click OK and save the updated report.



The prompts currently concerned are:


 * "Select Webmart" replacing "Choose Site"
 * "Select Section" replacing "Choose Subsite"
 * "Specify URL" replacing "Choose URL"

Not all reports are concerned.

Combining data sets
Combining different data sets could mean
 * 1) combining 2 (or more) different information maps or
 * 2) creating a new information map which would contain all wanted variables and measures.

The aim is to generate a report which combines Visitor Reports variables (Visitor Recency, Visitor Frequency, Countries, Organisations) with Page Usage Reports variables (Pages, Exit Pages, Bounce Rate) and Referrer and Search Term Reports variables (Referrer Entry Pages, Top Referrer Entry Pages).

Archive (reports from SAS WH)

 * New explicit banner (requested from Johan on 04/10/2010)

Select Multiple URL

 * need to enable selection of more than one URL

Report Pages

 * 100% problem
 * Users do not get 100% of pageviews (Information Map)
 * Users do not get 100% of visits (Information Map)

The number of lines returned per page has to be set high enough. (What is this? Why?)
 * Need, at least, to add explanation to the Prompt Screen

Report Countries

 * need to enable selection of URL

Report Visitors Recency
Get confirmation about the units to measure Average Session Duration - probably seconds

ea-ignore=true now works correctly (fixed since 19 Feb 2013)
Webmasters using ea-ignore=true (to keep URLs from being counted in the statistics) will see no effect on their statistics before 19 Feb 2013. Until then the keyword ea-ignore was rejected by the ETL and thus could not be used to ignore or delete traffic. The only impacted key figure (statistic) is Page Views. To estimate the effect, we advise comparing the figures from 19 Feb 2013 onwards.
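As an illustration only, the effect of the keyword can be sketched as a filter on hits. This is not the ETL code; it assumes that ea-ignore=true appears as an ordinary query-string parameter, and the function name and sample URLs are hypothetical.

```python
from urllib.parse import urlsplit, parse_qs

def is_ignored(url: str) -> bool:
    """Return True when the URL carries ea-ignore=true in its query string.

    Assumption: the keyword is a plain query parameter; the real ETL rule
    may be different.
    """
    params = parse_qs(urlsplit(url).query)
    return "true" in params.get("ea-ignore", [])

# Hypothetical sample hits.
urls = [
    "http://ec.europa.eu/page.htm?ea-ignore=true",
    "http://ec.europa.eu/page.htm?lang=en",
]
# Only hits without the keyword would contribute to Page Views.
counted = [u for u in urls if not is_ignored(u)]
```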

Dates before 01 Jan 2010 listed in Available Data report (fixed since 28 October 2011)
Dates reported as 'earliest date' and 'latest date' in the Available Data report sometimes show dates before 01.01.2010. This is due to incorrect web server logs provided by the system and processed by the SAS Web Analytics ETL. These dates have been removed from the webmarts. This does not affect the results for data later than 01 Jan 2010.

Organisation Report (fixed since 5 October 2011)
The system update of 29.09.2011 caused figures to be displayed in the wrong format. As a result, the "Organisations" report produced between 29.09.2011 and 04.10.2011 could display strange-looking results.

Since 05.10.2011 the format has been fixed and the "Organisations" report works properly.

dynamic pages cleanup (Fixed since 1 September 2011)
Already in the previous system, SAS Webhound (the former web analytics tool), dynamic pages with keywords were cleaned up only when the keywords appeared after a question mark (?). For some time now, other techniques have used a semicolon (;), exclamation mark (!) or ampersand (&) instead. Keywords appearing in such a URL were kept as is.

These URLs produce a large number of different URLs without any gain, because many keywords are often used with random values (e.g. ec.europa.eu/research/participants/portal/CaptchaImage!efp7_SESSION_ID=r9VhLnvNxnqXvPvVzBthnH2phdbh9csGQRgjFT3h5rp0XFZrRBkx!-145)

This is now processed more correctly, and keywords that are meaningless are removed from the URL. As a result, more relevant URLs will be stored. The list of keywords can be found here.
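The cleanup can be sketched roughly as follows. This is an illustrative approximation, not the actual ETL rule: the keyword set below is hypothetical (the real list is maintained separately), and dropping tokens that have no "key=value" form and re-joining the kept keywords with "?" and "&" are assumptions of this sketch.

```python
import re

# Hypothetical examples; the real keyword list is maintained elsewhere.
MEANINGLESS_KEYWORDS = {"efp7_session_id", "jsessionid"}

# The delimiters named in the text: ? ; ! &
_DELIMITERS = re.compile(r"[?;!&]")

def clean_url(url: str) -> str:
    """Remove meaningless keywords that follow any of the delimiters."""
    match = _DELIMITERS.search(url)
    if match is None:
        return url  # no keywords at all
    base, rest = url[:match.start()], url[match.end():]
    kept = []
    for token in _DELIMITERS.split(rest):
        key = token.split("=", 1)[0].lower()
        # Assumption: tokens without "key=value" form carry no meaning.
        if "=" in token and key not in MEANINGLESS_KEYWORDS:
            kept.append(token)
    return base + "?" + "&".join(kept) if kept else base

cleaned = clean_url(
    "ec.europa.eu/research/participants/portal/CaptchaImage"
    "!efp7_SESSION_ID=r9VhLnvNxnqXvPvVzBthnH2phdbh9csGQRgjFT3h5rp0XFZrRBkx!-145"
)
```

With the hypothetical keyword set above, the example URL from the text is reduced to ec.europa.eu/research/participants/portal/CaptchaImage, while a keyword that is not in the set, such as lang=en, would be kept.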

Another type of URL has been cleaned up since 2nd September:

Example: in ec.europa.eu/eclas/F/HGTVKPIX3BTA347ITI2L13MI1HSTQJIMNIGQFBP1SUSBKI67I4-25438 the last part is always a hash key, and even when you visit the page again you get another URL or hash key. As from now only ec.europa.eu/eclas/F will be stored. More will follow.
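A minimal sketch of this second cleanup, assuming the hash key is always the single path segment following ec.europa.eu/eclas/F (the pattern and function name are illustrative, not taken from the ETL):

```python
import re

# Assumption: the hash key is the one path segment after /eclas/F/.
_ECLAS_HASH = re.compile(r"^(ec\.europa\.eu/eclas/F)/[^/]+$")

def strip_eclas_hash(url: str) -> str:
    """Drop the trailing hash key so that all visits map to one stored URL."""
    return _ECLAS_HASH.sub(r"\1", url)

stored = strip_eclas_hash(
    "ec.europa.eu/eclas/F/HGTVKPIX3BTA347ITI2L13MI1HSTQJIMNIGQFBP1SUSBKI67I4-25438"
)
```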

This has been applied to the ETL process as from 1st September 2011.

The Dashboard report will take into account these changes as from 18th October 2011. The URLs mentioned above are retroactively adapted.

domain starting with www. (Fixed since July 2011)
All domains entered by your visitors using www in front of the standard domain name (e.g. www.ec.europa.eu) were always considered as unclaimed. This is because the configuration was not designed for this kind of domain. As from August 1st 2011 all existing domains registered in the configuration file will in addition be extended with www. This change will load the www-domains into the respective webmarts, as defined in the configuration file. An analysis showed that www-domains were used by your visitors for about 1% of the URLs. Detailed figures for your webmart can be requested, as all the unclaimed data is kept for 90 days at daily level and for 12 months at monthly and yearly level.
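The extension of the configuration can be pictured with a short sketch. The mapping layout and the webmart name "commission" are hypothetical; the real configuration file format is not shown here.

```python
def extend_with_www(domain_to_webmart: dict) -> dict:
    """Add a www. variant for every registered domain, mapped to the same webmart."""
    extended = dict(domain_to_webmart)
    for domain, webmart in domain_to_webmart.items():
        if not domain.startswith("www."):
            # setdefault keeps an explicitly registered www. entry untouched
            extended.setdefault("www." + domain, webmart)
    return extended

# Hypothetical configuration: domain -> webmart name.
config = extend_with_www({"ec.europa.eu": "commission"})
```

After the extension, hits on www.ec.europa.eu are loaded into the same webmart as hits on ec.europa.eu instead of ending up in the unclaimed data.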

.jsp Pages (Fixed since 1 July 2011)
During the system update on 17.02.2011, an erroneous change was introduced. The script, which also handles the exclusion of .js extensions, was modified to prevent fetching of the rss (to make sure that the rss were excluded). Since then the .jsp extension has been interpreted as 'starting with js' (read JavaScript). As a result, all .jsp extensions were excluded from the counted traffic.

This bug has been corrected as from 1 July 2011. For those who use .jsp pages on their websites, the traffic will be higher since that date. Because of this bug, part of the data is missing for the period from 17.02.2011 to 30.06.2011.
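The kind of mistake described above can be reproduced in a few lines. This is a hypothetical reconstruction for illustration, not the actual script: a prefix test on the extension also matches .jsp, whereas an exact comparison does not. The function names and sample paths are made up.

```python
import posixpath

EXCLUDED_EXTENSIONS = (".js",)  # illustrative; the real exclusion list is longer

def excluded_buggy(path: str) -> bool:
    """Prefix match: '.jsp' starts with '.js', so .jsp pages are dropped too."""
    ext = posixpath.splitext(path)[1].lower()
    return any(ext.startswith(excluded) for excluded in EXCLUDED_EXTENSIONS)

def excluded_fixed(path: str) -> bool:
    """Exact match: only the listed extensions are dropped."""
    ext = posixpath.splitext(path)[1].lower()
    return ext in EXCLUDED_EXTENSIONS

jsp_dropped_by_bug = excluded_buggy("/portal/page.jsp")
jsp_dropped_after_fix = excluded_fixed("/portal/page.jsp")
```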

Parameter URL in Dashboard Reports (Fixed since March 2011)
Wildcards must always be used in the Dashboard reports, even when entering an absolute URL.

Languages Report (Fixed since 21 December 2010)

 * 1) GA is missing from the languages counted
 * 2) when the report is run for one language, an additional (blank) line also appears

Discrepancy between visits counted by different reports
There is a difference of 8% between the number of visits in the Yearly Dashboard and in the Pages report (for Campaign-Biodiversity ENVI). As from February 2011 the figures should differ less, as the input has been changed and is now the same as for the SAS Web Analytics statistics.

Missing Reports
A report on Organic Search is missing (equivalent to Top 100 Search Phrases in the SAS WH).

Unclaimed urls polluting webmarts europa HP, Commission HP and Statistics office (Fixed since 1st October 2011)
Because of the "splash" page, unclaimed urls were polluting the data - this has been fixed since 1st October 2011.

URL prompt not working in Spider File Hits report (reported: October 2011)
The URL prompt was added to the report's prompts although it has no effect: there is NO URL data available in the Spider File Hits database. Data available: spider, number of hits, status code. The data is only available at the overall Europa level. A review of this report is planned by the end of 2011 in order to deliver a more relevant report.

Files with .eot and .txt extensions (Fixed since February 2014)
The files with .eot and .txt extensions are excluded from the system as they are not considered as pages. This change might have a minor impact on figures.