Consulting

Turn data into insight to enable informed decisions and faster growth.

Training

Data science is unlocking growth. Upgrade your skills and level up your team.

Governance

Data privacy is a good thing. We can help you understand what is needed and how to implement it in your plan.

Do you want to see more?

Check our blog

From our blog

A mix of data science, insights, human behaviour, security and other interests.

Learning Phrases with PyLucene and PyTorch, part 2.

on March 30, 2026

In part 2 we reuse our tokenised index and use PyTorch to build a model for significant phrase extraction. It worked surprisingly well, and being able to switch Analyzers proved useful. We found that the English Analyzer with stopword removal and stemming worked best.

The results are indicative only: neither the dataset size nor the length of the training cycles is sufficient for developing a generalised phrase extractor, but the success and the overlap found between PyLucene and PyTorch are very encouraging. We just need to scale it up.

Continue reading

The last metre, where decisions are made.

on March 12, 2026

The critical part of all analytics and data pipeline development happens in the last metre. It is where the decision is made and the impact and benefits can start.

Take a moment to think about how you interact with data each week:

  • What are the skills and tools you are using?
  • How much do you need to interpret?
  • Do you know the limitations of the data sets?

One of my first questions is often “What decision are you trying to support?”. The reason for this is to understand the full context for the decision and the data that will be needed to support it. It is not uncommon for us to need to start recording something new or make changes to how telemetry is captured in fulfilling analytics requests.

Continue reading

Learning Phrases with PyLucene, part 1

on March 3, 2026

Lucene is a library for building indexes optimised for text search. It is an Apache project and provides the core index format sitting under Elasticsearch. The following code uses PyLucene, which is a JNI wrapper around the Java API.

The algorithm derives from the idea that search results will show increased frequencies for the search terms and their associated concepts. Phrases should similarly show increased frequencies. Using Lucene makes this fast, though we have to index first. The code also demonstrates an integration between Lucene and Pandas for analytics. The technique could be used to summarise, in aggregate, text entered by users or players in surveys, reviews and so on, which might otherwise be ignored by analytics.
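To make the frequency idea concrete, here is a minimal sketch in plain Python. It is not the PyLucene code from the post: it swaps the Lucene index for in-memory counting with `collections.Counter`, and the names `bigrams`, `phrase_lift` and `min_count` are hypothetical, chosen only for illustration. It scores each two-word phrase by how much more often it appears in documents matching a query than in the corpus as a whole.

```python
from collections import Counter


def bigrams(text):
    """Lowercase, split on whitespace, and return adjacent word pairs."""
    words = text.lower().split()
    return list(zip(words, words[1:]))


def phrase_lift(docs, query, min_count=2):
    """Rank bigrams by their frequency in documents matching `query`
    relative to their frequency across all documents.

    Assumption: a document "matches" if it contains the query as a
    whole word. A real search index would do this lookup far faster.
    """
    background = Counter()  # bigram counts over the whole corpus
    foreground = Counter()  # bigram counts over matching documents only
    for doc in docs:
        grams = bigrams(doc)
        background.update(grams)
        if query.lower() in doc.lower().split():
            foreground.update(grams)

    fg_total = sum(foreground.values()) or 1
    bg_total = sum(background.values()) or 1

    scores = {}
    for gram, count in foreground.items():
        if count < min_count:
            continue  # skip phrases too rare to be meaningful
        fg_rate = count / fg_total
        bg_rate = background[gram] / bg_total
        scores[" ".join(gram)] = fg_rate / bg_rate
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

A lift above 1 means the phrase is over-represented in the matching documents. The sketch ignores stopwords, stemming and longer phrases, all of which an Analyzer would normally handle.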

Continue reading

Analytical Problem Solving

on November 23, 2018

Data can tell you a lot, but only if you know how to use it effectively. After all, if you know how many people have signed up to your service, but not how many are actually using it, you can’t truly understand customer behaviour and prioritise your service improvements. Analytical problem solving can help you gather the right data and use it to solve real problems, provided you build the insights that reveal the full story of your customers and give you the answers you want from your data.

Continue reading