Similarity Demo

Quote Fraud Guide

Insurance quote fraud refers to the deceptive practice of providing false or misleading information during the process of obtaining an insurance quote. Individuals or organisations engaged in this fraudulent activity deliberately manipulate data such as their personal details, assets, or claims history to secure lower insurance premiums.

“Research Reveals Half of U.K. Consumers Think It’s Fine to Fib”


So how can Neo4j help?


Full example here: https://neo4j.com/developer/industry-use-cases/insurance/quote-fraud/

How Can Graph Databases Help?

  1. Real-Time Fraud Detection: Neo4j's real-time data handling helps insurance companies detect and prevent fraud by quickly identifying anomalies and suspicious patterns in quotes, policies, and claims.

  2. Graph Data Modeling: Neo4j helps insurance companies detect and prevent fraud more accurately by modelling data as a graph, which allows for identifying hidden relationships and patterns among entities like policyholders, claims, agents, and fraud indicators.

  3. Network Analysis: Neo4j's graph algorithms and traversal capabilities can help insurers identify fraudulent networks and patterns involving multiple policies, claimants, or agents.

Clean Up Environment.

// WARNING: deletes EVERYTHING in the database; run only in a scratch environment
MATCH (n) DETACH DELETE n;

Load Demo Data.

// Create quote nodes
CREATE (q1:Quote {firstname: "Micheal", surname: "Down", dob: date("1988-02-02"), postcode: "YO30 7DW", longitude: -1.0927426, latitude: 53.96372145, passport: 584699531, created_date: datetime()-duration({years: 1, months: 1, minutes: 9}), change_info: "first quote"})
CREATE (q2:Quote {firstname: "Michael", surname: "Down", dob: date("1988-02-02"), postcode: "YO30 7DW", longitude: -1.0927426, latitude: 53.96372145, passport: 584699531, created_date: datetime()-duration({years: 1, months: 1, minutes: 4}), change_info: "name change ea to ae"})
CREATE (q3:Quote {firstname: "Michael", surname: "Down", dob: date("1988-02-02"), postcode: "YO30 7DW", longitude: -1.0927426, latitude: 53.96372145, passport: 584699531, created_date: datetime()-duration({years: 1, months: 1, minutes: 3}), change_info: "postcode_change"})
CREATE (q4:Quote {firstname: "Michael", surname: "Down", dob: date("1988-02-02"), postcode: "PA62 6AA", longitude: -5.851487, latitude: 56.359258, passport: 584699530, created_date: datetime()-duration({years: 1, months: 1}), change_info: "passport number"})
CREATE (q5:Quote {firstname: "Michael", surname: "Down", dob: date("1988-02-02"), postcode: "PA62 6AA", longitude: -5.851487, latitude: 56.359258, passport: 584699530, created_date: datetime()-duration({months: 1}), change_info: "quote 1yr later"})
CREATE (q6:Quote {firstname: "Michael", surname: "Down", dob: date("1988-02-02"), postcode: "PA62 6AA", longitude: -5.851487, latitude: 56.359258, passport: 584699530, created_date: datetime(), change_info: "quote 1m later"})


// Create all relationships
CREATE (q1)-[:NEXT_QUOTE {diff_seconds: duration.inSeconds(q1.created_date, q2.created_date).seconds}]->(q2)
CREATE (q2)-[:NEXT_QUOTE {diff_seconds: duration.inSeconds(q2.created_date, q3.created_date).seconds}]->(q3)
CREATE (q3)-[:NEXT_QUOTE {diff_seconds: duration.inSeconds(q3.created_date, q4.created_date).seconds}]->(q4)
CREATE (q4)-[:NEXT_QUOTE {diff_seconds: duration.inSeconds(q4.created_date, q5.created_date).seconds}]->(q5)
CREATE (q5)-[:NEXT_QUOTE {diff_seconds: duration.inSeconds(q5.created_date, q6.created_date).seconds}]->(q6)

Data Model


Let's review the schema:

// Show database schema
CALL db.schema.visualization();

Quick look at the raw data.

First, let's have a quick look at the raw data that is stored inside Neo4j.

// View all quotes
MATCH path=()-[r:NEXT_QUOTE]->() 
RETURN path;

Split the quote chains based on time.

When analysing quotes, the time between them is a really important factor. Imagine the following scenario.

Buying Car Insurance:

Most people buy car insurance roughly once a year. A policy usually lasts 12 months, so comparing last year's quote with this year's would reveal some obvious differences:

  • No claims bonus - would (hopefully) be one year greater than the previous year.
  • Car age - the car would be one year older.
  • Mileage - we would expect this to be greater, by an amount that depends on factors such as the person's age, job, and address.

  This means that to look for discrepancies between quotes, we need to group them into small buckets of time. I think of these buckets as similar to web sessions.

    In the query below, we split the chain whenever the time difference between consecutive quotes is greater than 3600 seconds (one hour).

    // Split Quote Chain
    MATCH path=()-[rel:NEXT_QUOTE]->()
    WHERE rel.diff_seconds < 3600
    RETURN path;	
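As an aside, this sessioning idea can be illustrated outside of Cypher. The Python sketch below (illustrative only; not part of the demo) groups sorted quote timestamps into buckets, starting a new bucket whenever the gap between consecutive quotes reaches 3600 seconds:

```python
# Illustrative sketch: split a chain of quote timestamps into "sessions",
# mirroring the rel.diff_seconds < 3600 filter in the Cypher query above.

def split_sessions(timestamps, max_gap=3600):
    """Group sorted epoch-second timestamps; start a new session when
    the gap to the previous quote is max_gap seconds or more."""
    sessions = []
    current = []
    for ts in timestamps:
        if current and ts - current[-1] >= max_gap:
            sessions.append(current)
            current = []
        current.append(ts)
    if current:
        sessions.append(current)
    return sessions

quotes = [0, 300, 360, 7200, 7300]   # seconds; >1h gap between 360 and 7200
print(split_sessions(quotes))        # [[0, 300, 360], [7200, 7300]]
```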
    				

    Return Single Record

    The same outcome, but a more complicated query that returns each full chain as a single record.

    MATCH path=(firstQ)-[r:NEXT_QUOTE*..1000]->(lastQ)
    WHERE
        // Path termination condition (first)
        (NOT EXISTS{ (firstQ)<-[:NEXT_QUOTE]-() } OR EXISTS{ (firstQ)<-[x:NEXT_QUOTE]-() WHERE x.diff_seconds >= 3600 } )
        AND

        // Path termination condition (last)
        (NOT EXISTS{ (lastQ)-[:NEXT_QUOTE]->() } OR EXISTS{ (lastQ)-[x:NEXT_QUOTE]->() WHERE x.diff_seconds >= 3600 } )
        AND

        // No gaps condition (if you remove this condition, gaps are allowed and you get spurious longer chains that satisfy the end-of-path conditions but not the max-diff condition)
        ALL(x IN relationships(path) WHERE x.diff_seconds < 3600 )
        AND

        // Filter based on quotes in the last N days
        firstQ.created_date > datetime() - duration({days: 1000})
        AND

        // Where there is more than one quote in the chain, otherwise there is nothing to compare against
        length(path) > 1

    RETURN path

    Create Similarity Relationship

    Here we are using the same logic as in the previous query, but this time we are creating a relationship between all nodes that we deem "similar".

    // Create Similarity Relationship
    MATCH path=(firstQ)-[r:NEXT_QUOTE*..1000]->(lastQ)
    WHERE
    
        // Path termination condition (first)
        (NOT EXISTS{ (firstQ)<-[:NEXT_QUOTE]-() } OR EXISTS{ (firstQ)<-[x:NEXT_QUOTE]-() WHERE x.diff_seconds >= 3600 } )
        AND
    
        // Path termination condition (last)
        (NOT EXISTS{ (lastQ)-[:NEXT_QUOTE]->() } OR EXISTS{ (lastQ)-[x:NEXT_QUOTE]->() WHERE x.diff_seconds >= 3600 } )
        AND
    
        // No gaps condition (if you remove this condition, gaps are allowed and you get spurious longer chains that satisfy the end-of-path conditions but not the max-diff condition)
        ALL(x IN relationships(path) WHERE x.diff_seconds < 3600 )
        AND
    
        // Filter based on quote in the last N days
        firstQ.created_date > datetime() - duration({days: 1000})
        AND
    
        // Where there is more than one quote in the chain, otherwise there is nothing to compare against
        length(path)> 1
    
    WITH nodes(path) as nodes
    
    // To iterate over the list in chain order, create an index array [0, 1, 2, ..., length - 2]
    UNWIND range(0,size(nodes)-2) as index
    
    // For each position (index) in the list take the node at that position (current) and the rest
    WITH nodes[index] as current, nodes[index+1..size(nodes)] as rest
    
    // Iterate over the rest keeping current to get all pairs of nodes without repetitions
    UNWIND rest as subsequent
    
    WITH current, subsequent,
    
    // Build up similarity scores for all properties
    // Strings
    apoc.text.levenshteinSimilarity(current.firstname, subsequent.firstname) AS firstname,
    apoc.text.levenshteinSimilarity(current.surname, subsequent.surname) AS surname,
    apoc.text.levenshteinSimilarity(current.postcode, subsequent.postcode) AS postcode,
    
    // Numbers
    (current.passport - subsequent.passport) AS passport_number,
    apoc.text.levenshteinSimilarity(toString(current.passport), toString(subsequent.passport)) AS passport_similarity,
    
    // Dates
    duration.inDays(current.dob, subsequent.dob).days AS dob,
    
    // Location
    toInteger(point.distance(point({longitude: current.longitude, latitude: current.latitude}), point({longitude: subsequent.longitude, latitude: subsequent.latitude}))) AS location
    
    // Create :SIMILARITY Relationship
    MERGE (current)-[:SIMILARITY {
        // Add change string for simplicity
        change: subsequent.change_info,
    
        // Strings
        firstname: firstname,
        surname: surname,
        postcode: postcode,
    
        // Numbers
        passport_number: passport_number,
        passport_similarity: passport_similarity,
    
        // Dates
        dob: dob,
    
        // Location
        location: location,
    
        // Calculated Similarity Score
        similarity_score: (firstname + surname + postcode + passport_similarity ) / 4
    }]->(subsequent)
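To make these scores concrete, here is a plain-Python sketch of the same arithmetic (assuming APOC's usual definition of levenshteinSimilarity: 1 minus the edit distance divided by the length of the longer string), applied to the first two demo quotes:

```python
# Sketch of apoc.text.levenshteinSimilarity and the composite score,
# assuming the definition: 1 - editDistance / length of the longer string.

def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]

def levenshtein_similarity(a: str, b: str) -> float:
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))

# "Micheal" vs "Michael": two substitutions over seven characters
firstname = levenshtein_similarity("Micheal", "Michael")    # ~0.714
surname = levenshtein_similarity("Down", "Down")            # 1.0
postcode = levenshtein_similarity("YO30 7DW", "YO30 7DW")   # 1.0
passport = levenshtein_similarity("584699531", "584699531") # 1.0

# The composite score stored on the :SIMILARITY relationship
similarity_score = (firstname + surname + postcode + passport) / 4
print(round(similarity_score, 3))  # 0.929
```

So even with the "ea"/"ae" typo, the first two quotes score roughly 93% similar.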

    Query ALL Quotes with a Similarity relationship

    Let's bring back all the quotes with their SIMILARITY relationships.

    // Query all quotes with their similarity relationship
    MATCH path=()-[:SIMILARITY]->()
    RETURN path;

    Calculate static Fraud Score

    Based on all the Quote nodes and SIMILARITY relationships in our graph, we can work out a similarity score that indicates how similar quotes are to each other.

    The Fraud Level is based on the logic that:

  • LOW: greater than 70% similar
  • MEDIUM: between 50% and 70% similar
  • HIGH: less than 50% similar

    // Calculate static Fraud Score
    MATCH path=(a)-[r:SIMILARITY]->(b)
    WHERE a.created_date > datetime() - duration({days: 1000})
    RETURN avg(r.similarity_score) AS Similarity,
    CASE
        WHEN COUNT(r) = 0 THEN 'Additional Quote Needs Adding'
        WHEN toInteger(avg(r.similarity_score) * 100) > 70 THEN 'LOW'
        WHEN toInteger(avg(r.similarity_score) * 100) >= 50 THEN 'MEDIUM'
        ELSE 'HIGH'
    END AS Fraud_Level
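As a sanity check on the thresholds, here is the same mapping as a small Python function (interpreting the boundaries so that a score of exactly 50-70% counts as MEDIUM, which the prose above leaves ambiguous):

```python
# Sketch of the fraud-level CASE expression: map an average similarity
# score (0.0-1.0) across a quote chain to a fraud level.

def fraud_level(avg_similarity: float) -> str:
    pct = int(avg_similarity * 100)
    if pct > 70:
        return "LOW"      # quotes are highly similar
    if pct >= 50:
        return "MEDIUM"
    return "HIGH"         # quotes differ substantially

print(fraud_level(0.93))  # LOW
print(fraud_level(0.60))  # MEDIUM
print(fraud_level(0.40))  # HIGH
```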

    Let's put it all together!

    In this example, I will create a new Quote node, score it against the whole quote journey, and return a fraud score. The steps in this process are:

  • Create new Quote node
  • Only create SIMILARITY relationships between the new node and all others in its community. We DO NOT re-score all nodes, just the new one.
  • Recalculate Fraud Score and pass it back to the client.
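The three steps can also be sketched end to end in Python (a rough illustration with made-up helper names; exact field matching stands in for the per-field Levenshtein scores used in the Cypher):

```python
# Rough end-to-end sketch of the realtime scoring flow: compare a new
# quote against the quotes already in its session, average the scores,
# and map the average to a fraud level. Field names mirror the demo's
# Quote properties; exact matching is a crude stand-in for Levenshtein.

def similarity(a, b):
    """Fraction of fields that match exactly between two quotes."""
    fields = ["firstname", "surname", "postcode", "passport"]
    return sum(a[f] == b[f] for f in fields) / len(fields)

def score_new_quote(session, new_quote):
    scores = [similarity(new_quote, q) for q in session]
    avg = sum(scores) / len(scores)
    if avg * 100 > 70:
        return avg, "LOW"
    if avg * 100 >= 50:
        return avg, "MEDIUM"
    return avg, "HIGH"

session = [
    {"firstname": "Micheal", "surname": "Down", "postcode": "YO30 7DW", "passport": 584699531},
    {"firstname": "Michael", "surname": "Down", "postcode": "YO30 7DW", "passport": 584699531},
]
new = {"firstname": "Michael", "surname": "Down", "postcode": "PA62 6AA", "passport": 584699530}
avg, level = score_new_quote(session, new)
print(avg, level)  # 0.375 HIGH
```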

    // Realtime Quote Score
    
    // Get last `Quote` node in quote chain
    MATCH (last:Quote)
    WITH last
    ORDER BY last.created_date DESC
    LIMIT 1
    WITH last
    // Create new quote node
    MERGE (current:Quote {
        change_info: "changed dob",
        created_date: datetime(),
        dob: date("1978-11-30"),
        firstname: "Michael",
        surname: "Down",
        latitude: 56.359258,
        longitude: -5.851487,
        passport: 584699530,
        postcode: "PA62 6AA"
    })
    WITH last, current, duration.inSeconds(datetime(last.created_date), datetime(current.created_date)) AS time
    // Create relationship
    CREATE (last)-[:NEXT_QUOTE {diff_seconds: time.seconds}]->(current)
    
    WITH current
    
    // Find the session chain that ends at the new quote
    MATCH path=(firstQ)-[r:NEXT_QUOTE*0..100]->(current)
    WHERE
    
        // Path termination condition (first)
        (NOT EXISTS{ (firstQ)<-[:NEXT_QUOTE]-() } OR EXISTS{ (firstQ)<-[x:NEXT_QUOTE]-() WHERE x.diff_seconds >= 3600 } )
        AND
    
        // Path termination condition (last)
        (NOT EXISTS{ (current)-[:NEXT_QUOTE]->() } OR EXISTS{ (current)-[x:NEXT_QUOTE]->() WHERE x.diff_seconds >= 3600 } )
        AND
    
        // No gaps condition (if you remove this condition, gaps are allowed and you get spurious longer chains that satisfy the end-of-path conditions but not the max-diff condition)
        ALL(x IN relationships(path) WHERE x.diff_seconds < 3600 )
        AND
    
        // Filter based on quote in the last N days
        firstQ.created_date > datetime() - duration({days: 1000})
        AND
    
        // Where there is more than one quote in the chain, otherwise there is nothing to compare against
        length(path)> 1
    
    // Keep every node in the chain except the new quote itself
    UNWIND nodes(path)[0..-1] AS subsequent
    
    WITH current, subsequent,
    
    // Build up similarity scores for all properties
    // Strings
    apoc.text.levenshteinSimilarity(current.firstname, subsequent.firstname) AS firstname,
    apoc.text.levenshteinSimilarity(current.surname, subsequent.surname) AS surname,
    apoc.text.levenshteinSimilarity(current.postcode, subsequent.postcode) AS postcode,
    
    // Numbers
    (current.passport - subsequent.passport) AS passport_number,
    apoc.text.levenshteinSimilarity(toString(current.passport), toString(subsequent.passport)) AS passport_similarity,
    
    // Dates
    duration.inDays(current.dob, subsequent.dob).days AS dob,
    
    // Location
    toInteger(point.distance(point({longitude: current.longitude, latitude: current.latitude}), point({longitude: subsequent.longitude, latitude: subsequent.latitude}))) AS location
    
    // Create :SIMILARITY Relationship
    CREATE (current)-[:SIMILARITY {
        // Add change string for simplicity
        change: subsequent.change_info,
    
        // Strings
        firstname: firstname,
        surname: surname,
        postcode: postcode,
    
        // Numbers
        passport_number: passport_number,
        passport_similarity: passport_similarity,
    
        // Dates
        dob: dob,
    
        // Location
        location: location,
    
        // Calculated Similarity Score
        similarity_score: (firstname + surname + postcode + passport_similarity ) / 4
    }]->(subsequent)
    
    WITH *
    
    // Quote - 3 - Calculate Fraud Score
    MATCH p=(a)-[r:SIMILARITY]->(b)
    WHERE a.created_date > datetime() - duration({days: 1000})
    RETURN avg(r.similarity_score) AS Similarity,
    CASE
        WHEN COUNT(r) = 0 THEN 'Run Again'
        WHEN toInteger(avg(r.similarity_score) * 100) > 70 THEN 'LOW'
        WHEN toInteger(avg(r.similarity_score) * 100) >= 50 THEN 'MEDIUM'
        ELSE 'HIGH'
    END AS Fraud_Level;