Thod Nguyen, CTO of eHarmony, delivered a fascinating insight into how the world’s largest relationship service provider improved customer experience by processing matches 95% faster and increased subscriptions by 50% after migrating from relational database technology to MongoDB.
eHarmony currently operates in North America, Australia and the UK. The company has a great track record of success – since launch in 2000, 1.2 million couples have married after being introduced by the service. Today eHarmony has 55m registered users, a number that will increase dramatically as the service is rolled out to 20 other countries around the globe in the coming months.
In addition to rolling out to 20 new countries, eHarmony also plans to bring its data science expertise in relationship matching to the jobs market, matching new hires to potential employers.
eHarmony employs some serious data science chops to match prospective partners. Users complete a detailed questionnaire when they sign up for the service. Sophisticated compatibility models are then executed to create a personality profile, based on the user’s responses. Additional research based around machine learning and predictive analytics is added to the algorithms to enhance the matching of prospective partners.
Unlike searching for a specific item or term on Google, the matching process used to identify prospective partners is bi-directional, with multiple attributes such as age, location, education, preferences, income, etc. cross-referenced and scored between each potential partner.
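To make the bi-directional idea concrete, here is a minimal Python sketch. It is not eHarmony's algorithm: the `Profile` fields, the equal weighting of checks, and the choice of taking the lower of the two directional scores are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Profile:
    # Hypothetical attributes; a real profile would carry many more.
    name: str
    age: int
    city: str
    # What this user is looking for in a partner.
    min_age: int
    max_age: int
    wanted_city: str


def directional_score(seeker: Profile, candidate: Profile) -> float:
    """Fraction of the seeker's preferences that the candidate satisfies."""
    checks = [
        seeker.min_age <= candidate.age <= seeker.max_age,
        seeker.wanted_city == candidate.city,
    ]
    return sum(checks) / len(checks)


def mutual_score(a: Profile, b: Profile) -> float:
    """Score both directions and keep the weaker one (an assumed rule)."""
    return min(directional_score(a, b), directional_score(b, a))


alice = Profile("alice", 30, "Austin", min_age=28, max_age=35, wanted_city="Austin")
bob = Profile("bob", 33, "Austin", min_age=25, max_age=32, wanted_city="Austin")
dave = Profile("dave", 40, "Austin", min_age=25, max_age=35, wanted_city="Austin")

print(mutual_score(alice, bob))        # both directions satisfied
print(directional_score(dave, alice))  # dave's preferences accept alice...
print(mutual_score(alice, dave))       # ...but alice's age range excludes dave
```

A one-directional search would stop at `directional_score(dave, alice)`; the mutual score is what penalizes the pair when only one side's preferences are met.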
In eHarmony’s initial architecture, a single monolithic database stored all user data and matches, but this didn’t scale as the service grew. eHarmony split out the matches into a distributed Postgres database, which bought them some headroom, but as the number of potential matches grew to 3 billion per day, generating 25TB of data, they could only scale so far. Running a complete matching analysis of the user base was taking 2 weeks.
In addition to the problems of scale, as the data models became richer and more complex, adjusting the schema required a full database dump and reload, causing operational complexity and downtime, as well as inhibiting how quickly the business could evolve.
eHarmony needed a new approach that could meet three requirements:
- Support for the complex, multi-attribute queries that provide the foundation of the compatibility matching system
- A flexible data model to seamlessly handle new attributes
- The ability to scale on commodity hardware without adding operational overhead for a team already managing over 1,000 servers
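The first two requirements can be illustrated with a MongoDB query filter built in Python. This is only a sketch: the document layout (`age`, `city`, and a nested `prefs` sub-document) is a hypothetical schema, not eHarmony's. The operators `$gte`, `$lte`, `$in` and `$ne`, and dot notation for nested fields, are standard MongoDB query syntax.

```python
from typing import Any, Dict


def build_bidirectional_filter(user: Dict[str, Any]) -> Dict[str, Any]:
    """Build a filter that matches candidates in both directions.

    Field names are assumptions for illustration only.
    """
    prefs = user["prefs"]
    return {
        # Never match a user with themselves.
        "_id": {"$ne": user["_id"]},
        # Direction 1: the candidate satisfies this user's preferences.
        "age": {"$gte": prefs["min_age"], "$lte": prefs["max_age"]},
        "city": {"$in": prefs["cities"]},
        # Direction 2: this user satisfies the candidate's preferences.
        "prefs.min_age": {"$lte": user["age"]},
        "prefs.max_age": {"$gte": user["age"]},
        # Equality against an array field matches if the array contains the value.
        "prefs.cities": user["city"],
    }


alice = {
    "_id": "alice",
    "age": 30,
    "city": "Austin",
    "prefs": {"min_age": 28, "max_age": 35, "cities": ["Austin", "Dallas"]},
}

match_filter = build_bidirectional_filter(alice)
print(match_filter)
```

With PyMongo, a filter like this could be passed to `find()` on a collection, for example `db.profiles.find(match_filter)`, where `profiles` is a hypothetical collection name. Because documents need not share a fixed schema, a newly introduced attribute becomes one more key in the stored document and one more clause in the filter, with no dump and reload.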
eHarmony explored Apache Solr as a possible solution, but it was eliminated because the matching system requires bi-directional searches rather than just conventional unidirectional searches. Apache Cassandra was also considered, but the API was too difficult to match to the data model, and there were imbalances between read and write performance.
After extensive evaluation, eHarmony selected MongoDB. As well as meeting the three requirements above, eHarmony also gained a lot of value from the MongoDB community and from the enterprise support that is part of MongoDB Enterprise Advanced.
- Engage MongoDB engineers early. They can provide best practices in data modeling, sharding and deployment productization
- When testing, use production data and queries. Randomly kill nodes so you understand behavior in multiple failure conditions
- Run in shadow mode alongside the existing relational database to characterize performance at scale
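One way to picture the shadow-mode recommendation is a read wrapper that serves every request from the existing database while also sending it to the new one and recording how the two compare. The sketch below is an assumption about what such a harness might look like, not a description of eHarmony's setup; `shadow_read` and the log fields are hypothetical names, and plain dictionaries stand in for the two databases.

```python
import time
from typing import Any, Callable, Dict, List


def shadow_read(
    primary: Callable[[str], Any],
    shadow: Callable[[str], Any],
    key: str,
    log: List[Dict[str, Any]],
) -> Any:
    """Serve the read from `primary`; run `shadow` only for comparison."""
    start = time.perf_counter()
    primary_result = primary(key)
    entry: Dict[str, Any] = {"key": key, "primary_s": time.perf_counter() - start}

    try:
        start = time.perf_counter()
        shadow_result = shadow(key)
        entry["shadow_s"] = time.perf_counter() - start
        entry["agree"] = shadow_result == primary_result
    except Exception as exc:
        # A failure in the shadow system must never affect the caller.
        entry["shadow_error"] = repr(exc)
        entry["agree"] = False

    log.append(entry)
    return primary_result


# Stand-ins for the existing relational store and the new store.
legacy_store = {"u1": ["u2", "u3"]}
new_store = {"u1": ["u2"]}

comparison_log: List[Dict[str, Any]] = []
result = shadow_read(legacy_store.get, new_store.get, "u1", comparison_log)
print(result)
print(comparison_log[0]["agree"])
```

Users only ever see the primary result, so disagreements and latency differences can be studied under production load without any customer-facing risk.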
Of course, MongoDB isn’t the only part of eHarmony’s data management infrastructure. The data science team integrates MongoDB with Hadoop, as well as Apache Spark and R for predictive analytics.
- 95% faster compatibility matching. Matching the entire user base has been reduced from 2 weeks to 12 hours.
- 30% higher communication between prospective partners.
- 50% increase in paying subscribers.
- 60% increase in unique web site visits.
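As a quick sanity check on the first figure, the arithmetic behind "2 weeks to 12 hours" can be worked out directly. The only assumption is that "2 weeks" means 14 full days of elapsed time.

```python
HOURS_PER_DAY = 24

before_hours = 14 * HOURS_PER_DAY  # 2 weeks, assumed to be 14 full days
after_hours = 12

reduction = 1 - after_hours / before_hours  # share of elapsed time removed
speedup = before_hours / after_hours        # how many times faster

print(f"{reduction:.1%} less elapsed time, {speedup:.0f}x speedup")
```

Under that assumption the run time falls by a little over 96%, which is in line with the roughly 95% quoted above.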
And the story doesn’t end there. eHarmony will start to add geo-location services as part of the mobile experience, taking advantage of MongoDB’s support for geospatial indexes and queries. eHarmony is also excited by the prospect of pluggable storage engines delivered in MongoDB 3.0. The ability to mix multiple storage engines within a MongoDB cluster can provide a foundation to consolidate search, matches and user data. Whether you’re looking for a new partner, or a new job, it seems eHarmony has the data science and database to get you there.