Home > DemandTools Modules > DemandTools Cleansing Modules > Single Table Dedupe > SingleTable DeDupe Wizard

SingleTable DeDupe Wizard

Capabilities & Features:

  • Flexible identification of potential duplicates
  • Easy to use interface for standard and custom objects
  • Custom Merge capabilities (allows the user to select fields from all records in the duplicate group to retain in the final merged record)
  • Pre-Built Scenarios aid in quicker setup (are customizable and should be reviewed by each user)
  • Save custom scenarios for quick recall and use
  • Sub-Object reassignment
  • Custom Master Rule creation

DeDuplication Best Practices & Guidelines:

Please test your deduplication scenarios thoroughly before deduping a large number of records, as there is no automatic "undo"/"rollback" option.  If you have a Salesforce sandbox, we recommended that you test in this environment first before deduping your production data.

More information on pointing DemandTools to the Salesforce Sandbox can be found
here.

1. Dedupe in the following order:

  • Account; Contact; Lead; Lead to Contact; Lead to Account; Opportunity; Custom Objects

2. Use at least a 3 pass strategy for identifying duplicates

  • Start with very rigid criteria
    • Rigid Criteria means matching on at least 4 or 5 fields and using strict matching type
    • Loosen criteria by matching on less fields and choosing less strict mapping type
  • Loosen criteria with each pass
    • This strategy will help clean up the duplicates in a quicker more manageable manner

3.  It is recommended that name, address, phone fields are standardized prior to searching for duplicates.

  • This is not a requirement as DemandTools has lots of advanced mapping techniques to find matches when these fields are not standardized.  However, the cleaner and more complete the data, the easier it will be to find matches.
    • MassImpact can be used to standardize most of these fields.  Please refer to the MassImpact documents for details on how to make basic changes (i.e. find records where state - Texas and change to TX), and how to use prebuilt formulas for more complex standardizations (i.e. "Normalized_US_Address", "NaPhoneFix" etc.).
    • The Address Verification module is also extremely helpful in standardizing as well as completing address data.  If Address Verification will be used, there is no need to use MassImpact to standardize address fields first, as this will be done as part of the verification process.

Screen One:

Step 1. Select the Object to De-Duplicate

The object for de-duplication can be selected by using one of the pre-built scenarios or by manual selection.

Pre-Built Scenario:

  • Pre-built scenarios take care of some of the setup work and can be helpful for beginners and power users
  • All pre-built scenarios can be modified and it is recommended that each user review (and modify the scenario to fit their data needs) the scenario they choose prior to running


  • Scenarios are listed in the left hand column
  • When a scenario is highlighted the center portion of the screen displays an overview of the scenario 
  • The right side of the screen shows the fields selected to view on the found duplicates/duplicate grid

 

  • Load Scenario: Loads the all settings of the selected scenario. The user must manually click through to the next screens.
  • Run Scenario: Loads scenario settings, automatically processes each step and applies the master rule to the found duplicates. The user must execute the merge.
  • Delete Scenario: Removes the scenario from the list and deletes the source xml file.
  • Save Scenario: Used for newly created or modified pre-built scenarios. When saving a modified pre-built scenario click the "Rebuild Description" button (this ensures the modification are saved in a new scenario). Scenarios will be saved in the DemandTools directory.
  • Clear Scenario: Clears out all selections and allows the user to start over either by selecting a different scenario or by creating one manually.

Manual  Object Selection:


  • Select the object to de-duplicate in the "Salesforce Object To DeDupe" drop down box and click "Use Object"
  • Proceed to manually select all fields to view on the duplicate grid

Step 2: Select the fields to view on the dupe grid

  • In pre-built scenarios some standard fields have been selected; phone, name, owner id, addres information, etc.
  • Be sure to select fields that will help determine if the duplicates returned should be merged; DUNS number, account number, or any internal unique ID's (may or may not be fields chosen to match on)
  • Other fields to include are those that will help in the selection of the master record
  • To select a field, click the checkbox next to the field, to de-select a field un-check the fields checkbox
  • The "Clear" button will remove all selected fields


Step 3: Select the records to search

  • Sub-setting records in the SingleTableDedupe can be done for various reasons.
  • Ex: Leads need to be sub-setted to only look at unconverted Leads (isconverted = False)
  • Sub-setting can also be done to ensure that records of different record types are not merged. 

More help on creating conditions can be found here: Using Condition


 

Screen Two:

Step 1: Editing the Match Criteria

  • When running a pre-built scenario the match criteria is pre-populated on Screen 2 but can be modified to fit each individual users needs
  • To change a field click on the box of the field name and choose the new field, the mapping type should be reviewed if the field is changed.
  • To change the mapping type click in the box of the mapping type and choose a suitable option
  • If a row/condition needs to be added, click the "Add Mapping Condition" button (on bottom left of the screen)
  • If a single row/condition needs to be removed, click the box to the left of the criteria and click the "Delete" key on the keyboard
  • To clear out all mapping click the "Clear Mapping" button (on bottom left of the screen)



 

 

Step 2: Understanding Mapping Types and Mapping Conditions:

  • Complete Mapping Type definitions can be found by clicking on this link: Mapping Types
  • When a field is used in the match criteria the only records that will be returned in the dupe grid (screen 3) are those that have a value populated in all of the fields listed as a field to match
    • Ex: "IBM Co 1236 Main St Cleveland 44115 4406678956" will match to "IBM 1236 Main cleveland 44115-2268 440-667-8956" 
  • "Match Blank Values" checkbox will allow for the statement above to be true, it will also allow for blank values to be matched to other blank values for the fields selected
    • Ex: Match Blank Values has been selected for phone: "ibm co 1236 main cleveland 44115 (blank phone number)" will match to "Ibm co 1236 Main St. Cleveland 44115 (blank phone number)"
  • In order to have a populated field "match" to an unpopulated field, the field can be viewed on the dupe grid but should not be used as a field to match in the match criteria setup in Screen 2. This is not actually a match on the omitted field as the field is just being viewed.
  • For detailed definitions of the Mapping Types, click here

Select the "Screen 3 - Merge Control" Button to search for potential duplicates and move to screen 3 to analyze and eventually process.



Screen Three:
 
Step 1: Grid Options

  • Grid Field Arrangement:
    • By default, populated fields are shown at the top of the interface
    • The grid is sorted by the Duplication Key. It cannot be sorted in any other way.
    • Fields on the grid can be rearranged by dragging and dropping them in the desired order 

 

    • Remove a field from the grid by dragging and dropping it out of view 

 

 

  • View Record Details and Sub-Objects:
    • To explore a returned record in more detail click the icon to the right of the pin. The default view is shown below in HTML format (a black & white PDF type view). To change this option and view the record in Salesforce go to "Edit-Options and select the checkbox for "Use Salesforce to view objects"

 



  • Duplicate Set Controls:
    • Expand All:  will expand all duplicate groups
    • Collapse All: will collapse all duplicate groups
    • Check All: will place a check mark in all checkboxes of the records in the duplicate grid
    • Uncheck All: will remove the check marks from all checkboxes of the records in the duplicate grid
    • Clear All: will clear all checks and master record selections from all records in the duplicate grid
  • Current Master Rule:
    • Apply Rule to All: will apply the selected Master Rule to all records in the duplicate grip and place a check in their checkboxes
    • Apply Rule to Checked: will apply the selected Master Rule only to the records with a check mark in their checkbox
  • Grid Setup:
    • Show More Fields: will allow for additional fields of data to be added to the grid for all records in the grid
    • Export Grid Data: will allow for the grid data to be exported to a spreadsheet
  • Grid Display:
    • Drop Down: Show All; Show Checked; Show Unchecked
      • Show All: shows all returned potential duplicates
      • Show Checked: shows only those records that have a check in their check box
      • Show Unchecked: shows only those records that do not have a check in their check box
    • Show Grid Horizontal Scroll:
      • Will allow the user to enlarge the fields on the grid and scroll left to right allowing for more screen area

 

  • Edit Field Values: 
    • Right clicking on any field in the duplicate grid will give the user an option to:
      • "Update Value in Salesforce" (will only update that field and only for the record that was right clicked on with a value specified by the user) and
        • The single field update option can also be accessed by double clicking on the value to change
      • "Update Groups Rows in Salesforce" (will update that field for all records in the duplicate group with a value specified by the user) 
      • Either option will make the change in real time and does not require a merge to make the change to the field

                                                                                    

Step 2: Selecting a Master Record

  • Each duplicate set needs a master record designation in order to be merged (as well as a check mark in their check box)
  • There are 3 ways to select a master record
    • Double clicking the pin (of record in duplicate set)
    • Setting up a Custom Merge
    • Application of a Master Rule (see step 3)
  • Double Clicking the Pin:
    • Double click the pin beside the duplicate record to activate it as the master record, turning the pin green.
      • Gray pins    indicate no record has been selected as master
      • Green pins  indicate that this record has been selected as the master record
      • Red pins      indicate the records that have been selected or determined to be servant records
  • Setting up a Custom Merge: 
    • Right click on the duplicate key and select "Setup Custom Mergeā€

    • Anywhere from 2 - 100 records may be placed in the custom merge window
    • Select the Master Record and double click the field values to retain from the displayed record sets
      • The values that will be retained for the final merged record will be highlighted in yellow
    • When the field selections are complete, click the "Accept" button
    • The user will be returned to the duplicate grid
    • The pins of these records will now be Blue  to indicate that they are a part of a custom merge

  

  • Other Options from the Duplicate Key Right Click Menu:
    • Apply Action to Group:
      • "Check Group": will place check marks in the checkboxes for all records in a single duplicate group
      • "Uncheck Group": will remove marks from the checkboxes for all records in a single duplicate group
      • "Merge Group": will merge the records with a check in their checkbox in a single duplicate group (if three records are in a group and only two are checked, only the checked records will be merged)
    • Apply Rules to Group:
      • will apply the selected rule in the menu to the duplicate group

NOTE:   All merges, whether manual or automatic, will be processed based upon the defined Merge Options listed along the right hand side!! (See Step 4)

Step 3: Choose or Design a Master Rule

  • Select a Pre-Built Master Rule
    • When using strict/rigid deduplication criteria the decision may be made (after a quick review of the records) to process all the records in the grid vie
    • Master rules do not need to be applied to each duplicate set individually (but can be as mentioned above using the duplicate key right click menu
    • Select the "Check All"  button at the bottom of the grid and all duplicate sets will be checke
    • To view the master-slave designations from the selected rule, click the "Apply Rule to All" button (the "Apply Rule to All" button can also be used, this will check the boxes as well as apply the selected master rule to all records
    • When the user is satisfied with the master record selection, they should review the options in the right hand pane (see step 4)
    • Alternatively, masters records can be selected based on the currently selected master rule in the upper right hand of the interface when the end user selects "Merge Checked Objects" and the merge process will be completed (this option will not allow for master-slave designation to be reviewed before merging, the master selection and merging happen in one step)


Currently selected master rule

  • Creation and Editing of Master Rules
    • "New Rule" button: opens a pop up box for the user to create a custom master rule
    • "Edit Rule" button: used to edit an existing rule or to just view the setup of an existing rule (will open the rule with the check mark)
    • "Delete Rule" button: Deletes the rule from the application
    • A master rule can be created for any object
    • When building a master rule there is the option to add unlimited conditions for analyzing either the record (as a whole) or the field values that the record contains.
    • At the record level, options such as oldest, newest, most recently modified (note that RECORD is the field selection for these options to appear) and many other advanced constraints are available. For field values, the Single Table Dedupe displays constraints appropriate for the data type of the field being analyzed.
 
  • Scoring a rule
    • After the setup of the initial field and constraint the user can score the positive incident of the constraint with an arbitrary point value.
    •  The default rules that ship with the software are based upon a 100 point scoring system but any system can be used.
    •  When adding multiple constraints the record is scored in cumulative fashion with each additional positive expression adding to the total score of the records.  
    • Additionally, negative scores can be applied to penalize a record.


A complex rule with multiple expressions and a score delta set

  • Using the Score Delta
    • The final component to creating a Master Rule is the decision to use or not to use a score delta
    • The score delta will be used by the deduper to help determine master vs. servant records
    • The value of the score delta determines how many points away a master record has to score from the next nearest servant record in order to be selected as the master

If the score delta threshold is set and not met: (records tie and have the same point value)

  • The deduper will use the rule as a scoring system only
  • It will not be able to select a master
  • The scores will be displayed, but the pins will remain grey and the boxes, unchecked


  • If no score delta is set:
    • The deduper will score the records
    • The record with the highest score will be chosen as the master
    • If there is a tie, it will pick the first record in the list

 

Step 4: -Select the merge control options

  • Update Fields where Master is Empty
    • For any field that is selected in the "Update Fields Where Master Empty" option box, the deduper will keep the value in that field from a subordinate record if that field is not populated with a value on the master record.
    •  will condense the options box below, when clicked again this icon will expand the options box below

    • "Check all" - will select all fields
    • "Clear all" - will de-select all fields
    • Manually select fields - the user can check and uncheck the fields they wish to use
  • Combine Field Values


    • Used to select the fields to concatenate in the master and subordinate records
    • Available for:
      • Multi-select pick lists  - will keep selected values from master and subordinate records on the final merged record
      • Long text area fields - will keep the values from all records in the merged group
      • Boolean Fields (like Check Boxes) - true value always wins (if one exists) regardless of the master record value
  • Choose Long Text Area Field separator:
    • Default is a Carriage Return/Line Feed; the user may select a specific character like ; by selecting the radio button next to "Custom"

  • Salesforce Merge CheckBox (use is recommended):
    • If the Salesforce merge is used, all objects will be merged to the master and the slave records will be moved to the recycle bin.

 

  • Reassign to Master Owner




    • Allows for sub-objects to be reassigned to the ownership of the master record owner during the merger (for both subordinate record sub objects and master record sub objects)
  • Additional log fields

    • Logging of additional fields of information in the DemandTools log file

Step 4: Processing records in batch mode

Once the records to merge are checked, and the master records determined (either by master rule or manual selection) select "Merge Checked Objects" to begin the merge process.
THERE IS NO RESTORE OR UNDO FOR MERGES! ONCE YOU MERGE THE RECORDS, THEY ARE MERGED!




Step 5: Saving A Custom Scenario:

  • Even if a user begins the de-dupe process using a pre-built scenario it is very likely that some changes/modifications will have been made to fit the users data needs.
  • Modified pre-built scenarios can be saved for future use with the changes made by the user
  • On screen 3, once all settings have been selected, click the "Save Scenario" button on the lower left hand side
    • The pop up screen will still show the pre-built settings
  • To save the modifications the user must click the "Re-Build Description" button
    • This forces the pop up to refresh and reflect the modifications
  • The user can rename the scenario (this is reccomended so there are not two scenarios with the same name; an effective naming convention is to use the company name or abbreviation as a prefix to the scenario name)

See also