A unique method of depersonalizing personal data.
According to the definition given in Federal Law No. 152 «On Personal Data» dated 26.07.2006, depersonalization is a method of processing PD as a result of which it becomes impossible to determine which individual the processed PD belongs to. But there is another important requirement not mentioned in the law: such processing must be reversible, otherwise it is simply a loss of information.
Why depersonalize PD at all? To save money on protecting it: according to the classification (joint Order of the FSTEC of Russia, the FSB of Russia and the Ministry of Information Technologies and Communications of Russia dated 13.02.2008, No. 55/86/20), depersonalized PD belongs to class 4, which does not require confidentiality protection.
So let's figure out what it means to identify. To identify an object is to establish an unambiguous correspondence between the available information about the object and the object itself. This is theoretically possible if:
1. All objects are unique within the available information (all people are different, so the problem has at most one solution);
2. Every set of available attributes is possessed by at least one person (all information is authentic, so the problem has at least one solution).
What does it mean that identification is or is not possible? Unfortunately, we cannot do without a quantitative assessment of probability, and this is a strictly normative question that has not yet been resolved. Therefore, for the sake of understanding, we will assume that identification is possible if a given set of PD corresponds to a small number of people who can easily be localized for further clarification. Conversely, if these people cannot be localized, it is impossible to identify a person by this PD. Clearly, much depends on who is doing the localizing. We will therefore treat depersonalization as a way to protect PD from an intruder, not as a way to hide information from official bodies; that is, only publicly available sources and means may be used to increase the probability of identification.
Suppose it could not be proved that a given set of PD belongs (or previously belonged) to exactly one person. What other options are there? Two: the set can correspond either to more than one person, or to fewer than one, i.e. to no one.
The first case covers any insufficient set of PD (PD that can belong to many people at once, such as a name or a date of birth) and any excessive set of PD (for example, two different names deliberately indicated together). Here it matters greatly how many potential subjects there are and how that group of people is delimited: for example, it is easier to find a person by name if it is known that they are an employee of a particular enterprise. Do not forget that the properties of the set of PD itself are also information!
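To make the notion of an «insufficient» set concrete, the size of the candidate group can be counted directly. Below is a minimal Python sketch (the dataset and field names are invented purely for illustration) that counts how many records match the attributes an intruder knows:

```python
# Hypothetical mini-dataset: each record is (name, birth_year, city).
records = [
    ("Ivanov", 1980, "Chelyabinsk"),
    ("Ivanov", 1975, "Moscow"),
    ("Petrov", 1980, "Chelyabinsk"),
    ("Ivanov", 1980, "Chelyabinsk"),  # a second person with identical attributes
]

FIELDS = ("name", "birth_year", "city")

def anonymity_set_size(records, **known):
    """Count how many records match the attributes the intruder knows."""
    return sum(
        all(rec[FIELDS.index(k)] == v for k, v in known.items())
        for rec in records
    )

# Knowing only a surname leaves several candidates...
print(anonymity_set_size(records, name="Ivanov"))  # 3
# ...while adding birth year and city narrows the set:
print(anonymity_set_size(records, name="Ivanov", birth_year=1980, city="Chelyabinsk"))  # 2
```

The larger the count, the harder the localization of the subject, which is exactly the property the first case exploits.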
The second case includes distorted PD (coding, masking, cryptography, etc.), and here the possibility of identification depends only on the degree of distortion.
Thus, if we find and technically implement a processing method that reduces PD to one of the described cases, we have depersonalized the PD. Such methods are not hard to find; they can be taken, for example, from the US standard NIST SP 800-122 (its title can be translated as «Guide to Protecting the Confidentiality of Personally Identifiable Information»). But it has not been officially adopted in Russia, so we will proceed directly to the technical implementation.
Let's start with the second case, as the most obvious one. Any kind of distortion based on the secrecy of the algorithm (rearranging letters, substituting them, adding noise, etc.) is suitable only for short-term processing (transferring information), not for permanent storage: the algorithm is often known to third parties (it is implemented by a third-party software manufacturer), which increases the likelihood of compromise. As for cryptography, everything depends on the secrecy of the key, so it is quite reliable, but using it gives rise to many organizational problems (the mandatory use of certified protection tools, obtaining an FSB license, etc.).
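To illustrate why distortion based only on the secrecy of the algorithm is reversible yet weak, here is a toy Python sketch (the shift value is a made-up «secret»; this is not any certified or recommended scheme). Anyone who learns the algorithm, such as the software vendor, can undo it:

```python
# A toy reversible distortion: shift each character's code point by a
# fixed secret offset. Reversible, but the only "secret" is the algorithm
# itself, so knowledge of the algorithm fully compromises the data.
SECRET_SHIFT = 7  # hypothetical secret parameter

def distort(text: str) -> str:
    """Mask text by shifting every character's code point."""
    return "".join(chr(ord(c) + SECRET_SHIFT) for c in text)

def restore(text: str) -> str:
    """Undo the masking: the processing stays reversible."""
    return "".join(chr(ord(c) - SECRET_SHIFT) for c in text)

masked = distort("Ivanov Ivan")
assert masked != "Ivanov Ivan"            # the stored form is distorted
assert restore(masked) == "Ivanov Ivan"   # but the original is recoverable
```

Key-based cryptography differs in that compromise of the algorithm alone reveals nothing; only the key must stay secret.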
The first case is much more interesting because of its non-obviousness, which lies precisely in implementing reversibility. It is very easy to make a set of PD insufficient or redundant: remove some data or add unnecessary data. But the removed data cannot simply be thrown away; it must be placed in another store that is never accessible simultaneously (at any single workstation) with the remaining set of PD. If data are added, then information about this difference must likewise be hidden in an inaccessible place.
In NIST SP 800-122 this method is referred to as «database separation using cross-references». Such separation is used everywhere when working with databases, but without the goal of depersonalization: although the databases are divided across different storages, they retain a logical connection and are therefore processed simultaneously.
Let's see what the cross-reference method gives us for depersonalization. We divide the PD radically: all identifying details (full name, date and place of birth, address and phone number, passport data, etc.) go into one database, a directory of individuals (class 3), and everything else goes into another, the depersonalized database (class 4). The depersonalized database can then be publicly available (including via the Internet), while the directory must be protected against unauthorized access. Information leaks only if an intruder obtains the directory database and can match it against the depersonalized one, and we must exclude this possibility. But the operator of the personal data information system (ISPDn) needs that very same matching in order to process PD. How is it to be ensured?
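The radical split described above can be sketched roughly as follows. This is a hypothetical Python illustration: the field names and the choice of a random UUID as the abstract identifier are assumptions for demonstration, not part of the patented method.

```python
import uuid

# Hypothetical field split: identifying details go into the protected
# directory; everything else into the publicly available depersonalized base.
IDENTIFYING = {"full_name", "birth_date", "address", "passport"}

directory = {}        # directory of individuals (class 3, protected)
depersonalized = {}   # depersonalized database (class 4, may be public)

def split_record(record: dict) -> str:
    """Split one PD record in two; the abstract identifier is the only link."""
    pid = uuid.uuid4().hex  # unique, but carries no personal meaning
    directory[pid] = {k: v for k, v in record.items() if k in IDENTIFYING}
    depersonalized[pid] = {k: v for k, v in record.items() if k not in IDENTIFYING}
    return pid

pid = split_record({
    "full_name": "Ivanov Ivan Ivanovich",
    "birth_date": "1980-01-01",
    "address": "Chelyabinsk",
    "passport": "1234 567890",
    "diagnosis": "hypertension",
    "visit_date": "2011-04-12",
})
# The depersonalized base alone reveals nothing about the person:
assert "full_name" not in depersonalized[pid]
```

Without the directory, an intruder holding the depersonalized base sees only abstract codes next to non-identifying attributes.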
To implement reversibility, the two databases must be joined (matched) by means of a certain code (identifier) that is unique but completely abstract (the numbers of a person's documents cannot be used, since those details are in the directory). Matching consists of comparing the identifier from one database with the identifier from the other: when they coincide, the information from the two databases is joined. If the comparison is made at a workstation of the directory ISPDn, the depersonalized database may be available there (the availability is one-sided, and the ISPDn class will be higher than 3). But if the comparison is made at a workstation of the depersonalized ISPDn, the directory database is unreachable there, and the identifier from the directory can reach the depersonalized database only via an external carrier. That carrier must not bear the real details of the person whose code is recorded on it, although it may have abstract features (color, pattern, etc.).

For a person to be served within the depersonalized database, he must present this external carrier every time, i.e. carry it with him at all times. The carrier can be of any nature (paper, plastic, metal), and its abstract features will be meaningful only to the owner, allowing him to easily distinguish his carrier from others.
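A workstation of the depersonalized ISPDn might then serve a visitor along these lines. This is again a hypothetical sketch: the identifier value and record fields are invented, and the directory is deliberately absent from this workstation's code.

```python
from typing import Optional

# Only the depersonalized base exists at this workstation; the directory
# of individuals is physically unreachable here.
depersonalized = {
    "a3f1c9": {"diagnosis": "hypertension", "visit_date": "2011-04-12"},
}

def serve_visitor(carrier_id: str) -> Optional[dict]:
    """Look up the depersonalized record by the code read from the carrier.

    No identifying details are needed or available at this workstation;
    the abstract code presented by the visitor is the only key.
    """
    return depersonalized.get(carrier_id)

record = serve_visitor("a3f1c9")   # code read from the presented carrier
assert record is not None and record["diagnosis"] == "hypertension"
assert serve_visitor("unknown") is None   # no carrier, no record
```

Note that the lookup succeeds only when the visitor presents the carrier: the workstation itself stores nothing that links the code to a person.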
This method of depersonalization looks so simple that doubts arise about its effectiveness and reliability. How much will it actually reduce the cost of building a protection system? What happens if a person loses the carrier, or it is stolen in order to gain access to the owner's PD? Such questions arise and will certainly keep arising, but they are no reason to abandon a new technology; they are only a motive for improving it further.
Despite the urgency of the problem and the simplicity of the implementation, this method of using external carriers in depersonalizing PD was patented only in April 2011, by our organization (patent No. 103414).
For all questions regarding the use of this technology, please call +7 (351) 700-13-29, +7 (351) 777-82-88 or +7 (908) 587-87-73.