This question is not related to GATE, but it should be a good exercise in coming up with a better approach. Feel free to comment if you think this is not the right forum for this question.
I have 10 LDAP systems that contain users and groups as objects. Each user or group is identified by a distinguishedName of the form name/systemName, where name is the name of the user or group.
Group objects store which users are members of the group.
GroupA = {UserA, UserB, UserC ....}
GroupD = {UserF, UserB, UserK ....}
This information is not stored in the user objects. User objects hold user-related attributes:
UserA = {name="ABC", dob="x/x/x", }
UserB = {name="PQR", dob="x/x/x", }
I want to build a system that queries all the systems for user and group objects and shows them in a GUI like this:
UserA = {name="ABC", dob="x/x/x", memberOf=[GroupA]}
UserB = {name="PQR", dob="x/x/x", memberOf=[GroupA, GroupD]}
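The step from group-centric membership to the per-user memberOf view is an inversion of the group-to-members map. Here is a minimal sketch in Java, assuming the memberships have already been fetched into a plain Map. The class and method names are hypothetical, and real code would read from LDAP rather than from literals.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class MembershipInverter {

    // Turns group -> members into user -> groups (the memberOf view).
    public static Map<String, List<String>> invert(Map<String, List<String>> groupToMembers) {
        Map<String, List<String>> userToGroups = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> entry : groupToMembers.entrySet()) {
            for (String user : entry.getValue()) {
                // Create the user's list on first sight, then append this group.
                userToGroups.computeIfAbsent(user, k -> new ArrayList<>()).add(entry.getKey());
            }
        }
        return userToGroups;
    }

    public static void main(String[] args) {
        // Sample data taken from the GroupA / GroupD example in the question.
        Map<String, List<String>> groups = new LinkedHashMap<>();
        groups.put("GroupA", Arrays.asList("UserA", "UserB", "UserC"));
        groups.put("GroupD", Arrays.asList("UserF", "UserB", "UserK"));

        Map<String, List<String>> memberOf = invert(groups);
        System.out.println("UserA memberOf=" + memberOf.get("UserA")); // [GroupA]
        System.out.println("UserB memberOf=" + memberOf.get("UserB")); // [GroupA, GroupD]
    }
}
```

The inverted map has one entry per user that appears in at least one group, so its size grows with the total number of memberships rather than with the number of groups.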
There are millions of users and millions of groups in these systems overall.
The software I have written runs on several machines simultaneously. It is written in Java.
I am using this approach to achieve this:
- Start my program on one machine (Machine#1).
- Fetch all group objects from all the systems, one system at a time, and store them in an in-memory store with the user as key and the list of group names as value.
- Once all the groups have been queried and stored, start my program on the other machines too.
- The other machines copy the memory store built on Machine#1 to their own memory (using a third-party library, ehcache).
- The program on each machine queries a subset of the systems (Machine#1 queries systems 1, 2, 3; Machine#2 queries systems 4, 5, 6; and so on).
- Fetch all user objects and, as each user object arrives, look it up in the memory store to see whether the user has any groups attached. If so, attach them and store the result in the DB.
- Once the programs on all machines have finished, the result is in the DB and can be shown anywhere.
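The last three steps above can be sketched roughly as follows. This is an illustration only: the class and method names are hypothetical, the machine count of 4 is an assumption (the number of machines is not stated here), a plain Map stands in for the replicated memory store, and users without groups get an empty memberOf list, which is also an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class EnrichmentSketch {

    // Assigns systems to machines in contiguous blocks: the first machine
    // gets the first block, the second machine the next block, and so on.
    public static List<List<String>> partition(List<String> systems, int machines) {
        List<List<String>> parts = new ArrayList<>();
        int blockSize = (systems.size() + machines - 1) / machines; // ceiling division
        for (int m = 0; m < machines; m++) {
            int from = Math.min(m * blockSize, systems.size());
            int to = Math.min(from + blockSize, systems.size());
            parts.add(new ArrayList<>(systems.subList(from, to)));
        }
        return parts;
    }

    // Copies one user's attributes and adds memberOf from the store
    // (empty list if the user is in no group; this default is an assumption).
    public static Map<String, Object> enrich(String user,
                                             Map<String, Object> attributes,
                                             Map<String, List<String>> store) {
        Map<String, Object> result = new LinkedHashMap<>(attributes);
        result.put("memberOf", store.getOrDefault(user, Collections.emptyList()));
        return result;
    }

    public static void main(String[] args) {
        List<String> systems = new ArrayList<>();
        for (int i = 1; i <= 10; i++) {
            systems.add("system" + i);
        }
        // With 4 machines (assumed), the first machine gets system1..system3.
        System.out.println(partition(systems, 4));

        Map<String, List<String>> store = new LinkedHashMap<>();
        store.put("UserB", Arrays.asList("GroupA", "GroupD"));
        Map<String, Object> attributes = new LinkedHashMap<>();
        attributes.put("name", "PQR");
        System.out.println(enrich("UserB", attributes, store));
    }
}
```

Because each user record is enriched independently of every other, this step needs only read access to the store; the only shared state is the store itself.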
This approach has the following disadvantages:
- It consumes a lot of memory (the memory store uses 20 GB of RAM on one machine).
- Replicating the memory store to the other machines is not a very stable process.
- The memory store is built on a single machine, which takes about 3 hours. During this time the other machines sit idle, which wastes resources.
The end goals of the application are:
- Best performance possible
- Minimal wastage of computing time
- Data (user objects showing their groups) must be available in the format shown above. The format cannot be changed because other downstream systems depend on it.
Is there a better approach to achieve the same?