This question is not related to GATE, but it should be a good exercise in coming up with a better approach. Feel free to comment if you think this is not the right forum for it.

I have 10 LDAP systems that store users and groups as objects. Each user or group is identified by a distinguishedName of the form name/systemName, where name is the name of the user or group.

Group objects store which users are members of the group. 

    GroupA = {UserA, UserB, UserC ....}
    GroupD = {UserF, UserB, UserK ....}

This membership information is not stored in the user objects. User objects hold only user-related attributes.

    UserA = {name="ABC", dob="x/x/x", }
    UserB = {name="PQR", dob="x/x/x", }

I want to build a system that queries all of these systems for user and group objects and shows them in a GUI like this:

    UserA = {name="ABC", dob="x/x/x", memberOf=[GroupA]}
    UserB = {name="PQR", dob="x/x/x", memberOf=[GroupA, GroupD]}
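
The transformation being asked for is an inversion of the group-to-members mapping into a user-to-groups mapping. A minimal sketch of that inversion, using the sample groups from above (the class and method names are hypothetical, and plain strings stand in for the real distinguishedName values):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class MembershipIndex {
    // Inverts group -> members into user -> groups.
    // Hypothetical helper, not part of any LDAP library.
    static Map<String, List<String>> invert(Map<String, List<String>> groupToMembers) {
        Map<String, List<String>> userToGroups = new HashMap<>();
        for (Map.Entry<String, List<String>> entry : groupToMembers.entrySet()) {
            for (String user : entry.getValue()) {
                // Create the user's list on first sight, then append the group name.
                userToGroups.computeIfAbsent(user, k -> new ArrayList<>()).add(entry.getKey());
            }
        }
        return userToGroups;
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps insertion order, so groups are visited as listed.
        Map<String, List<String>> groups = new LinkedHashMap<>();
        groups.put("GroupA", List.of("UserA", "UserB", "UserC"));
        groups.put("GroupD", List.of("UserF", "UserB", "UserK"));

        Map<String, List<String>> index = invert(groups);
        System.out.println(index.get("UserB")); // prints [GroupA, GroupD]
    }
}
```

With millions of users and groups the resulting map is what dominates memory, since it holds one entry per user plus one list element per membership.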


There are millions of users and millions of groups across these systems.
The software I have written is in Java and runs on several machines simultaneously.

I am using this approach to achieve this:

  • Start my program on one machine (Machine#1).
  • Fetch all group objects from all the systems, one system at a time, and store them in an in-memory store with the user as key and the list of group names as value.
  • Once all the groups have been queried and stored, start my program on the other machines too.
  • The other machines copy the memory store built on Machine#1 to their own memory (using a third-party library, ehcache).
  • The program on each machine queries its own subset of systems (Machine#1 queries systems 1, 2, 3; Machine#2 queries 4, 5, 6; and so on).
  • Fetch all user objects and, as each user object arrives, look it up in the memory store to see whether the user has any groups attached. If so, attach them and store the result in the DB.
  • Once the programs on all machines have finished, the DB holds the result, which can then be shown anywhere.
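
The per-machine user phase described above can be sketched as follows. This is only an illustration under assumptions: `systemsFor` and `enrich` are hypothetical names, systems are assumed to be numbered 1 to 10 and handed out in contiguous blocks of three, and the LDAP queries and the DB write are left out.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class UserEnrichment {
    // Contiguous assignment: machine 1 -> systems 1..3, machine 2 -> 4..6, and so on.
    static List<Integer> systemsFor(int machine, int systemsPerMachine, int totalSystems) {
        List<Integer> assigned = new ArrayList<>();
        int first = (machine - 1) * systemsPerMachine + 1;
        for (int s = first; s < first + systemsPerMachine && s <= totalSystems; s++) {
            assigned.add(s);
        }
        return assigned;
    }

    // Copies the user's attributes and adds memberOf from the replicated store.
    // Users without any group get an empty list.
    static Map<String, Object> enrich(String user,
                                      Map<String, String> attributes,
                                      Map<String, List<String>> userToGroups) {
        Map<String, Object> row = new LinkedHashMap<>(attributes);
        row.put("memberOf", userToGroups.getOrDefault(user, List.of()));
        return row;
    }

    public static void main(String[] args) {
        System.out.println(systemsFor(2, 3, 10)); // prints [4, 5, 6]

        Map<String, List<String>> store = Map.of("UserB", List.of("GroupA", "GroupD"));
        Map<String, String> attrs = new LinkedHashMap<>();
        attrs.put("name", "PQR");
        attrs.put("dob", "x/x/x");
        System.out.println(enrich("UserB", attrs, store));
        // prints {name=PQR, dob=x/x/x, memberOf=[GroupA, GroupD]}
    }
}
```

In this shape every machine needs the full user-to-groups store before it can process a single user, because a user in one system may belong to groups fetched from any other.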

This approach has the following disadvantages:

  • It consumes a lot of memory (the memory store uses 20 GB of RAM on one machine).
  • Replicating the memory store to the other machines is not a very stable process.
  • The memory store is built on a single machine, which takes about 3 hours. During this time the other machines sit idle, which wastes resources.

The end goals of the application are:

  • Best performance possible
  • Minimal wastage of computing time
  • The data (user objects showing their groups) must be available in the format shown above. The format cannot be changed because downstream systems depend on it.

Is there a better approach to achieve the same result?
